WO2025152945A1 - Methods and apparatus of inheriting cross-component models based on cascaded vector for video coding improvement of inter chroma - Google Patents
- Publication number
- WO2025152945A1 (PCT/CN2025/072404)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- current
- cascaded
- information
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/521—Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors
Definitions
- the present invention relates to a video coding system using coding tools that include one or more cross-component model related modes.
- the present invention relates to inter coding of the chroma component using cross-component model information based on a cascaded motion vector or block vector.
- the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues.
- the residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data.
- the reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
- a method and apparatus for coding colour pictures or video using coding tools including one or more cross component models related modes are disclosed. According to this method, input data associated with a current block comprising a current first-colour block and a current second-colour block is received, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side, where the current block is coded in a non-intra mode.
- the current block has a first MV (Motion Vector) or a first BV (Block Vector) available or when referencing CCM (Cross-Component Model) information from one or more neighbouring blocks
- MV Motion Vector
- BV Block Vector
- the following steps are performed: one or more cascaded vectors are derived, wherein each cascaded vector is derived recursively as a sum of traced vectors starting from the first MV or the first BV or starting from the second MV or the second BV; target CCM information is determined based on said one or more cascaded vectors; a merge list comprising the target CCM information is determined; and the current second-colour block is encoded or decoded by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block when the target CCM information is selected.
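The recursive derivation described above can be sketched in Python. This is a hypothetical illustration only: the block records (with optional 'mv' and 'ccm' fields), the lookup callback, and the depth limit are assumptions made for exposition, not details taken from the disclosure.

```python
def derive_cascaded_vector(get_block, start_pos, start_mv, max_depth=3):
    """Trace vectors block-to-block, accumulating their sum, until a block
    carrying CCM information is reached or the depth limit is hit.
    Returns (cascaded_vector, ccm_info), or (None, None) on failure.
    get_block(pos) -> dict with optional 'mv' and 'ccm' entries;
    positions and vectors are (x, y) integer pairs."""
    cascaded = start_mv
    pos = (start_pos[0] + start_mv[0], start_pos[1] + start_mv[1])
    for _ in range(max_depth):
        blk = get_block(pos)
        if blk is None:
            return None, None
        if blk.get('ccm') is not None:      # CCM info found: stop tracing
            return cascaded, blk['ccm']
        mv = blk.get('mv')
        if mv is None:                      # dead end: no further vector to follow
            return None, None
        cascaded = (cascaded[0] + mv[0], cascaded[1] + mv[1])  # sum of traced vectors
        pos = (pos[0] + mv[0], pos[1] + mv[1])
    return None, None
```

Here the cascaded vector grows by one traced vector per hop, matching the "sum of traced vectors" wording; a real codec would add clipping and picture-boundary checks.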
- Fig. 13 illustrates the patterns of the m taps in a window region M2 x N2 around/including the position (iC, jC) to derive the sourceTermSet1 (i, j), where only the centre is used.
- Fig. 14 illustrates a flowchart of an exemplary video coding system that uses CCM information associated with a cascaded vector for inter chroma coding according to an embodiment of the present invention.
- the CCLM parameters (α and β) are derived with at most four neighbouring chroma samples and their corresponding down-sampled luma samples.
- {LM_LA, LM_A, LM_L} and {CCLM_LT, CCLM_T, CCLM_L} are used interchangeably.
- MMLM Multiple Model CCLM
- JEM J. Chen, E. Alshina, G.J. Sullivan, J.-R. Ohm, and J. Boyce, Algorithm Description of Joint Exploration Test Model 7, document JVET-G1001, ITU-T/ISO/IEC Joint Video Exploration Team (JVET) , Jul. 2017
- neighbouring luma samples and neighbouring chroma samples of the current block are classified into two groups, and each group is used as a training set to derive a linear model (i.e., a particular α and β are derived for a particular group).
- the samples of the current luma block are also classified based on the same rule for the classification of neighbouring luma samples.
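The two-model classification above can be sketched as follows. This is an illustrative least-squares formulation, not the integer min/max-based fit actually used in VVC/ECM; the threshold (mean neighbouring luma) and all names are assumptions.

```python
def derive_mmlm(neigh_luma, neigh_chroma):
    """Split neighbouring samples into two groups by the mean luma value and
    fit a linear model (alpha, beta) per group via least squares."""
    thr = sum(neigh_luma) / len(neigh_luma)
    models = {}
    for grp, sel in (('low', lambda l: l <= thr), ('high', lambda l: l > thr)):
        xs = [l for l in neigh_luma if sel(l)]
        ys = [c for l, c in zip(neigh_luma, neigh_chroma) if sel(l)]
        if not xs:
            models[grp] = (0.0, 0.0)
            continue
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        alpha = cov / var if var else 0.0
        models[grp] = (alpha, my - alpha * mx)   # beta anchors the group means
    return thr, models

def predict_chroma(luma, thr, models):
    """Current luma samples are classified by the same threshold rule."""
    a, b = models['low'] if luma <= thr else models['high']
    return a * luma + b
```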
- CCCM mode with a 3x2 filter using non-down-sampled luma samples, which consists of 6-tap spatial terms, four nonlinear terms and a bias term.
- the 6-tap spatial terms correspond to 6 neighbouring luma samples (i.e., L0, L1, ..., L5) around the chroma sample (i.e., C) to be predicted, and the four nonlinear terms are derived from the samples L0, L1, L2, and L3 as follows, where the locations of the non-down-sampled luma samples are shown in Fig. 3.
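A CCCM-style prediction from the 6 spatial taps, four nonlinear terms, and a bias can be combined as below. The integer scaling is an illustrative stand-in for the exact ECM fixed-point arithmetic, and the 11 coefficients would come from the model derivation (they are not specified here).

```python
def cccm_predict(luma6, coeffs, bitdepth=10):
    """Predict one chroma sample from 6 non-down-sampled luma taps L0..L5.
    Term vector: 6 spatial terms, 4 nonlinear terms (rescaled squares of
    L0..L3), and a bias term -- 11 coefficients in total."""
    assert len(luma6) == 6 and len(coeffs) == 11
    mid = 1 << (bitdepth - 1)
    nonlin = [(s * s + mid) >> bitdepth for s in luma6[:4]]  # four nonlinear terms
    terms = list(luma6) + nonlin + [mid]                     # spatial + nonlinear + bias
    return sum(c * t for c, t in zip(coeffs, terms))
```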
- the reconstructed Cb signal is formed by combining the filtered-predicted Cb 440 and residual Cb signal (i.e., resCb) using an adder 442.
- the reconstructed Cr signal is formed by combining the filtered-predicted Cr 450 and residual Cr signal (i.e., resCr) using an adder 452.
- the proposed 8-tap filter consists of 6 spatial luma samples, a nonlinear term, and a bias term.
- the spatial luma samples (L0, ..., L5) are obtained from the luma grid selecting the 6 luma samples closest to the chroma position C without down sampling as shown in Fig. 5.
- Intra template matching prediction is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame, whose L-shaped template matches the current template. For a predefined search range, the encoder searches for the most similar template to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.
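The template-matching search above can be sketched with a minimal L-shaped template (one row above plus one column left) and a sum-of-absolute-differences cost; the real intraTMP template shape and search range differ, so treat every name here as illustrative.

```python
def l_template(frame, x, y, w, h):
    """L-shaped template of a w x h block at (x, y): the row above plus the
    column to the left (a minimal stand-in for the real template shape)."""
    top = frame[y - 1][x:x + w]
    left = [frame[y + r][x - 1] for r in range(h)]
    return top + left

def intra_tmp_search(frame, cur_x, cur_y, w, h, candidates):
    """Pick the candidate position whose L-shaped template best matches the
    current template under a sum-of-absolute-differences cost."""
    cur = l_template(frame, cur_x, cur_y, w, h)
    best_pos, best_cost = None, None
    for cx, cy in candidates:
        cand = l_template(frame, cx, cy, w, h)
        cost = sum(abs(a - b) for a, b in zip(cur, cand))
        if best_cost is None or cost < best_cost:
            best_pos, best_cost = (cx, cy), cost
    return best_pos, best_cost
```

The block at the winning position would then be copied as the prediction, and the decoder repeats the identical search, so only the mode flag needs signalling.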
- the derivation of spatial merge candidates in VVC is the same as that in HEVC except that the positions of first two merge candidates are swapped.
- a maximum of four merge candidates (B0, A0, B1 and A1) for current CU 610 are selected among candidates located in the positions depicted in Fig. 6.
- the order of derivation is B0, A0, B1, A1 and B2.
- Position B2 is considered only when one or more neighbouring CUs at positions B0, A0, B1, A1 are not available (e.g. belonging to another slice or tile) or are intra coded.
- After the candidate at position A0 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list, so that coding efficiency is improved.
- a scaled motion vector is derived based on the co-located CU 820 belonging to the collocated reference picture as shown in Fig. 8.
- the reference picture list and the reference index to be used for the derivation of the co-located CU is explicitly signalled in the slice header.
- the scaled motion vector 830 for the temporal merge candidate is obtained as illustrated by the dotted line in Fig. 8.
- Pairwise average candidates are generated by averaging predefined pairs of candidates in the existing merge candidate list, using the first two merge candidates.
- the first merge candidate is defined as p0Cand and the second merge candidate is defined as p1Cand.
- the averaged motion vectors are calculated according to the availability of the motion vectors of p0Cand and p1Cand separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures, and the reference picture is set to the one of p0Cand; if only one motion vector is available, it is used directly; if no motion vector is available, the list is kept invalid. Also, if the half-pel interpolation filter indices of p0Cand and p1Cand are different, the index of the averaged candidate is set to 0.
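The per-list averaging rules can be sketched as follows. Integer halving is used for brevity (VVC's actual rounding differs), and the candidate record layout is an assumption for illustration.

```python
def pairwise_average(p0, p1):
    """Average the first two merge candidates per reference list (L0, L1).
    Each candidate: {'L0': (mv, ref_idx) or None, 'L1': ..., 'hpel': int}.
    Both MVs present -> average them and keep p0's reference picture;
    one present -> use it; none -> list stays invalid; differing half-pel
    interpolation filter indices fall back to 0."""
    out = {'hpel': p0['hpel'] if p0['hpel'] == p1['hpel'] else 0}
    for lst in ('L0', 'L1'):
        a, b = p0[lst], p1[lst]
        if a and b:
            mv = ((a[0][0] + b[0][0]) // 2, (a[0][1] + b[0][1]) // 2)
            out[lst] = (mv, a[1])          # reference picture taken from p0
        else:
            out[lst] = a or b              # one available -> use it; else None
    return out
```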
- the second scheme is that for a coding unit (under single tree splitting) including luma (Y) and chroma (Cb and/or Cr) components or for a coding unit (under chroma dual tree splitting) including chroma (Cb and/or Cr) components, the prediction for Cr is improved by applying the cross-component models to information (current reconstructed or predicted) from Cb.
- inter CCLM inter cross-component linear model
- CCCM convolutional cross-component model
- inter CCCM inter cross-component convolution model
- the used model parameters can be saved and/or referenced by the following coding blocks.
- the self-derived cross-component model being CCRM
- all or any subset of the model parameters can be saved.
- if the following coding block is intra, it is allowed to use the saved model parameters.
- the following coding block is inter or any mode-type (e.g., IBC)
- if the following coding block and the current block have different mode-types (e.g., one being an inter block and one not being an inter block), it is not allowed to use the saved model parameters.
- the used model parameters can be saved and/or referenced by the following coding blocks.
- for the inherited CCCM, all or any subset of the model parameters can be saved.
- if the following coding block is intra, it is allowed to use the saved model parameters.
- if the following coding block is inter or any mode-type (e.g. IBC), it is allowed to use the saved model parameters.
- if the following coding block has a different mode-type (e.g., not an inter block), it is not allowed to use the saved model parameters.
- a second-round valid checking is further used when the mentioned valid checking (e.g., neighbouring block not being cross-component mode or neighbouring block not using/combining cross-component mode) is not satisfied.
- the motion vectors and/or block vectors of the neighbouring block can be used to find the cross-component models. Variations of how to use motion vector and/or block vectors to find the model can reference the description of “Temporal model information from collocated blocks” in the above candidate type list. If the model is found, the second-round valid checking for the neighbouring block is satisfied and the found models can be inserted in the list; otherwise, the neighbouring block is not valid for inserting. When scanning the spatial neighbouring blocks, a candidate is added into the list if the candidate is valid.
- in another sub-embodiment of the candidate type being “Temporal model information from collocated blocks”, in the first case the collocated block is from the block in the reference picture or the pre-defined collocated picture, as in inter mode, by using the current block position and/or the current block motion, and/or in the second case the collocated block is from the block in the reference picture or the pre-defined collocated picture, as in inter mode, by using the current block position and/or the neighbouring block motion.
- in the first case, for example, when the current block is coded in inter prediction mode, the collocated block is referred to by the motion information (including the motion vectors and the reference picture indicated by the reference index) of the current block.
- the forbidden method or the scaling method can be used in the second case. If the proposed methods are applied to an IBC block or any mode using block vectors (in the first case, the current block being IBC; in the second case, the neighbouring block being IBC) , block vector information is used as motion vector where the block vector information is determined by signalling and/or template matching in a pre-defined searching range like intraTMP and/or any implicit or explicit pre-defined rules.
- the default model information is added if the list is not full after inserting all pre-defined candidates.
- the default model can be CCLM models.
- the default alpha values (also named α, a, or scaling parameters) are selected from {0, 1/8, -1/8, 2/8, -2/8, 3/8, -3/8, ...}
- the beta (also named β, b, or the offset parameter) is based on the selected default alpha, the average neighbouring reconstructed luma sample value, and the average neighbouring reconstructed chroma (Cb/Cr) sample value.
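One natural reading of the default-model rule above is that beta anchors the line so the average neighbouring luma maps to the average neighbouring chroma; this interpretation is an assumption, sketched below.

```python
# First entries of the quoted default-alpha set {0, 1/8, -1/8, 2/8, ...}
DEFAULT_ALPHAS = [0, 1/8, -1/8, 2/8, -2/8, 3/8, -3/8]

def default_ccm(alpha, neigh_luma, neigh_chroma):
    """Default CCLM-style model: beta = avgChroma - alpha * avgLuma, so the
    line passes through the point of neighbourhood averages."""
    avg_l = sum(neigh_luma) / len(neigh_luma)
    avg_c = sum(neigh_chroma) / len(neigh_chroma)
    return alpha, avg_c - alpha * avg_l
```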
- one or more self-derived cross-component candidates are included.
- the self-derived cross-component candidates are described in the section entitled “Self-derived Cross-Component Model” .
- the self-derived cross-component candidates are added only when the list does not contain enough inherited candidates. For example, the self-derived candidates are added before the default candidates or treated as the default candidates.
- the self-derived cross-component candidates are added in any pre-defined position in the modelList. For example, the position is after the spatial adjacent candidates. For another example, the position is after the spatial non-adjacent candidates. For another example, the position is after all or any subset of temporal candidates. After building the list, in one embodiment, the list is reordered.
- the choice between applying inter CCLM or not applying inter CCLM depends on signalling.
- the additional signal is not required and the one or more models are selected according to an implicit rule.
- the first candidate in the list is used. If the list is reordered by the template cost, then, the first candidate is the candidate with the smallest template cost.
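The implicit selection above reduces to a stable sort by template cost, so that the first entry is the cheapest candidate; the cost callback is an assumed interface.

```python
def reorder_by_template_cost(candidates, template_cost):
    """Stable sort so the first (implicitly selected) candidate is the one
    with the smallest template cost; ties keep their original order."""
    return sorted(candidates, key=template_cost)
```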
- prediction or reconstruction-based model is used to generate one hypothesis of prediction for the current chroma component.
- the predicted samples for the first component are down-sampled with down-sampling filters, which may be fixed as one pre-defined filter or selected among several candidate filters.
- the reconstructed samples for the first component are down-sampled with down-sampling filters, which may be fixed as one pre-defined filter or selected among several candidate filters.
- Prediction or reconstruction based convolution model is similar to the proposed methods for the prediction or reconstruction based linear model.
- the main difference is that the model coefficient pattern follows CCCM (not CCLM) and the luma samples may or may not be down-sampled first.
- for CCLM, multiple hypotheses (MH) of cross-component predictions are blended, or multiple models are used to generate a hypothesis of prediction for the current block.
- Multiple-hypothesis CCLM is proposed to blend the predictions from multiple CCLM methods.
- the term “CCLM methods” can refer to all the cross-component modes.
- the to-be-blended CCLM methods can be from (but are not limited to) the above mentioned CCLM methods (e.g., CCLM, MMLM, CCCM, GLM, CCRM, ...) and/or models defined in the embodiment described in noteA.
- a weighting scheme is used for blending.
- CCLM for inter block can also be named as “inter CCLM” and “CCLM” can be extended to any LM mode (or any cross-component mode) or replaced with any LM mode (or any cross-component mode) .
- CCLM for inter block can also be named as inter CCCM.
- an implicit rule (not using the additional flag) is used to determine whether to use the re-derived model.
- the candidate with the smallest cost (e.g., the first candidate in the modelList) is implicitly selected to generate the cross-component prediction.
- an index is signalled to select one or more candidates from the modelList. More details can be found in Section II.
- the current block has motion vector or block vector available and the current block is cross-component prediction (CCP) coded
- the CCM information of the reference block located by the cascaded vector as described in the section entitled “Cascaded Vector Cross-Component Models” can also be stored in the current block for future referencing.
- the current block is not CCP coded
- the CCM information of the reference block located by the cascaded vector as described in the section entitled “Cascaded Vector Cross-Component Models” can also be stored in the current block for future referencing.
- the maximum number of sets of CCM information allowed to be stored in one block can be pre-defined. If the available number of sets of CCM information exceeds the maximum allowed number, the priority of the CCM information to be stored can be pre-defined. For example, if the current block is CCP coded, the current CCM information has the highest priority. For another example, the priority can be determined based on the trace depth of the cascaded vector. The shorter the trace depth is, the higher the priority is for the CCM information. The above rules can be combined.
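The combined priority rules above (current-block CCM first, then shorter cascaded-vector trace depth) can be sketched as a ranking; the candidate tuple layout is an assumption.

```python
def select_ccm_to_store(candidates, max_sets):
    """candidates: list of (ccm_info, is_current_block, trace_depth) tuples.
    The current block's own CCM ranks highest; among the rest, a shorter
    trace depth of the cascaded vector gives higher priority."""
    ranked = sorted(candidates, key=lambda c: (not c[1], c[2]))
    return [c[0] for c in ranked[:max_sets]]
```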
- an example of the self-derived cross-component model is CCRM.
- the model filtering shape/pattern, parameter terms
- the model is unified with the cross-component model in regular intra mode.
- CCRM model can be unified with any pre-defined existing intra cross-component model (e.g. CCCM using non-downsampled luma samples, GLM, MMLM) and/or the self-derivation only means that the input for deriving model parameters is from the current chroma and collocated luma samples (e.g. motion compensation results if the current block is inter) .
- SourceTermSet0 (i, j) includes one or more luma source terms denoted as sourceTerm00, sourceTerm01, ..., and/or sourceTerm0(n-1).
- the value of n means the number of taps for the source term set.
- the pattern of the n taps refers to a pattern defined as any subset of a window region M x N around/including the position (iL, jL) as shown in Fig. 12. If the target sample is luma, (iL, jL) is (i, j). If the target sample is chroma (e.g. Cb or Cr), (iL, jL) is the collocated luma position from (i, j).
- SourceTermSet1 (i, j) includes one or more chroma (Cb or Cr) source terms denoted as sourceTerm10, sourceTerm11, ..., and/or sourceTerm1(m-1).
- the value of m means the number of taps for the source term set.
- the source terms can be linear terms and/or non-linear terms, only linear terms, and/or only non-linear terms.
- m is a pre-defined value such as 1, 2, ...or any positive integer. For example, the pre-defined value is fixed in the standard.
- the pattern of the m taps refers to a pattern defined as any subset of a window region M2 x N2 around/including the position (iC, jC) as shown in Fig. 13. If the target sample is chroma (Cb or Cr), (iC, jC) is (i, j). If the target sample is luma, (iC, jC) is the collocated chroma position from (i, j).
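Generating "any subset of an M x N window around a position" can be sketched generically; the `keep` parameter is a hypothetical way to express a subset pattern, e.g. the centre-only pattern used for sourceTermSet1.

```python
def tap_pattern(i, j, m, n, keep=None):
    """Positions of an m x n window around (i, j); `keep` selects any subset
    of offsets (None means the full window). The centre-only pattern is
    keep={(0, 0)}."""
    offs = [(di, dj) for di in range(-(m // 2), m - m // 2)
                     for dj in range(-(n // 2), n - n // 2)]
    if keep is not None:
        offs = [o for o in offs if o in keep]
    return [(i + di, j + dj) for di, dj in offs]
```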
- the following embodiments are used to determine how the source content is generated.
- the source content is based on a predicted sample generated by a prediction mode and/or a reconstructed sample generated from that predicted sample and a reconstructed residual.
- the source content is the filtered source or the source with any pre-processing.
- the source content is the predicted/reconstructed sample after filtering with a pre-defined model or filter.
- the source content is gradient information from the predicted samples and/or reconstructed samples.
- the source term may further include location information. For example, if the target sample refers to chroma, the horizontal location (i) of (i, j) is used in a source term and the vertical location (j) of (i, j) is used in a source term.
- block in this invention can refer to TU/TB, CU/CB, PU/PB, or CTU/CTB.
- LM in this invention can be viewed as one kind of CCLM/MMLM modes or any other extension/variation of CCLM (e.g. the proposed CCLM extension/variation in this invention) .
- One variation is MMLM which uses thresholds to decide different models for different samples in the current chroma component.
- Another variation is that, for Cb (or Cr), the model parameters are derived from multiple collocated luma blocks.
- the variations of CCLM here mean that some optional modes can be selected when the block indication refers to using one of cross-component modes (e.g.
- any of the foregoing proposed methods of cascaded vector derivation and associated CCM information can be implemented in encoders and/or decoders.
- any of the proposed methods can be implemented in an inter/intra/prediction module of an encoder, and/or an inter/intra/prediction module of a decoder.
- any of the proposed methods can be implemented as a circuit coupled to the inter/intra/prediction module of the encoder and/or the inter/intra/prediction module of the decoder, so as to provide the information needed by the inter/intra/prediction module.
- the method of cascaded vector derivation and associated CCM information as described above can be implemented in an encoder side or a decoder side.
- any of the proposed methods can be implemented in an Intra coding module (e.g. Intra Pred 150 in Fig. 1B) in a decoder or an Intra coding module (e.g. Intra Pred. 110 in Fig. 1A) in an encoder.
- Any of the proposed propagated cross-component prediction can also be implemented as a circuit coupled to the intra/inter coding module at the decoder or the encoder.
- the decoder or encoder may also use an additional processing unit to implement the propagated cross-component prediction processing. While the intra prediction units (e.g. unit 110 in Fig. 1A and unit 150 in Fig. 1B) are shown as individual processing units, they may correspond to executable software or firmware codes stored on a medium, such as hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
- Fig. 14 illustrates a flowchart of an exemplary video coding system that uses CCM information associated with a cascaded vector for inter chroma coding according to an embodiment of the present invention.
- the steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g. one or more CPUs) at the encoder or decoder side.
- the steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart.
- input data associated with a current block comprising a current first-colour block and a current second-colour block is received in step 1410, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side, where the current block is coded in a non-intra mode.
- the current block has a first MV (Motion Vector) or a first BV (Block Vector) available, or when referencing CCM (Cross-Component Model) information from one or more neighbouring blocks, whether at least one neighbouring block has a second MV or a second BV is checked in step 1420.
- if so (i.e., the “Yes” path), steps 1430-1460 are performed. Otherwise (i.e., the “No” path), steps 1430-1460 are skipped.
- in step 1430, one or more cascaded vectors are derived, wherein each cascaded vector is derived recursively as a sum of traced vectors starting from the first MV or the first BV, or starting from the second MV or the second BV.
- in step 1440, target CCM information is determined based on said one or more cascaded vectors.
- in step 1450, a merge list comprising the target CCM information is determined.
- in step 1460, the current second-colour block is encoded or decoded by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block when the target CCM information is selected.
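The flow of Fig. 14 can be summarised in a small driver. Everything here is a hypothetical sketch: the block/neighbour records, the `locate_ccm` callback (standing in for steps 1430-1440), and the returned structure are illustrative assumptions.

```python
def process_block(block, neighbours, locate_ccm):
    """Driver for the Fig. 14 flow. The cascaded-vector path is taken only
    when the current block, or a neighbour whose CCM information would be
    referenced, has an MV/BV available. locate_ccm(vector) returns the CCM
    information of the block a cascaded vector points at, or None."""
    vectors = []
    if block.get('mv'):
        vectors.append(block['mv'])
    vectors += [n['mv'] for n in neighbours if n.get('mv')]
    if not vectors:
        return {'path': 'No', 'merge_list': []}        # steps 1430-1460 skipped
    merge_list = []
    for v in vectors:                                  # steps 1430-1450
        ccm = locate_ccm(v)
        if ccm is not None and ccm not in merge_list:  # drop duplicates
            merge_list.append(ccm)
    return {'path': 'Yes', 'merge_list': merge_list}   # step 1460 consumes the list
```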
Abstract
Method and apparatus for coding the chroma component using Cross-Component Model (CCM) information based on a cascaded motion vector or block vector are disclosed. According to this method, if the current block has an MV/BV available or when referencing CCM information from neighbouring blocks, if at least one neighbouring block has an MV/BV: one or more cascaded vectors are derived, wherein each cascaded vector is derived recursively as a sum of traced vectors starting from the first MV/BV or the second MV/BV; target CCM information is determined based on said one or more cascaded vectors and a merge list is derived accordingly; and the current second-colour block is inter encoded or decoded by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/620,920, filed on January 15, 2024. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
The present invention relates to a video coding system using coding tools that include one or more cross-component model related modes. In particular, the present invention relates to inter coding of the chroma component using cross-component model information based on a cascaded motion vector or block vector.
BACKGROUND AND RELATED ART
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
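The residual path described above (T/Q followed by IQ/IT and REC) can be sketched with a toy scalar quantizer; the `quantize`/`dequantize` callables stand in for the real transform and quantization stages, and all names are illustrative.

```python
def encode_block(orig, pred, quantize, dequantize):
    """Toy scalar version of the residual loop in Fig. 1A: residues are
    quantized (modelling T 118 + Q 120 jointly), then inverse quantized
    (IQ 124 + IT 126), and added back to the prediction at REC 128."""
    residues = [o - p for o, p in zip(orig, pred)]
    levels = [quantize(r) for r in residues]          # sent to the entropy coder
    recon_res = [dequantize(l) for l in levels]       # recovered residues
    recon = [p + r for p, r in zip(pred, recon_res)]  # reconstructed samples
    return levels, recon
```

Note that the reconstruction uses the dequantized residues, not the originals, so the encoder's reference pictures match what the decoder will produce.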
As shown in Fig. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this series of processing. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, a deblocking filter (DF), Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
The decoder, as shown in Fig. 1B, can use a similar set of functional blocks as the encoder, or a portion of them, except for Transform 118 and Quantization 120 since the decoder only needs Inverse Quantization 124 and Inverse Transform 126. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) . The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
In order to improve the inter coding performance for a system using cross-component models, methods and apparatus of using cross-component model associated with a cascaded vector are disclosed.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for coding colour pictures or video using coding tools including one or more cross component models related modes are disclosed. According to this method, input data associated with a current block comprising a current first-colour block and a current second-colour block is received, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side, where the current block is coded in a non-intra mode. If the current block has a first MV (Motion Vector) or a first BV (Block Vector) available or when referencing CCM (Cross-Component Model) information from one or more neighbouring blocks, if at least one neighbouring block has a second MV or a second BV, the following steps are performed: one or more cascaded vectors are derived, wherein each cascaded vector is derived recursively as a sum of traced vectors starting from the first MV or the first BV or starting from the second MV or the second BV; target CCM information is determined based on said one or more cascaded vectors; a merge list comprising the target CCM information is determined; and the current second-colour block is encoded or decoded by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block when the target CCM information is selected.
In one embodiment, if a second reference block indicated by a current cascaded vector has a third MV or a third BV, the third MV or the third BV is used as a next traced vector and the next cascaded vector is formed by adding the next traced vector to the current cascaded vector, wherein the current cascaded vector is initially set to the first MV or the first BV or set to the second MV or the second BV respectively. In one embodiment, each traced vector corresponds to an L0 MV, an L1 MV, or one BV.
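The recursive derivation above can be sketched in simplified form. The `vectors_of_block` lookup below is a hypothetical stand-in for consulting the stored MV/BV information of the reference block located by a displacement, and the tuple arithmetic ignores sub-pel precision and reference-list details:

```python
def derive_cascaded_vectors(start_vector, vectors_of_block, max_depth=3):
    """Each cascaded vector is the running sum of traced vectors, starting
    from the first MV/BV.  vectors_of_block is a hypothetical lookup giving
    the MVs/BVs stored in the reference block located by a displacement."""
    cascaded = []

    def trace(current, depth):
        cascaded.append(current)          # every partial sum is a candidate
        if depth >= max_depth:            # finite trace-depth limit
            return
        # the reference block may carry multiple MVs/BVs (e.g. L0 and L1);
        # each spawns its own longer cascade
        for nxt in vectors_of_block(current):
            trace((current[0] + nxt[0], current[1] + nxt[1]), depth + 1)

    trace(start_vector, 1)
    return cascaded


# toy example: the block at displacement (2, 0) itself stores MV (3, 0)
chain = {(2, 0): [(3, 0)]}
print(derive_cascaded_vectors((2, 0), lambda v: chain.get(v, [])))
```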
In one embodiment, for each recursion, if one reference block indicated by one cascaded vector has multiple MVs or BVs, multiple cascaded vectors are derived. In one embodiment, a set of cascaded vectors is derived for different numbers of trace depth. In one embodiment, a set of cascaded vectors is derived from all possible sums of the traced vectors, and wherein the traced vectors correspond to a target trace depth.
In one embodiment, the trace depth associated with said one or more cascaded vectors corresponds to a finite number smaller than a maximum limit. In another embodiment, the trace depth corresponds to an infinite number. In yet another embodiment, the trace depth corresponds to a pre-defined number.
In one embodiment, each of said one or more neighbouring blocks corresponds to a CU/CB, a PU, a TU/TB or a corresponding block with the same size as the current block.
In one embodiment, when deriving corresponding CCM information to be stored in a target block, after finishing encoding/decoding the target block, multiple sets of CCM information are stored in the target block. In one embodiment, if the target block has a target MV or BV available and the target block is CCP (Cross-Component Prediction) coded, the CCM information of one or more reference blocks located by one or more cascaded vectors are also stored in the target block in addition to storing the CCM information used by the target block. In another embodiment, if the target block has a target MV or BV available and the target block is not CCP (Cross-Component Prediction) coded, the CCM information of one or more reference blocks located by one or more cascaded vectors are stored. In yet another embodiment, a maximum number of sets of the CCM information allowed to be stored in one block is pre-defined.
In one embodiment, if an available number of sets of the CCM information exceeds a maximum allowed number, priority of the CCM information to be stored is pre-defined. In one embodiment, if the target block is CCP coded, the CCM information used by the target block has a highest priority. In one embodiment, the priority is determined based on trace depth of a target cascaded vector. In one embodiment, the CCM information associated with a shorter trace depth has a higher priority.
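A minimal sketch of the storage-priority rule, assuming each stored CCM entry carries a `trace_depth` field (0 denoting the CCM information used by a CCP-coded target block itself); the field names are illustrative, not from the disclosure:

```python
def select_ccm_to_store(ccm_entries, max_sets):
    """Keep at most max_sets CCM entries for a coded block, ordered by trace
    depth: depth 0 (the block's own CCM when CCP-coded) has highest priority,
    and a shorter trace depth outranks a longer one."""
    return sorted(ccm_entries, key=lambda e: e["trace_depth"])[:max_sets]
```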
Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing.
Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
Fig. 2 shows 16 gradient patterns for GLM.
Fig. 3 illustrates the 6-tap spatial terms corresponding to 6 neighbouring luma samples (i.e., L0, L1, …, L5) around the chroma sample (i.e., C) to be predicted for CCCM mode.
Fig. 4 shows an exemplary system block diagram for Cross-component residual model (CCRM) .
Fig. 5 illustrates the luma samples L0, …, L5 in relation to the chroma sample C.
Fig. 6 illustrates the 5 neighbouring blocks used for deriving spatial merge candidates for VVC.
Fig. 7 illustrates an exemplary pattern of the adjacent and non-adjacent spatial merge candidates.
Fig. 8 illustrates an example of temporal candidate derivation, where a scaled motion vector is derived according to POC (Picture Order Count) distances.
Fig. 9 illustrates examples of CCM information propagation based on block vectors, where the blocks with dashed lines (i.e., A, E, G) are coded in a cross-component model.
Fig. 10 illustrates examples of CCM information propagation based on motion vectors, where the blocks with dashed lines (i.e., A, E, G) are coded in a cross-component model.
Fig. 11 illustrates an example of the cascaded vector derivation, where the cascaded vector is derived as the sum of the recursively traced motion vectors and block vectors based on the motion vector or the block vector of the neighbouring block.
Fig. 12 illustrates the patterns of the n taps in a window region M x N around/including the position (iL, jL) to derive the sourceTermSet0 (i, j) , where only the centre is used.
Fig. 13 illustrates the patterns of the m taps in a window region M2 x N2 around/including the position (iC, jC) to derive the sourceTermSet1 (i, j) , where only the centre is used.
Fig. 14 illustrates a flowchart of an exemplary video coding system that uses CCM information associated with a cascaded vector for inter chroma coding according to an embodiment of the present invention.
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
Cross-Component Linear Model (CCLM) Prediction
To reduce the cross-component redundancy, a cross-component linear model (CCLM) prediction mode is used in the VVC, for which the chroma samples are predicted based on the reconstructed luma samples of the same CU by using a linear model as follows:
predC (i, j) = α·recL′ (i, j) + β      (1)
where predC (i, j) represents the predicted chroma samples in a CU and recL′ (i, j) represents the downsampled reconstructed luma samples of the same CU. α is called the scaling parameter and β is called the offset parameter.
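Equation (1) can be illustrated with a simplified sketch; real codecs use fixed-point α and β with integer shifts rather than floating point:

```python
def cclm_predict(rec_luma, alpha, beta, bit_depth=10):
    """predC(i, j) = alpha * recL'(i, j) + beta, clipped to the valid
    chroma sample range for the given bit depth."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(alpha * l + beta), 0), max_val) for l in row]
            for row in rec_luma]

print(cclm_predict([[100, 200]], 0.5, 10))  # → [[60, 110]]
```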
The CCLM parameters (α and β) are derived with at most four neighbouring chroma samples and their corresponding down-sampled luma samples. Suppose the current chroma block dimensions are W×H, then W’ and H’ are set as
–W’ = W, H’ = H when LM_LA mode is applied;
–W’ = W + H when LM_A mode is applied;
–H’ = H + W when LM_L mode is applied.
In this disclosure, the terms {LM_LA, LM_A, LM_L} and {CCLM_LT, CCLM_T, CCLM_L} are used interchangeably.
Multiple Model CCLM (MMLM)
In the JEM (J. Chen, E. Alshina, G.J. Sullivan, J.-R. Ohm, and J. Boyce, Algorithm Description of Joint Exploration Test Model 7, document JVET-G1001, ITU-T/ISO/IEC Joint Video Exploration Team (JVET) , Jul. 2017) , multiple model CCLM mode (MMLM) is proposed for using two models for predicting the chroma samples from the luma samples for the whole CU. In MMLM, neighbouring luma samples and neighbouring chroma samples of the current block are classified into two groups, and each group is used as a training set to derive a linear model (i.e., a particular α and β are derived for a particular group) . Furthermore, the samples of the current luma block are also classified based on the same rule used for the classification of the neighbouring luma samples.
Threshold is calculated as the average value of the neighbouring reconstructed luma samples. A neighbouring sample with Rec′L [x, y] <= Threshold is classified into group 1; while a neighbouring sample with Rec′L [x, y] > Threshold is classified into group 2.
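The two-group classification can be sketched as follows (integer average as the threshold; a real implementation classifies the co-located current luma samples by the same rule and trains one (α, β) pair per group):

```python
def mmlm_classify(neigh_luma):
    """Split neighbouring reconstructed luma samples into two groups using
    their average as the threshold: <= threshold -> group 1, > -> group 2."""
    threshold = sum(neigh_luma) // len(neigh_luma)
    group1 = [s for s in neigh_luma if s <= threshold]
    group2 = [s for s in neigh_luma if s > threshold]
    return threshold, group1, group2
```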
In this disclosure, the terms {MMLM_LA, MMLM_A} and {MMLM_LT, MMLM_T} are used interchangeably.
Convolutional Cross-Component Model (CCCM)
In CCCM, a convolutional model is applied to improve the chroma prediction performance. The convolutional model has a 7-tap filter consisting of a 5-tap plus-sign-shaped spatial component, a nonlinear term and a bias term.
Output of the filter is calculated as a convolution between the filter coefficients and the input values and clipped to the range of valid chroma samples.
The filter coefficients are calculated by minimising MSE between predicted and reconstructed chroma samples in the reference area.
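The MSE-minimising derivation is a standard least-squares fit. The sketch below uses a floating-point solver for clarity, whereas ECM actually uses a division-free fixed-point Gaussian elimination:

```python
import numpy as np

def derive_filter_coeffs(input_terms, ref_chroma):
    """Least-squares fit of the convolutional filter coefficients, minimising
    the MSE between predicted and reconstructed chroma over the reference
    area.  input_terms holds one row of filter-input values per sample."""
    A = np.asarray(input_terms, dtype=np.float64)
    b = np.asarray(ref_chroma, dtype=np.float64)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs
```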
Gradient Linear Model (GLM)
Compared with the CCLM, instead of down-sampled luma values, the GLM utilizes luma sample gradients to derive the linear model. Specifically, when the GLM is applied, the input to the CCLM process, i.e., the down-sampled luma samples L, are replaced by luma sample gradients G. The other parts of the CCLM (e.g., parameter derivation, prediction sample linear transform) are kept unchanged:
C=α·G+β.
Fig. 2 shows the 16 gradient filters (210-240) for the gradient calculation.
Intra Block Copy
Intra block copy (IBC) is a tool adopted in HEVC extensions on screen content coding (SCC) . It is well known that it significantly improves the coding efficiency of screen content materials. Since IBC mode is implemented as a block level coding mode, block matching (BM) is performed at the encoder to find the optimal block vector (or motion vector) for each CU. Here, a block vector is used to indicate the displacement from the current block to a reference block, which is already reconstructed inside the current picture. The luma block vector of an IBC-coded CU is in integer precision. The chroma block vector is rounded to integer precision as well. When combined with AMVR, the IBC mode can switch between 1-pel and 4-pel motion vector precisions. An IBC-coded CU is treated as the third prediction mode other than intra or inter prediction modes. The IBC mode is applicable to the CUs with both width and height smaller than or equal to 64 luma samples.
CCCM Using Non-Down-sampled Luma Samples
CCCM mode with 3x2 filter using non-down-sampled luma samples is used, which consists of 6-tap spatial terms, four nonlinear terms and a bias term. The 6-tap spatial terms correspond to 6 neighbouring luma samples (i.e., L0, L1, …, L5) around the chroma sample (i.e., C) to be predicted, the four non-linear terms are derived from the samples L0, L1, L2, and L3 as shown as follows, where the locations of the non-down-sampled luma samples are shown in Fig. 3.
Cross-Component Residual Model (CCRM)
As in JVET-AD0108 (Pekka Astola, et al., “AHG12: Cross-component residual model (CCRM) for inter prediction” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 30th Meeting, Antalya, TR, 21–28 April 2023, Document: JVET-AD0108) , a cross-component residual model (CCRM) is applied to predict chroma samples from reconstructed luma samples when the block uses inter prediction or intra block copy (IBC) . Fig. 4 illustrates the decoder side of the method. The cross-component filters are derived using the prediction signals of luma and chroma. The derived filters are applied to the reconstructed luma signal producing the final chroma predictions. Filter coefficients are derived in step 420 for each chroma component separately using the prediction signals (i.e., predY 410, and predCb 412 or predCr 414) and the filters are applied to the reconstructed luma signal in step 430 as shown in Fig. 4. The reconstructed luma signal is formed by combining the luma prediction (PredY) 410 and residual luma signal (resY) using an adder 422. After applying the filters, the step 430 generates filtered-predicted Cb 440 and filtered-predicted Cr 450. The reconstructed Cb signal is formed by combining the filtered-predicted Cb 440 and residual Cb signal (i.e., resCb) using an adder 442. Similarly, the reconstructed Cr signal is formed by combining the filtered-predicted Cr 450 and residual Cr signal (i.e., resCr) using an adder 452.
The proposed 8-tap filter consists of 6 spatial luma samples, a nonlinear term, and a bias term. The spatial luma samples (L0, …, L5) are obtained from the luma grid by selecting the 6 luma samples closest to the chroma position C without down-sampling, as shown in Fig. 5. The predicted chroma value is obtained as,
predChromaVal = c0·L0 + c1·L1 + c2·L2 + c3·L3 + c4·L4 + c5·L5 + c6·nonlinear ( (L0+L3+1) >> 1) + c7·B,
where nonlinear is CCCM’s nonlinear operator and B is the bias.
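The 8-tap prediction can be sketched as below, assuming CCCM's fixed-point definitions nonlinear(x) = (x·x + mid) >> bitDepth and bias B = mid = 1 << (bitDepth − 1); these specific constants follow CCCM's published design and are stated here as assumptions:

```python
def ccrm_predict(L, c, bit_depth=10):
    """predChromaVal = sum(c[k]*L[k], k=0..5) + c6*nonlinear((L0+L3+1)>>1) + c7*B.
    Assumes CCCM's nonlinear(x) = (x*x + mid) >> bit_depth and B = mid."""
    mid = 1 << (bit_depth - 1)
    avg = (L[0] + L[3] + 1) >> 1
    nonlinear = (avg * avg + mid) >> bit_depth
    spatial = sum(ck * lk for ck, lk in zip(c[:6], L))
    return spatial + c[6] * nonlinear + c[7] * mid
```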
The filter coefficients are derived using ECM’s division-free Gaussian elimination method and the necessary offsets are applied to samples prior to filter derivation.
Intra reference samples are used as additional input samples in filter derivation when the block has less than 64 chroma samples. CCCM’s design of at most 6 rows and columns of intra reference samples is used.
Intra Template Matching
Intra template matching prediction (IntraTMP) is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame, whose L-shaped template matches the current template. For a predefined search range, the encoder searches for the most similar template to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.
Extended Merge Prediction
In VVC, the merge candidate list is constructed by including the following five types of candidates in order:
1) Spatial MVP from spatial neighbour CUs
2) Temporal MVP from collocated CUs
3) History-based MVP from an FIFO table
4) Pairwise average MVP
5) Zero MVs.
Spatial Candidate Derivation
The derivation of spatial merge candidates in VVC is the same as that in HEVC except that the positions of the first two merge candidates are swapped. A maximum of four merge candidates (B0, A0, B1 and A1) for the current CU 610 are selected among candidates located in the positions depicted in Fig. 6. The order of derivation is B0, A0, B1, A1 and B2. Position B2 is considered only when one or more of the neighbouring CUs at positions B0, A0, B1 and A1 is not available (e.g. belonging to another slice or tile) or is intra coded. After the candidate at position A0 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved.
In addition to the above-mentioned spatial candidates, the non-adjacent spatial merge candidates as in JVET-L0399 (Yu Han, et al., “CE4.4.6: Improvement on Merge/Skip mode” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Macao, CN, 3–12 Oct. 2018, Document: JVET-L0399) are inserted after the TMVP in the regular merge candidate list. An example of the pattern of spatial merge candidates is shown in Fig. 7. The distances between non-adjacent spatial candidates and current coding block are based on the width and height of current coding block. The line buffer restriction is not applied.
Temporal Candidates Derivation
In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal merge candidate for a current CU 810, a scaled motion vector is derived based on the co-located CU 820 belonging to the collocated reference picture as shown in Fig. 8. The reference picture list and the reference index to be used for the derivation of the co-located CU is explicitly signalled in the slice header. The scaled motion vector 830 for the temporal merge candidate is obtained as illustrated by the dotted line in Fig. 8, which is scaled from the motion vector 840 of the co-located CU using the POC (Picture Order Count) distances, tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of temporal merge candidate is set equal to zero.
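The POC-distance scaling can be sketched in simplified floating-point form (VVC itself uses clipped fixed-point arithmetic):

```python
def scale_temporal_mv(col_mv, tb, td):
    """Scale the collocated CU's MV by the ratio of POC distances:
    tb = POC(current picture) - POC(current picture's reference),
    td = POC(collocated picture) - POC(collocated picture's reference)."""
    return (round(col_mv[0] * tb / td), round(col_mv[1] * tb / td))
```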
History-based Merge Candidates Derivation
The history-based MVP (HMVP) merge candidates are added to merge list after the spatial MVP and TMVP. In this method, the motion information of a previously coded block is stored in a table and used as MVP for the current CU. The table with multiple HMVP candidates is maintained during the encoding/decoding process. The table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-subblock inter-coded CU, the associated motion information is added to the last entry of the table as a new HMVP candidate.
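The FIFO update with redundancy removal can be sketched as:

```python
def hmvp_update(table, motion, max_size=5):
    """Insert the motion info of a newly coded non-subblock inter CU into the
    HMVP FIFO table.  An identical entry is moved to the newest position;
    otherwise the oldest entry is evicted when the table is full."""
    if motion in table:
        table.remove(motion)      # redundancy removal: re-insert as newest
    elif len(table) >= max_size:
        table.pop(0)              # drop the oldest entry (FIFO)
    table.append(motion)          # newest entry at the tail
    return table
```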
Pair-wise Average Merge Candidates Derivation
Pairwise average candidates are generated by averaging predefined pairs of candidates in the existing merge candidate list, using the first two merge candidates, defined as p0Cand and p1Cand respectively. The averaged motion vectors are calculated separately for each reference list according to the availability of the motion vectors of p0Cand and p1Cand. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures, and the reference picture is set to that of p0Cand; if only one motion vector is available, it is used directly; if no motion vector is available, the list is kept invalid. Also, if the half-pel interpolation filter indices of p0Cand and p1Cand are different, the index is set to 0.
When the merge list is not full after pair-wise average merge candidates are added, the zero MVPs are inserted in the end until the maximum merge candidate number is encountered.
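The per-list averaging rule above can be sketched as (simplified integer average; sub-pel precision and reference-index handling omitted):

```python
def pairwise_average(mv0, mv1):
    """Per reference list: average when both MVs exist (even if they point to
    different reference pictures), copy the single available MV, or keep the
    list invalid (None) when neither exists."""
    if mv0 is not None and mv1 is not None:
        return ((mv0[0] + mv1[0]) // 2, (mv0[1] + mv1[1]) // 2)
    return mv0 if mv0 is not None else mv1
```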
The cross-component information is used to improve the prediction accuracy of an inter block. To improve the prediction accuracy of the chroma component of the inter block, the luma information from the corresponding luma component of the current chroma block, and/or the chroma information from the current chroma block, and/or the chroma information from the previously coded chroma component is used.
-The first scheme is that for a coding unit (under single tree splitting) including luma (Y) and chroma (Cb and/or Cr) components, the prediction for Cb and/or Cr is improved by applying the cross-component models to information (current reconstructed or predicted) from Y.
-The second scheme is that for a coding unit (under single tree splitting) including luma (Y) and chroma (Cb and/or Cr) components or for a coding unit (under chroma dual tree splitting) including chroma (Cb and/or Cr) components, the prediction for Cr is improved by applying the cross-component models to information (current reconstructed or predicted) from Cb.
In the following, several embodiments related to the first scheme are proposed to use an inherited cross-component mode for the current chroma block with the following steps: Step (1) building a candidate list (modelList) for the current block where the candidate list includes cross-component models; Step (2) selecting one or more sets of model information in the list; and Step (3) using the model information (similar to intra chroma cross-component mode) to generate one or more hypotheses of predictions for the current chroma component (Cb or Cr) by applying and/or modifying the selected model information to the reconstructed or predicted samples for the corresponding luma component.
When the selected model information refers to traditional cross-component linear model (s) , the proposed method is called the inter cross-component linear model (inter CCLM) mode. When the selected model information refers to convolutional cross-component model (s) derived by a regression-based method (e.g. CCCM) , the proposed method is called the inter cross-component convolution model (inter CCCM) mode.
Moreover, in some embodiments, a self-derived (a.k.a. re-derived) cross-component mode is proposed and can be added into the candidate list in Step (1) . In some embodiments, the self-derived cross-component mode is not added into the list and a selection of using the proposed inherited mode and/or using the proposed self-derived mode is designed. In some embodiments, the selection of using the proposed inherited mode and/or using the proposed self-derived mode is determined following an explicit rule, an implicit rule, or both. More details are described in the section entitled “IV. Selection of Using the Proposed Inherited Mode and/or Using the Proposed Self-Derived Mode” . In this disclosure, the terms “self-derived” and “re-derived” are used interchangeably.
In one embodiment, the proposed embodiments can also be used for the second scheme by using the previous coded chroma component (Cb) as the luma component in the first scheme.
Storage and Inheritance of the Model
In another embodiment, when the current inter block uses the model parameters from the self-derived cross-component mode, the used model parameters can be saved and/or referenced by the following coding blocks. For example, when the self-derived cross-component model is CCRM, all or any subset of the model parameters can be saved. In one embodiment, if the following coding block is intra, it is allowed to use the saved model parameters. If the following coding block is inter or any mode-type (e.g., IBC) , it is allowed to use the saved model parameters. In another embodiment, if the following coding block and the current block have different mode-types (e.g., one being an inter block and one being not an inter block) , it is not allowed to use the saved model parameters.
In another embodiment, when the current inter block uses the inherited cross-component mode, the used model parameters can be saved and/or referenced by the following coding blocks. For example, with an inherited CCCM model, all or any subset of the model parameters can be saved. In one embodiment, if the following coding block is intra, it is allowed to use the saved model parameters. If the following coding block is inter or any mode-type (e.g. IBC) , it is allowed to use the saved model parameters. In another embodiment, if the following coding block has a different mode-type (e.g., not an inter block) , it is not allowed to use the saved model parameters.
In another embodiment, when the current inter block uses any cross-component models (e.g. the inherited cross-component model, the self-derived cross-component model, the cross-component model used in chroma fusion which means the chroma prediction is based on adding one or more hypotheses of cross-component prediction to one or more existing hypotheses of non-cross-component prediction, or any combination of the above) , the used model parameters can be saved and/or referenced by the following coding blocks. For example, with an inherited CCCM model, all or any subset of the model parameters can be saved. In one embodiment, if the following coding block is intra, it is allowed to use the saved model parameters. If the following coding block is inter or any mode-type (e.g. IBC) , it is allowed to use the saved model parameters. In another embodiment, if the following coding block has a different mode-type (e.g., not an inter block) , it is not allowed to use the saved model parameters.
I. Building a Candidate List Including Cross-Component Models
In one embodiment, when building the merge-like candidate model list (modelList) , one or more sets of the following candidate model information are included. For each candidate in the list, it refers to a candidate model information. The definition of the model information can be found in the section entitled: “V.1. Inheriting CCM Information” .
-Spatial model information from spatial neighbour blocks (corresponding to “Spatial MVP from spatial neighbour CUs” for inter)
-Temporal model information from collocated blocks (corresponding to “Temporal MVP from collocated CUs” for inter)
-History-based model information from a FIFO table (corresponding to “History-based MVP from a FIFO table” for inter)
-Pairwise average model information (corresponding to “Pairwise average MVP” for inter)
-Default model information (corresponding to “Zero MVs” for inter)
In one sub-embodiment of the candidate type being “spatial model information from spatial neighbour blocks” in the above candidate type list, a valid spatial neighbouring block (s) can be from one of the spatial adjacent and non-adjacent neighbours (or any subset of the blocks in a neighbouring search region for the current block) which satisfies a pre-defined condition. For an example of non-adjacent neighbours, the pre-defined condition (e.g., valid/available checking) means that the non-adjacent neighbour is in the available region of non-adjacent spatial candidates. For another example, the pre-defined condition is that the neighbour is coded by a cross-component mode or combined with a cross-component mode. A cross-component mode refers to modes such as CCLM, MMLM, CCCM, GLM, the mode with mode information inherited from a merge-like candidate list, MH CCLM, and/or any cross-component mode with syntax belonging to the cross-component branch (containing many cross-component modes) and not belonging to traditional intra prediction modes. Combining with a cross-component mode refers to modes such as chroma fusion (or named LM assisted Angular/Planar Mode) , inter CCLM, inter CCCM and/or any traditional mode with syntax not belonging to the cross-component branch, but using the cross-component information to generate the prediction. In another sub-embodiment, when checking the validity of a neighbouring coding block, a second-round validity check is further used when the mentioned validity check fails (e.g., the neighbouring block not being a cross-component mode or the neighbouring block not using/combining a cross-component mode) ; the motion vectors and/or block vectors of the neighbouring block can then be used to find the cross-component models. Variations of how to use motion vectors and/or block vectors to find the model can reference the description of “Temporal model information from collocated blocks” in the above candidate type list.
If the model is found, the second-round validity check for the neighbouring block is satisfied and the found models can be inserted in the list; otherwise, the neighbouring block is not valid for insertion. When scanning the spatial neighbouring blocks, a candidate is added into the list if the candidate is valid.
In another sub-embodiment of the candidate type being “Temporal model information from collocated blocks” , in the first case, the collocated block is from the block in the reference picture or the pre-defined collocated picture as inter mode by using the current block position and/or the current block motion, and/or in the second case, the collocated block is from the block in the reference picture or the pre-defined collocated picture as inter mode by using the current block position and/or the neighbouring block motion. In the first case, for example, when the current block is coded by inter prediction mode, the collocated block is referred to by the motion information (including the motion vectors and the reference picture indicated by the reference index) of the current block. If the current block uses a subblock motion mode (e.g., affine mode) , each subblock in the current block has its own collocated temporal model information. Collocated temporal model information from all or any subset of collocated temporal information referred to by the different subblock motions (of each subblock) is added into the list. For another example, when the reference picture indicated by the reference index is different from the pre-defined collocated picture, which can be the collocated picture used for temporal motion vector prediction in inter mode or any collocated picture specified in any video coding standard to keep the motion or cross-component model information stored and available for the current block, the temporal information from the reference picture is forbidden to be used.
For another example, when the reference picture indicated by the reference index is different from the pre-defined collocated picture, which can be the collocated picture used for temporal motion vector prediction in inter mode or any collocated picture specified in any video coding standard to keep the motion or cross-component model information stored and available for the current block, the motion vector is scaled to refer to the pre-defined collocated picture and the scaled motion vector is used to find the collocated block in the collocated picture to get the cross-component model in the collocated block. The scaling process is shown in the section of “Temporal Candidates Derivation” . Some examples are described for the second case. For one example, the temporal model information can be from the collocated block referred to by the motion information of the neighbouring blocks for the current block. Similar to the first case, the forbidden method or the scaling method can be used in the second case. If the proposed methods are applied to an IBC block or any mode using block vectors (in the first case, the current block being IBC; in the second case, the neighbouring block being IBC) , block vector information is used as the motion vector, where the block vector information is determined by signalling and/or template matching in a pre-defined searching range like intraTMP and/or any implicit or explicit pre-defined rules.
In another sub-embodiment of the candidate type being “History-based model information” , a history-based table (a FIFO table) is built and stores the model information from the previously coded blocks. The table can be reset at the beginning and/or the end of a CTU, slice, picture, tile, and/or sequence. One or more history-based candidates can be added into the candidate list in order from the head to the tail of the table or from the tail to the head of the table.
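As an illustrative sketch (the table size, duplicate pruning, and reset points are assumptions, not claimed behaviour), the history-based FIFO table may be modelled as:

```python
from collections import deque

# Illustrative history-based model table (FIFO). Oldest entries fall out
# when the table is full; the table is reset at pre-defined boundaries.
class HistoryModelTable:
    def __init__(self, max_size=6):
        self.table = deque(maxlen=max_size)   # oldest entries drop out first

    def push(self, model_info):
        if model_info in self.table:          # optional duplicate pruning
            self.table.remove(model_info)
        self.table.append(model_info)         # newest entry at the tail

    def reset(self):                          # e.g., at the start of a CTU
        self.table.clear()

    def candidates(self, newest_first=True):
        # Candidates can be scanned from tail to head (newest first) or
        # from head to tail, as described above.
        return list(reversed(self.table)) if newest_first else list(self.table)
```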
In another sub-embodiment of the candidate type being “Pairwise average model information” , the model information of this candidate is derived based on the model information from more than one of the previous candidates in the list. For example, it can average and/or modify the model parameters of more than one candidate as the to-be-applied model parameters. For another example, it can combine more than one prediction as the final prediction, where each of more than one prediction is generated by applying one of models in the candidate list.
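The two variants above (averaging model parameters versus combining the predictions of multiple models) may be sketched as follows; the floating-point arithmetic and the equal weights are illustrative assumptions, since a real codec would use fixed-point parameters:

```python
# Illustrative pairwise-average candidate from two earlier CCLM candidates.
def pairwise_average(cand0, cand1):
    a = (cand0["alpha"] + cand1["alpha"]) / 2.0
    b = (cand0["beta"] + cand1["beta"]) / 2.0
    return {"alpha": a, "beta": b}

# Alternative variant: blend two hypotheses of prediction sample-by-sample
# instead of averaging the model parameters.
def blend_predictions(pred0, pred1, w0=0.5):
    return [w0 * p0 + (1 - w0) * p1 for p0, p1 in zip(pred0, pred1)]
```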
In another sub-embodiment, default model information is added if the list is not full after inserting all pre-defined candidates. For example, the default models can be CCLM models. The default alpha (also named α, a, or the scaling parameter) is selected from {0, 1/8, -1/8, 2/8, -2/8, 3/8, -3/8, …} , and the beta (also named β, b, or the offset parameter) is derived based on the selected default alpha, the average neighbouring reconstructed luma sample value, and the average neighbouring reconstructed chroma (Cb/Cr) sample value.
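A minimal sketch of the default-candidate filling: beta is derived so that the model maps the average neighbouring luma value to the average neighbouring chroma value. The alpha set follows the text; the fixed-point scaling a real implementation would use is omitted for clarity:

```python
# Illustrative default CCLM models used to pad the candidate list.
DEFAULT_ALPHAS = [0, 1/8, -1/8, 2/8, -2/8, 3/8, -3/8]

def fill_default_models(model_list, list_size, avg_luma, avg_chroma):
    for alpha in DEFAULT_ALPHAS:
        if len(model_list) >= list_size:
            break
        # Derive beta so that pred(avg_luma) == avg_chroma for this alpha.
        beta = avg_chroma - alpha * avg_luma
        model_list.append({"alpha": alpha, "beta": beta})
    return model_list
```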
In another embodiment, when building modelList, one or more self-derived cross-component candidates are included. The self-derived cross-component candidates are described in the section entitled “Self-derived Cross-Component Model” . In another sub-embodiment, the self-derived cross-component candidates are added only when the list does not contain enough inherited candidates. For example, the self-derived candidates are added before the default candidates or treated as the default candidates. In another sub-embodiment, the self-derived cross-component candidates are added in any pre-defined position in the modelList. For example, the position is after the spatial adjacent candidates. For another example, the position is after the spatial non-adjacent candidates. For another example, the position is after all or any subset of temporal candidates. After building the list, in one embodiment, the list is reordered.
II. Signalling of Enabling or Disabling and Selecting One or More Model Information in the List if Enabled
In this section, the term “inter CCLM” refers to “inter CCLM or inter CCCM” .
When not applying the proposed inter CCLM (or inter CCCM) , the prediction of the current block is from the original inter prediction.
In another embodiment, the choice between applying inter CCLM or not applying inter CCLM depends on signalling.
In another sub-embodiment, when the signalling indicates to apply inter CCLM (or inter CCCM) , additional signalling is used to select one or more models from the total candidates. The candidate index is referred to as modelIdx in this disclosure. If the modelList containing the total candidates (e.g., candidates as described in the section entitled “Building a Candidate List Including Cross-Component Models” , CCLM_LT, CCLM_L, CCLM_T, MMLM_LT, MMLM_L, MMLM_T) , or any subset of the candidates, is reordered, the additional signalling specifies the candidate index in the reordered list. For example, if one LM mode is selected, the LM prediction is generated by the selected LM mode. For another example, if more than one LM mode is selected, the LM prediction is generated by blending hypotheses of predictions from the multiple LM modes.
In another sub-embodiment, the additional signalling is not required and the one or more models are selected according to an implicit rule. For example, the first candidate in the list is used. If the list is reordered by the template cost, then the first candidate is the candidate with the smallest template cost.
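The implicit rule can be sketched as follows, where `template_cost` is an assumed callable (e.g., the distortion between a model's prediction on the template and the reconstructed template chroma), not a standard-defined function:

```python
# Illustrative implicit selection: reorder by template cost, take the first.
def select_model(model_list, template_cost):
    reordered = sorted(model_list, key=template_cost)
    return reordered[0]   # smallest template cost; no index is signalled
```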
In another embodiment, original inter prediction (generated by motion compensation) is used for luma and the predictions of chroma components are generated by CCLM and/or any other LM modes.
In another embodiment, the one or more LM modes (i.e., cross-component modes) which will be used to generate the one or more hypotheses of predictions for LM assisted Angular/Planar Mode/inter CCLM/inter CCCM/MH CCLM are selected from a pre-defined merging candidate list (i.e., modelList) . One modelIdx is signalled to select a candidate from the candidate list (modelList) and the selected candidate is used for the current block. The modelList contains one or more candidates where each candidate refers to a model (or cross-component mode) information. If only one candidate is in the list (i.e., the size of the list being 1) , the modelIdx is not signalled and/or can be inferred as 0 or a default value. In one embodiment, the modelIdx is implicitly determined or the one or more models used for the current block are determined without signalling modelIdx. For example, the first candidate in the list is used. If the list is reordered by the template cost, the first candidate is the candidate with the smallest template cost. For another example, the used candidate/model is implicitly selected from the list by using a pre-defined rule depending on the coding information of the block for the to-be-used candidate. This embodiment is denoted as “noteA” .
The above proposed methods can be also applied to IBC blocks or the blocks with any IBC sub-modes (e.g., IBC merge or IBC AMVP or any IBC mode under IBC syntax) . The term “inter” in this invention can be changed to IBC. That is, for chroma components, the block vector prediction can be combined or replaced with cross-component prediction.
III. Using the Model Information to Generate One or More Hypotheses of Predictions for the Current Chroma Component
III. 1. Concept
In one embodiment, prediction or reconstruction-based model is used to generate one hypothesis of prediction for the current chroma component.
In one sub-embodiment of a prediction based linear model, the derived model parameters are applied to the predicted samples for the first component (Y) to get the predicted samples for the second or third component.
P(i, j) = a · pred′L(i, j) + b
The predicted samples for the first component are down-sampled with downsampling filters, which may be fixed at one pre-defined filter or selected among some candidate filters.
In another sub-embodiment of a reconstruction based linear model, the derived model parameters are applied to the reconstructed samples for the first component (Y) to get the predicted samples for the second or third component.
P(i, j) = a · reco′L(i, j) + b
The reconstructed samples for the first component are down-sampled with downsampling filters, which may be fixed at one pre-defined filter or selected among some candidate filters.
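A minimal sketch of the reconstruction-based linear model for 4:2:0 content: the luma plane is down-sampled with a simple 2x2 average (one of several possible filters, chosen here as an assumption) and the model P = a · reco′L + b is then applied per chroma sample:

```python
# Illustrative 2x2-average downsampling followed by the linear model.
def downsample_2x2(luma):                      # luma: 2D list, even dims
    h, w = len(luma) // 2, len(luma[0]) // 2
    return [[(luma[2*i][2*j] + luma[2*i][2*j+1] +
              luma[2*i+1][2*j] + luma[2*i+1][2*j+1] + 2) >> 2
             for j in range(w)] for i in range(h)]

def apply_linear_model(luma, a, b):
    # P(i, j) = a * reco'_L(i, j) + b on the down-sampled luma plane.
    ds = downsample_2x2(luma)
    return [[a * s + b for s in row] for row in ds]
```

The same routine applies to the prediction-based model by feeding predicted luma samples instead of reconstructed ones.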
Prediction or reconstruction based convolution model is similar to the proposed methods for the prediction or reconstruction based linear model. The main difference is that the model coefficient pattern follows CCCM (not CCLM) and the luma samples may or may not be down-sampled first.
In another embodiment, multiple hypotheses (MH) of cross-component predictions are blended or multiple models are used to generate a hypothesis of prediction for the current block. Multiple-hypothesis CCLM is proposed to blend the predictions from multiple CCLM methods. The term “CCLM methods” can refer to all the cross-component modes. The to-be-blended CCLM methods can be from (but are not limited to) the above-mentioned CCLM methods (e.g., CCLM, MMLM, CCCM, GLM, CCRM, …) and/or models defined in the embodiment described in noteA. A weighting scheme is used for blending.
III. 2. CCLM for Inter Block
The term “CCLM for inter block” can also be named “inter CCLM” , and “CCLM” can be extended to any LM mode (or any cross-component mode) or replaced with any LM mode (or any cross-component mode) . When convolutional cross-component models derived by a regression-based method are used, CCLM for inter block can also be named inter CCCM.
In one embodiment, for chroma components, in addition to original inter prediction (generated by motion compensation which can be uni-prediction and/or bi-prediction, multiple hypotheses of prediction from multiple motion candidates which may refer to one or more merge candidates, one or more AMVP candidates, any combination of above, or which can be only uni-prediction) , one or more hypotheses of predictions (generated by CCLM and/or any other LM modes, CIIP, regular inter merge mode, GPM or GPM variations) are used to generate the current prediction.
In one sub-embodiment, the current prediction is the weighted sum of inter prediction and CCLM prediction.
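A sketch of this weighted sum; the equal weights (1/2, 1/2) and the rounding offset are illustrative choices, not claimed values:

```python
# Illustrative weighted sum of the motion-compensated chroma prediction
# and the CCLM prediction, with integer rounding.
def weighted_sum(inter_pred, cclm_pred, w_inter=1, w_cclm=1, shift=1):
    off = 1 << (shift - 1)                     # rounding offset
    return [(w_inter * p0 + w_cclm * p1 + off) >> shift
            for p0, p1 in zip(inter_pred, cclm_pred)]
```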
In another embodiment, inter CCLM is supported only when one or more of the pre-defined inter modes are used for the current block, or inter CCLM is supported when any one (or more than one) of the enabling flag (s) of the pre-defined inter mode is (are) indicated as enabled. The meaning of supporting inter CCLM is that the prediction of the current block can be chosen between applying inter CCLM or not applying inter CCLM.
For another example, if CCLM mode is used for generating the chroma prediction samples and luma prediction is from an inter coding tool, a flag is used to indicate if the CCLM model used for the chroma prediction is inherited from the CCLM models used in the previous coded blocks or the CCLM model is from a predetermined CCLM mode. If the CCLM model is inherited from the CCLM models used in the previous coded blocks, an index is used to indicate which model in the list is inherited or modified. Otherwise, a predetermined CCLM mode is used to implicitly derive the CCLM model for the current chroma prediction.
IV. Selection of Using the Proposed Inherited Mode and/or Using the Proposed Self-Derived Mode
In one embodiment, a flag can be signalled to indicate/select if the re-derived model is used. If the flag is 0, the cross-component model used to encode the neighbour merge candidate is inherited. If the flag is 1, the re-derived method is used.
In another embodiment, an implicit rule (not using the additional flag) is used to determine whether to use the re-derived model.
In another embodiment, if no model can be inherited during building the modelList, or the spatial adjacent/non-adjacent candidates, history candidates, temporal candidates, or all or any subset (e.g., before default candidates) of the mentioned candidates in this invention are not available, the re-derived model is used.
In another embodiment, when the proposed inherited method is used, the candidate with the smallest cost (e.g., the first candidate in the modelList) is implicitly selected to generate the cross-component prediction. For another example, an index is signalled to select one or more candidates from the modelList. More details can be found in Section II.
V. Details of the Cross-Component Mode (Including Model Information) in the Candidate List
V. 1. Inheriting CCM Information
In one embodiment, the cross-component model (CCM) information of an inherited cross-component model can be stored together with the inherited model parameters. The CCM information can be inherited together with the inherited model parameters. The prediction of the current block can be generated based on the inherited CCM information and inherited model parameters. The CCM information can include, but is not limited to, prediction mode (e.g., CCLM, MMLM, CCCM, 2-parameter GLM, 3-parameter GLM (GLM model with luma term) ) , model index for indicating which model shape is used in the convolutional model, classification threshold for multi-model, information to indicate that non-downsampled samples are used in the convolutional model, down-sampling filter flag (whether to do down-sampling) , down-sampling filtering index when multiple down-sampling filters are used, number of neighbouring lines used to derive the model, types of templates used to derive the model, post-filtering flag, and model parameters.
In one embodiment, a mixed CCCM model consisting of various terms (e.g., spatial term, gradient term, location term, non-linear term and bias term) can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information to indicate that the inherited model is a mixed CCCM model consisting of various terms. If there are multiple types of mixed CCCM models, a model index can also be stored in the CCM information to indicate which type of mixed CCCM model is inherited. For example, gradient and location based CCCM (GL-CCCM) proposed in JVET-AB0119 (Ramin G. Youvalari, et al., “Non-EE2: Gradient and location based convolutional cross-component model (GL-CCCM) for intra prediction” , Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 28th Meeting, Mainz, DE, 20–28 October 2022, Document: JVET-AB0119) is a mixed CCCM model which consists of one spatial term in the centre position, two gradient terms for the horizontal direction and vertical direction, two location terms X and Y for the relative horizontal location and relative vertical location, one non-linear term and one bias term. A prediction mode can be stored in the CCM information to indicate that the inherited model is a GL-CCCM model.
V. 2 Propagated CCM Information
In one embodiment, after encoding/decoding a block, the cross-component model (CCM) information of the current block is derived and stored in the current block. The stored CCM information can be referenced by the following coding blocks. The following coding blocks can inherit CCM information from the current block. The definition of CCM information is in the section entitled “Inheriting CCM Information” . The stored CCM information can be inherited as, but not limited to, the following types of candidates: spatial candidates, non-adjacent candidates, temporal candidates, historical candidates.
In one embodiment, if the current block is not cross-component prediction (CCP) coded (i.e., the block does not use a cross-component model, such as the inherited cross-component model, the self-derived cross-component model, the cross-component model used in chroma fusion which means the chroma prediction is based on adding one or more hypotheses of cross-component prediction to one or more existing hypotheses of prediction of non-cross-component prediction, or any combination of the above) , and there are block vectors available in the current block (e.g., the current luma block is coded in IBC or IntraTMP mode, or the collocated luma block is coded in IBC or IntraTMP mode) , the CCM information of the current block can be derived by copying the CCM information of the reference block located by the block vector. For example, as shown in Fig. 9, block B is not CCP coded and there are block vectors available at block B. The reference block A is located by the block vector. The CCM information of the reference block A, which uses a cross-component model, is copied to and stored in block B. In one embodiment, if the reference block located by the block vector is also not CCP coded, but there is CCM information stored in the reference block, the CCM information of the current block can be derived by copying the CCM information stored in the reference block. That is, even when the reference block is not CCP coded, as long as it has valid stored CCM information, the stored CCM information can be referenced by the current block. For example, as shown in Fig. 9, the current block C has a block vector available, and its reference block B is not CCP coded but has CCM information stored. The CCM information of block B is copied to and stored in block C. Since the CCM information stored in block B was copied from block A, the CCM information stored in block C is originally from block A (i.e., the CCM information of block A is propagated to block C) .
By only accessing block B, block C can retrieve CCM information originally from block A. In one embodiment, if the reference block located by the block vector is not CCP coded and does not have CCM information stored, no CCM information is stored for the current block.
In one embodiment, the block vectors used to derive the reference block are the block vectors at the centre of the collocated luma block. In another embodiment, the block vectors used to derive the reference block are the block vectors at the top-left corner of the collocated luma block.
In one embodiment, when the current block has multiple block vectors available (e.g., the block vector can be bi-directional, the block can have multiple IntraTMP block vectors, or the current chroma block is collocated with multiple luma blocks and more than one of the luma blocks has block vectors) , to derive the CCM information of the current block, if only one of the reference blocks located by the block vectors has CCM information, the CCM information from the reference block which has CCM information is copied to and stored in the current block.
For another embodiment, when the current block has multiple block vectors, and more than one of the reference blocks located by the block vectors have CCM information, one of the reference blocks is selected based on a set of pre-defined rules. The CCM information of the selected reference block is then copied to and stored in the current block.
For one sub-embodiment, the reference block which is CCP coded is selected.
For one sub-embodiment, the reference block whose distance to the current block is the smallest is selected. The CCM information of the selected reference block is copied to and stored in the current block. The distance between the reference block and the current block, located at (xr, yr) and (xc, yc) respectively, can be computed by d = |xr - xc| + |yr - yc| , where (xr, yr) and (xc, yc) can be the top-left, top-right, bottom-left, bottom-right, or centre positions of the reference block and the current block.
For one sub-embodiment, the reference block which has the smallest horizontal distance, |xr -xc|, is selected.
For another sub-embodiment, the reference block which has the smallest vertical distance, |yr -yc|, is selected.
For one sub-embodiment, the rules described previously can be combined, and not all the rules described previously need to be applied.
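The distance-based selection may be sketched as follows; the L1 metric and the anchor positions are illustrative assumptions (the text also allows a purely horizontal or purely vertical distance):

```python
# Illustrative selection of the reference block closest to the current block.
def l1_distance(ref_pos, cur_pos):
    (xr, yr), (xc, yc) = ref_pos, cur_pos
    return abs(xr - xc) + abs(yr - yc)

def select_closest(ref_positions, cur_pos):
    # Positions can be the top-left, top-right, bottom-left, bottom-right,
    # or centre of each block, as described above.
    return min(ref_positions, key=lambda p: l1_distance(p, cur_pos))
```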
In one embodiment, if the current block is not CCP coded and there are motion vectors available in the current block (e.g., the current luma block is inter-coded) , the CCM information of the current block can be derived by copying the CCM information of its reference block in a reference picture, located by the motion vectors of the current block. For example, as shown in Fig. 10, block B is not CCP coded and there are motion vectors available at block B. The reference block A is located by the motion vector. The CCM information of the reference block A, which uses a cross-component model, is copied to and stored in block B. For one embodiment, if the reference block located by the motion vector is also not CCP coded, but there is CCM information stored in the reference block, the CCM information of the current block can be derived by copying the CCM information stored in the reference block. That is, even when the reference block is not CCP coded, as long as it has valid stored CCM information, the stored CCM information can be referenced by the current block. For example, as shown in Fig. 10, the current block C has a motion vector available, and its reference block B, which is not CCP coded, has CCM information stored. The CCM information of block B is copied to and stored in block C. Since the CCM information stored in block B was copied from block A, the CCM information stored in block C is originally from block A (i.e., the CCM information of block A is propagated to block C) . By only accessing block B, block C can retrieve CCM information originally from block A. For one embodiment, if the reference block located by the motion vector is not CCP coded and does not have CCM information stored, no CCM information is stored for the current block.
For one embodiment, when the current block is inter-coded with bi-directional prediction, to derive the CCM information of the current block, if only one of the reference blocks located by the motion vectors has CCM information, the CCM information from the reference block which has CCM information is copied to and stored in the current block.
For another embodiment, when the current block is inter-coded with bi-directional prediction, and both reference blocks located by the motion vectors have stored CCM information, one of the reference blocks is selected based on a set of pre-defined rules. The CCM information of the selected reference block is then copied to and stored in the current block.
For one sub-embodiment, the reference block which is CCP coded is selected.
For one sub-embodiment, the reference block whose reference picture has the smaller POC distance to the current picture is selected.
For one sub-embodiment, the reference block whose reference picture has the smaller QP difference from the current picture is selected.
For one sub-embodiment, the reference block whose reference picture has the smaller QP value is selected.
For another sub-embodiment, the reference block whose reference picture has the larger QP value is selected.
For one sub-embodiment, the CCM information of both reference blocks is applied on the reconstructed luma samples of the template of the current block to generate the prediction of the chroma samples of the template of the current block. The distortion between the prediction and the reconstructed chroma samples is computed. The reference block associated with the smaller distortion is selected.
For one sub-embodiment, the rules described previously can be combined, and not all the rules described previously need to be applied. For example, the reference block which is CCP coded is selected. If both blocks are CCP coded, then the block whose reference picture has the smaller POC distance to the current picture is selected. If both blocks are CCP coded and have the same POC distance to the current picture, the reference block whose reference picture has the smaller QP difference from the current picture is selected. If both blocks are CCP coded, have the same POC distance to the current picture, and have the same QP difference from the current picture, then the reference block whose reference picture has the smaller QP value is selected.
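The combined rule cascade for bi-directional prediction may be sketched as follows; the dictionary field names are illustrative assumptions:

```python
# Illustrative rule cascade: prefer a CCP-coded reference, then the smaller
# POC distance, then the smaller QP difference, then the smaller QP value.
def select_reference(ref0, ref1):
    for key in ("is_ccp", "poc_dist", "qp_diff", "qp"):
        v0, v1 = ref0[key], ref1[key]
        if v0 != v1:
            if key == "is_ccp":                 # boolean: prefer CCP coded
                return ref0 if v0 else ref1
            return ref0 if v0 < v1 else ref1    # numeric: prefer smaller
    return ref0                                 # fully tied: default to list 0
```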
In one embodiment, if the current block is not CCP coded and the current slice/picture is a non-intra slice/picture, the CCM information of the current block can be derived by copying the CCM information of its collocated block in a collocated picture.
In one embodiment, when the current block is not CCP coded, the CCM information derivation process is performed after the encoding/decoding of the current picture.
In one embodiment, when multiple types of propagated CCM information are available for the current block, the propagated CCM information to be stored in the current block is determined based on a set of pre-defined rules. For example, if both the reference blocks located by the motion vectors and the reference block which is the current block’s collocated block in the collocated picture have valid CCM information, the CCM information of the reference block located by the motion vectors is copied to and stored in the current block.
V.2.1 Cascaded Vector Cross-Component Models
In one embodiment, if the current encoding/decoding block has a motion vector or block vector available, cascaded vectors can be derived based on the motion vector or the block vector of the current block. The CCM information located by the cascaded vector can be inherited by the current block. The definition of CCM information is in the section entitled “Inheriting CCM Information” .
In another embodiment, when referencing the cross-component model (CCM) information from neighbouring blocks, if the neighbouring block has a motion vector or block vector available, cascaded vectors can be derived based on the motion vector or the block vector of the neighbouring block. CCM information of the blocks indicated by the cascaded vectors can also be inherited by the current block. The neighbouring blocks can be, but are not limited to, the following types of candidates: spatial candidates, non-adjacent candidates, temporal candidates, and historical candidates.
As depicted in Fig. 11, the cascaded vector can be derived as the sum of the recursively traced motion vectors and block vectors based on a base vector. The base vector can be the motion vector or the block vector of the current block. The base vector can also be the motion vector or the block vector of a neighbouring block. For example, a cascaded vector can be derived as follows: Let the L0 motion vector of the neighbour block be MVL0 (0) , the block vector of the block indicated by MVL0 (0) be BV (0) , the L0 motion vector of the block indicated by BV (0) be MVL0 (1) , and so on. The cascaded vector MV_m is:
MV_m = MVL0 (0) + BV (0) + MVL0 (1) + … + MVL0 (m) .
And the reference picture of MVL0_m is:
RefPicL0_m = RefPicL0 (m)
Here m is the trace depth, that is, the number of reference pictures referenced back.
Cascaded vectors can be the sum of L0 motion vectors, L1 motion vectors and block vectors.
For each recursion, multiple cascaded vectors can be derived if the base block has multiple motion vectors or multiple block vectors. For example, assume the block indicated by MVL0 (1) is bi-prediction and has two motion vectors MVL0 (2) and MVL1 (2) . Two cascaded vectors can be derived as MV_1 + MVL0 (2) and MV_1 + MVL1 (2) , where MV_1 = MVL0 (0) + BV (0) + MVL0 (1) .
The trace depth m can be a finite value, for example, m = 1. That is, the maximum value of m is a finite number. There is a limit to the number of reference pictures referenced back. The trace depth m can also be infinite. That is, all trace depth values are allowed, and there is no limit to the number of reference pictures referenced back. The trace depth m can be pre-defined. That is, there can be a pre-defined threshold for the maximum trace depth.
A set of cascaded vectors can be derived for different m. That is, for all the allowed trace depth, one or more cascaded vectors can be derived.
The neighbouring block can be a CU/CB, PU, TU/TB or a corresponding block with the same size of the current block.
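The recursive tracing described in this section may be sketched as follows; `get_vector` is an assumed lookup returning the motion/block vector stored at a position (or None), standing in for the codec's vector buffers:

```python
# Illustrative derivation of a cascaded vector: starting from a base vector,
# recursively follow the vector stored at each reached position and sum the
# traced vectors, up to a maximum trace depth.
def cascaded_vector(base_pos, base_vec, get_vector, max_depth=1):
    cx, cy = base_vec
    px, py = base_pos[0] + cx, base_pos[1] + cy   # position after base vector
    for _ in range(max_depth):
        nxt = get_vector((px, py))
        if nxt is None:                           # the chain ends early
            break
        cx, cy = cx + nxt[0], cy + nxt[1]         # MV_m = MVL0(0) + BV(0) + ...
        px, py = px + nxt[0], py + nxt[1]
    return (cx, cy)
```

When a traced block is bi-predicted, the same recursion can branch on each of its vectors, yielding multiple cascaded vectors per depth as described above.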
V.2.2 Multiple Vector-Propagated Cross-Component Models
As described in the section entitled “Propagated CCM Information” , after encoding/decoding a block, the cross-component model (CCM) information of the current block is derived and stored in the current block. In one embodiment, more than one set of CCM information can be stored in the current block.
In one embodiment, if the current block has a motion vector or block vector available and the current block is cross-component prediction (CCP) coded, in addition to storing the cross-component model used by the current block, the CCM information of the reference block located by the cascaded vector, as described in the section entitled “Cascaded Vector Cross-Component Models” , can also be stored in the current block for future referencing. If the current block is not CCP coded, the CCM information of the reference block located by the cascaded vector, as described in the section entitled “Cascaded Vector Cross-Component Models” , can also be stored in the current block for future referencing.
The maximum number of sets of CCM information allowed to be stored in one block can be pre-defined. If the available number of sets of CCM information exceeds the maximum allowed number, the priority of the CCM information to be stored can be pre-defined. For example, if the current block is CCP coded, the current CCM information has the highest priority. For another example, the priority can be determined based on the trace depth of the cascaded vector. The shorter the trace depth is, the higher the priority is for the CCM information. The above rules can be combined.
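The storage priority rules above may be sketched as follows; the list-based structure and the field layout are illustrative assumptions:

```python
# Illustrative priority storage of multiple CCM information sets: the
# current block's own model (if CCP coded) has the highest priority, then
# cascaded-vector models ordered by increasing trace depth.
def store_ccm_sets(own_model, cascaded_models, max_sets):
    # cascaded_models: list of (trace_depth, model_info) pairs.
    stored = []
    if own_model is not None:                     # highest priority
        stored.append(own_model)
    for _, model in sorted(cascaded_models, key=lambda x: x[0]):
        if len(stored) >= max_sets:
            break
        stored.append(model)                      # shorter depth first
    return stored[:max_sets]
```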
VI. Self-Derived Cross-Component Model
In one embodiment, an example of the self-derived cross-component model is CCRM. When doing the self-derivation, the model (filtering shape/pattern, parameter terms) is unified with the cross-component model in regular intra mode. For example, CCRM model can be unified with any pre-defined existing intra cross-component model (e.g. CCCM using non-downsampled luma samples, GLM, MMLM) and/or the self-derivation only means that the input for deriving model parameters is from the current chroma and collocated luma samples (e.g. motion compensation results if the current block is inter) .
In another embodiment, the self-derived cross-component candidate refers to one or more models and the models are used to generate the cross-component prediction of the current block as follows. The cross-component prediction (used for generating target predicted samples) of the current block is formed by combining one or more proposed source terms and the models (referring to a proposed weighting setting) . As shown in equation (3) , pred (i, j) is a target (predicted) sample in the current block which can be obtained after our proposed mechanism, sourceTermSet0 includes one or more source terms from the luma component, sourceTermSet1 includes one or more source terms from the chroma components, and biasTermSet includes one or more bias terms.
Equation (3) is just an example and our proposed mechanism can use any subset or extension of sourceTermSet0, sourceTermSet1, and biasTermSet. Each sample or any subset of samples in the current block gets its target (predicted) sample according to equation (3) . In the following, the content of sourceTermSet0 is described in the section entitled “Content of sourceTermSet0 (i, j) ” , the content of sourceTermSet1 is described in the section entitled “Content of sourceTermSet1 (i, j) ” , the content of biasTermSet is described in the section entitled “Content of biasTermSet” .
pred (i, j) = (sourceTermSet0 (i, j) + sourceTermSet1 (i, j) + …+ biasTermSet) with the
proposed weighting setting, (3)
where (i, j) is a sample position in the current block.
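For illustration only (not part of the claimed subject matter), equation (3) can be sketched as a weighted sum over the term sets. The specific tap counts, sample values, and weights below are hypothetical assumptions chosen for the example.

```python
def cross_component_pred(luma_terms, chroma_terms, bias_terms, weights):
    """Sketch of equation (3): pred(i, j) for one sample position, formed as
    a weighted combination of luma source terms (sourceTermSet0), chroma
    source terms (sourceTermSet1), and bias terms."""
    terms = list(luma_terms) + list(chroma_terms) + list(bias_terms)
    assert len(terms) == len(weights)
    return sum(w * t for w, t in zip(weights, terms))

# Hypothetical 3-tap luma set (n = 3), 1-tap chroma set (m = 1), one bias term.
p = cross_component_pred(
    luma_terms=[100, 102, 98],   # sourceTerm00 .. sourceTerm02
    chroma_terms=[60],           # sourceTerm10
    bias_terms=[512],            # midValue for 10-bit content
    weights=[0.25, 0.25, 0.25, 0.2, 0.05],
)
# p -> 112.6
```

In a real codec the weights would be the derived model parameters; here they are fixed constants purely to make the arithmetic concrete.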
VII. Content of Various Terms Used to Derive Prediction
VII. 1 Content of sourceTermSet0 (i, j)
SourceTermSet0 (i, j) includes one or more luma source terms denoted as sourceTerm00, sourceTerm01, …, and/or sourceTerm0n-1. The value of n is the number of taps for the source term set.
In one embodiment, the source terms can be linear terms and/or non-linear terms, only linear terms, and/or only non-linear terms.
In another embodiment, n is a pre-defined value such as 1, 2, …or any positive integer. For example, the pre-defined value is fixed in the standard.
In another embodiment, the pattern of the n taps refers to a pattern defined as any subset of a window region M x N around/including the position (iL, jL) as shown in Fig. 12. If the target sample is luma, (iL, jL) is (i, j) . If the target sample is chroma (e.g. Cb or Cr) , (iL, jL) is the collocated luma position from (i, j) .
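For illustration, a tap pattern that is a subset of an M x N window around (iL, jL) can be enumerated as sample positions. The cross-shaped 5-tap subset below is a hypothetical example, not a pattern defined by this disclosure.

```python
def window_positions(iL, jL, M, N):
    """All sample positions in an M x N window centred on (iL, jL)."""
    return [(iL + di, jL + dj)
            for di in range(-(M // 2), M // 2 + 1)
            for dj in range(-(N // 2), N // 2 + 1)]

def cross_pattern(iL, jL):
    """A hypothetical 5-tap cross-shaped subset of a 3x3 window."""
    full = set(window_positions(iL, jL, 3, 3))
    cross = [(iL, jL), (iL - 1, jL), (iL + 1, jL), (iL, jL - 1), (iL, jL + 1)]
    assert all(p in full for p in cross)  # subset of the M x N window
    return cross

taps = cross_pattern(4, 4)  # n = 5 taps around a collocated luma position (4, 4)
```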
For a source term in the source term set, the following embodiments are used to determine how the source content is generated.
In one embodiment, the source content is based on a predicted sample generated by a prediction mode and/or a reconstructed sample generated based on the predicted sample by a prediction mode and a reconstructed residual.
In another sub-embodiment, the source content is the filtered source or the source with any pre-processing. For example, the source content is the predicted/reconstructed sample after filtering with a pre-defined model or filter.
In another sub-embodiment, the source content is gradient information from the predicted samples and/or reconstructed samples.
In another embodiment, the source term may further include location information. For example, if the target sample refers to luma, the horizontal location (i) of (i, j) is used in a source term and the vertical location (j) of (i, j) is used in a source term.
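A gradient source term of the kind mentioned above can be sketched, for example, as horizontal and vertical differences over predicted or reconstructed samples. The simple central differences below are an illustrative assumption; the disclosure does not mandate a particular gradient operator.

```python
def gradients(samples, i, j):
    """Hypothetical gradient source terms at (i, j): central differences
    over a 2-D array of predicted/reconstructed samples."""
    gh = samples[i][j + 1] - samples[i][j - 1]  # horizontal gradient
    gv = samples[i + 1][j] - samples[i - 1][j]  # vertical gradient
    return gh, gv

rec = [[10, 12, 14],
       [16, 18, 20],
       [22, 24, 26]]
gh, gv = gradients(rec, 1, 1)  # gh = 20 - 16 = 4, gv = 24 - 12 = 12
```

The location-information terms are simpler still: the coordinates i and j themselves (or values derived from them) enter the model as additional source terms.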
VII. 2 Content of sourceTermSet1 (i, j)
SourceTermSet1 (i, j) includes one or more chroma (Cb or Cr) source terms denoted as sourceTerm10, sourceTerm11, …, and/or sourceTerm1m-1. The value of m is the number of taps for the source term set. In one embodiment, the source terms can be linear terms and/or non-linear terms, only linear terms, and/or only non-linear terms. In another embodiment, m is a pre-defined value such as 1, 2, … or any positive integer. For example, the pre-defined value is fixed in the standard.
In another embodiment, the pattern of the m taps refers to a pattern defined as any subset of a window region M2 x N2 around/including the position (iC, jC) as shown in Fig. 13. If the target sample is chroma (Cb or Cr) , (iC, jC) is (i, j) . If the target sample is luma, (iC, jC) is the collocated chroma position from (i, j) .
For a source term in the source term set, the following embodiments are used to determine how the source content is generated.
In one embodiment, the source content is based on a predicted sample generated by a prediction mode and/or a reconstructed sample generated based on the predicted sample by a prediction mode and a reconstructed residual.
In another sub-embodiment, the source content is the filtered source or the source with any pre-processing. For example, the source content is the predicted/reconstructed sample after filtering with a pre-defined model or filter.
In another sub-embodiment, the source content is gradient information from the predicted samples and/or reconstructed samples.
In another embodiment, the source term may further include location information. For example, if the target sample refers to chroma, the horizontal location (i) of (i, j) is used in a source term and the vertical location (j) of (i, j) is used in a source term.
VII. 3 Content of biasTermSet
The bias term is a pre-defined value. In one embodiment, the bias term is a midValue according to the bitDepth specified in the standard. For example, the bias term can be set as (1<< (bitDepth-1)). In another embodiment, the bias term is the same for each sample in the current block. That is, the bias term is independent of the position (i, j).
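The midValue bias for a given bit depth can be computed directly; this one-liner simply restates the (1 << (bitDepth-1)) expression from the text.

```python
def mid_value(bit_depth):
    """Bias term as the mid-grey value for the given bit depth,
    i.e. (1 << (bitDepth - 1)) as stated in the text."""
    return 1 << (bit_depth - 1)

assert mid_value(8) == 128    # 8-bit content
assert mid_value(10) == 512   # 10-bit content
```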
The term “block” in this invention can refer to TU/TB, CU/CB, PU/PB, or CTU/CTB.
The term “LM” in this invention can be viewed as one kind of CCLM/MMLM modes or any other extension/variation of CCLM (e.g. the proposed CCLM extension/variation in this invention). One variation is MMLM, which uses thresholds to decide different models for different samples in the current chroma component. Another variation is, for Cb (or Cr), to derive model parameters from multiple collocated luma blocks. The following shows more possible variations. The variations of CCLM here mean that some optional modes can be selected when the block indication refers to using one of the cross-component modes (e.g. CCLM_LT, MMLM_LT, CCLM_L, CCLM_T, MMLM_L, MMLM_T, and/or an intra prediction mode which is not one of the traditional DC, planar, and angular modes) for the current block. The following shows an example of using convolutional cross-component mode (CCCM) as an optional mode. When this optional mode is applied to the current block, cross-component information with a model, including a non-linear term, is used to generate the chroma prediction. The optional mode may follow the template selection of CCLM, so the CCCM family includes CCCM_LT, CCCM_L, and/or CCCM_T.
The proposed methods (for CCLM) in this invention can be used for any other cross-component modes.
Any combination of the proposed methods in this invention can be applied.
Any of the foregoing proposed methods of cascaded vector derivation and associated CCM information can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in an inter/intra/prediction module of an encoder, and/or an inter/intra/prediction module of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the inter/intra/prediction module of the encoder and/or the inter/intra/prediction module of the decoder, so as to provide the information needed by the inter/intra/prediction module.
The method of cascaded vector derivation and associated CCM information as described above can be implemented in an encoder side or a decoder side. For example, any of the proposed methods can be implemented in an Intra coding module (e.g. Intra Pred 150 in Fig. 1B) in a decoder or an Intra coding module in an encoder (e.g. Intra Pred. 110 in Fig. 1A). Any of the proposed propagated cross-component prediction can also be implemented as a circuit coupled to the intra/inter coding module at the decoder or the encoder. However, the decoder or encoder may also use additional processing units to implement the propagated cross-component prediction processing. While the intra prediction units (e.g. unit 110 in Fig. 1A and unit 150 in Fig. 1B) are shown as individual processing units, they may correspond to executable software or firmware codes stored on a media, such as hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
Fig. 14 illustrates a flowchart of an exemplary video coding system that uses CCM information associated with a cascaded vector for inter chroma coding according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g. one or more CPUs) at the encoder or decoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, input data associated with a current block comprising a current first-colour block and a current second-colour block is received in step 1410, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side, where the current block is coded in a non-intra mode. Whether the current block has a first MV (Motion Vector) or a first BV (Block Vector) available, or when referencing CCM (Cross-Component Model) information from one or more neighbouring blocks, whether at least one neighbouring block has a second MV or a second BV is checked in step 1420. If the current block has a first MV or first BV, or at least one neighbouring block has a second MV or a second BV (i.e., the “Yes” path), steps 1430-1460 are performed. Otherwise (i.e., the “No” path), steps 1430-1460 are skipped. In step 1430, one or more cascaded vectors are derived, wherein each cascaded vector is derived recursively as a sum of traced vectors starting from the first MV or the first BV or starting from the second MV or the second BV. In step 1440, target CCM information is determined based on said one or more cascaded vectors. In step 1450, a merge list comprising the target CCM information is determined.
In step 1460, the current second-colour block is encoded or decoded by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block when the target CCM information is selected.
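For illustration (not part of the claimed subject matter), the cascaded-vector derivation of steps 1420-1430 can be sketched as a recursive trace in which each cascaded vector is the running sum of traced vectors. The block lookup function, the trace-depth limit, and the 2-tuple vector representation below are hypothetical assumptions for the sketch.

```python
def derive_cascaded_vectors(start_vec, get_ref_vectors, max_depth):
    """Sketch of cascaded vector derivation: starting from an initial
    MV/BV, each cascaded vector is derived recursively as the sum of
    traced vectors. get_ref_vectors(v) returns the MVs/BVs stored at
    the reference block located by cascaded vector v (possibly several,
    yielding several cascaded vectors per recursion)."""
    results = []

    def trace(cascaded, depth):
        results.append(cascaded)
        if depth >= max_depth:  # finite trace depth smaller than a maximum limit
            return
        for nxt in get_ref_vectors(cascaded):
            # Next cascaded vector = current cascaded vector + next traced vector.
            trace((cascaded[0] + nxt[0], cascaded[1] + nxt[1]), depth + 1)

    trace(start_vec, 1)
    return results

# Hypothetical chain: the block located by (-4, 0) stores one MV (-2, 1);
# the block located by (-6, 1) stores none, so tracing stops there.
chain = {(-4, 0): [(-2, 1)]}
vecs = derive_cascaded_vectors((-4, 0), lambda v: chain.get(v, []), max_depth=3)
# vecs -> [(-4, 0), (-6, 1)]
```

The CCM information stored at each block a cascaded vector points to would then feed step 1440 (determining the target CCM information) and step 1450 (building the merge list).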
The flowchart shown is intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims (19)
- A method of coding colour pictures using coding tools including one or more cross component models related modes, the method comprising:
receiving input data associated with a current block comprising a current first-colour block and a current second-colour block, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side, where the current block is coded in a non-intra mode;
if the current block has a first MV (Motion Vector) or a first BV (Block Vector) available or when referencing CCM (Cross-Component Model) information from one or more neighbouring blocks, if at least one neighbouring block has a second MV or a second BV:
deriving one or more cascaded vectors, wherein each cascaded vector is derived recursively as a sum of traced vectors starting from the first MV or the first BV or starting from the second MV or the second BV;
determining target CCM information based on said one or more cascaded vectors;
determining a merge list comprising the target CCM information; and
encoding or decoding the current second-colour block by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block when the target CCM information is selected.
- The method of Claim 1, wherein if a second reference block indicated by a current cascaded vector has a third MV or a third BV, the third MV or the third BV is used as a next traced vector and the next cascaded vector is formed by adding the next traced vector to the current cascaded vector, wherein the current cascaded vector is initially set to the first MV or the first BV or set to the second MV or the second BV respectively.
- The method of Claim 1, wherein each traced vector corresponds to an L0 MV, an L1 MV, or one BV.
- The method of Claim 1, wherein for each recursion, if one reference block indicated by one cascaded vector has multiple MVs or BVs, multiple cascaded vectors are derived.
- The method of Claim 4, wherein a set of cascaded vectors is derived for different numbers of trace depth.
- The method of Claim 4, wherein a set of cascaded vectors is derived from all possible sums of the traced vectors, and wherein the traced vectors correspond to a target trace depth.
- The method of Claim 1, wherein trace depth associated with said one or more cascaded vectors corresponds to a finite number smaller than a maximum limit.
- The method of Claim 1, wherein trace depth associated with said one or more cascaded vectors corresponds to an infinite number.
- The method of Claim 1, wherein trace depth associated with said one or more cascaded vectors corresponds to a pre-defined number.
- The method of Claim 1, wherein each of said one or more neighbouring blocks corresponds to a CU/CB, PU, TU/TB or a corresponding block with a same size as the current block.
- The method of Claim 1, wherein when deriving corresponding CCM information to be stored in a target block, after finishing encoding/decoding the target block, multiple sets of CCM information are stored in the target block.
- The method of Claim 11, wherein if the target block has a target MV or BV available and the target block is CCP (Cross-Component Prediction) coded, the CCM information of one or more reference blocks located by one or more cascaded vectors are also stored in the target block in addition to storing the CCM information used by the target block.
- The method of Claim 11, wherein if the target block has a target MV or BV available and the target block is not CCP (Cross-Component Prediction) coded, the CCM information of one or more reference blocks located by one or more cascaded vectors are stored.
- The method of Claim 11, wherein a maximum number of sets of the CCM information allowed to be stored in one block is pre-defined.
- The method of Claim 11, wherein if an available number of sets of the CCM information exceeds a maximum allowed number, priority of the CCM information to be stored is pre-defined.
- The method of Claim 15, wherein if the target block is CCP coded, the CCM information used by the target block has a highest priority.
- The method of Claim 15, wherein the priority is determined based on trace depth of a target cascaded vector.
- The method of Claim 17, wherein the CCM information associated with a shorter trace depth has a higher priority.
- An apparatus for coding colour pictures or video using coding tools including one or more cross component models related modes, the apparatus comprising one or more electronic circuits or processors arranged to:
receive input data associated with a current block comprising a current first-colour block and a current second-colour block, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side, where the current block is coded in a non-intra mode;
if the current block has a first MV (Motion Vector) or a first BV (Block Vector) available or when referencing CCM (Cross-Component Model) information from one or more neighbouring blocks, if at least one neighbouring block has a second MV or a second BV:
derive one or more cascaded vectors, wherein each cascaded vector is derived recursively as a sum of traced MVs or BVs starting from the first MV or the first BV or starting from the second MV or the second BV;
determine target CCM information based on said one or more cascaded vectors;
determine a merge list comprising the target CCM information; and
encode or decode the current second-colour block by using the merge list, wherein corresponding prediction data for the current second-colour block is generated by applying a cross-component model with the target CCM information to the current first-colour block when the target CCM information is selected.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW114101647A TW202539236A (en) | 2024-01-15 | 2025-01-15 | Methods and apparatus of inheriting cross-component models based on cascaded vector for video coding improvement of inter chroma |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463620920P | 2024-01-15 | 2024-01-15 | |
| US63/620,920 | 2024-01-15 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025152945A1 true WO2025152945A1 (en) | 2025-07-24 |
Family
ID=96470743
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2025/072404 Pending WO2025152945A1 (en) | 2024-01-15 | 2025-01-15 | Methods and apparatus of inheriting cross-component models based on cascaded vector for video coding improvement of inter chroma |
Country Status (2)
| Country | Link |
|---|---|
| TW (1) | TW202539236A (en) |
| WO (1) | WO2025152945A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023198142A1 (en) * | 2022-04-14 | 2023-10-19 | Mediatek Inc. | Method and apparatus for implicit cross-component prediction in video coding system |
| CN116965017A (en) * | 2021-02-01 | 2023-10-27 | 北京达佳互联信息技术有限公司 | Chroma codec enhancement in cross-component sample adaptive offset |
| US20230396793A1 (en) * | 2022-06-07 | 2023-12-07 | Tencent America LLC | Adjacent spatial motion vector predictor candidates improvement |
| US20240015280A1 (en) * | 2022-07-05 | 2024-01-11 | Qualcomm Incorporated | Template selection for intra prediction in video coding |
Non-Patent Citations (1)
| Title |
|---|
| R. G. YOUVALARI (NOKIA), D. BUGDAYCI SANSLI, P. ASTOLA, J. LAINEMA (NOKIA): "EE2-2.1: Block vector guided CCCM", 31. JVET MEETING; 20230711 - 20230719; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-AE0100, 4 July 2023 (2023-07-04), XP030311315 * |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202539236A (en) | 2025-10-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11985324B2 (en) | Methods and apparatuses of video processing with motion refinement and sub-partition base padding | |
| US20230328278A1 (en) | Method and Apparatus of Overlapped Block Motion Compensation in Video Coding System | |
| WO2025152945A1 (en) | Methods and apparatus of inheriting cross-component models based on cascaded vector for video coding improvement of inter chroma | |
| WO2025045138A1 (en) | Methods and apparatus of propagated cross-component prediction models for video coding improvement of inter chroma | |
| WO2025082514A1 (en) | Methods and apparatus of using self-derived cross-component models for video coding improvement of inter chroma | |
| WO2025051137A1 (en) | Methods and apparatus of inheriting cross-component models from rescaled reference picture in video coding | |
| WO2025149025A1 (en) | Methods and apparatus of inheriting cross-component model based on cascaded vector | |
| WO2025007931A1 (en) | Methods and apparatus for video coding improvement by multiple models | |
| WO2025007972A1 (en) | Methods and apparatus for inheriting cross-component models from temporal and history-based neighbours for chroma inter coding | |
| WO2025153018A1 (en) | Methods and apparatus of bi-prediction candidates for auto-relocated block vector prediction or chained motion vector prediction | |
| WO2025149015A1 (en) | Methods and apparatus of extrapolation intra prediction model inheritance based on cascaded vectors | |
| WO2025026397A1 (en) | Methods and apparatus for video coding using multiple hypothesis cross-component prediction for chroma coding | |
| WO2024027784A1 (en) | Method and apparatus of subblock-based temporal motion vector prediction with reordering and refinement in video coding | |
| WO2026017074A1 (en) | Method and apparatus of subblock candidates for auto-relocated block vector prediction or chained motion vector prediction in video coding systems | |
| WO2024193428A1 (en) | Method and apparatus of chroma prediction in video coding system | |
| WO2025167844A1 (en) | Methods and apparatus of local illumination compensation model derivation and inheritance for video coding | |
| WO2025156991A1 (en) | Methods and apparatus of local illumination compensation model derivation and inheritance with chained motion vector for video coding | |
| WO2024078331A1 (en) | Method and apparatus of subblock-based motion vector prediction with reordering and refinement in video coding | |
| WO2025218694A1 (en) | Methods and apparatus of mvd candidate number selection in amvp with sbtmvp mode for video coding | |
| WO2025077859A1 (en) | Methods and apparatus of propagating models for extrapolation intra prediction model inheritance in video coding | |
| WO2026032350A1 (en) | Method and apparatus of tmvp and motion-trajectory-based motion vectors for affine model derivation in video coding systems | |
| WO2025153064A1 (en) | Inheriting cross-component model based on cascaded vector derived according to a candidate list | |
| WO2026017030A1 (en) | Method and apparatus of temporal and gpm-derived affine candidates in video coding systems | |
| WO2024141071A9 (en) | Method, apparatus, and medium for video processing | |
| WO2025153050A1 (en) | Methods and apparatus of filter-based intra prediction with multiple hypotheses in video coding systems |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 25741506 Country of ref document: EP Kind code of ref document: A1 |