WO2025153059A1 - Inheriting cross-component models based on cascaded vectors for inter chroma prediction

Info

Publication number: WO2025153059A1
Application number: PCT/CN2025/072997
Authority: WO
Inventors: Hsin-Yi Tseng; Chia-Ming Tsai; Yu-Ling Hsiao; Chih-Wei Hsu; Ching-Yeh Chen; Tzu-Der Chuang
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2024-01-17
Filing date: 2025-01-17
Publication date: 2025-07-24
Anticipated expiration: 2026-07-17

Abstract

A method of using cascaded vector and cross-component prediction to encode or decode video pictures is provided. A current block comprises a first-color block and a second-color block. A video coder generates a reconstruction for the first-color block. The video coder selects a CCM information from a CCM candidate list, the CCM information being propagated from a source block that is located by a cascaded vector that is derived from a base vector that is a motion vector or block vector of the current block. The base vector is obtained from a candidate list of an inter-prediction coding tool that is used to encode or decode the current block. The video coder applies the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block. The generated cross-component prediction is used to encode or decode the current block.

Description

INHERITING CROSS-COMPONENT MODELS BASED ON CASCADED VECTORS FOR INTER CHROMA PREDICTION

CROSS REFERENCE TO RELATED PATENT APPLICATION(S)

The present disclosure is part of a non-provisional application that claims the priority benefit of U.S. Provisional Patent Application No. 63/621,645, filed on 17 January 2024. Contents of above-listed applications are herein incorporated by reference.

TECHNICAL FIELD

The present disclosure relates generally to video coding. In particular, the present disclosure relates to methods of coding pixel blocks by cascaded vectors and cross component models.

BACKGROUND

Unless otherwise indicated herein, approaches described in this section are not prior art to the claims listed below and are not admitted as prior art by inclusion in this section.

High-Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture. The basic unit for compression, termed coding unit (CU), is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs).

Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Expert Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11. The input video signal is predicted from the reconstructed signal, which is derived from the coded picture regions. The prediction residual signal is processed by a block transform. The transform coefficients are quantized and entropy coded together with other side information in the bitstream. The reconstructed signal is generated from the prediction signal and the reconstructed residual signal after inverse transform on the de-quantized transform coefficients. The reconstructed signal is further processed by in-loop filtering for removing coding artifacts. The decoded pictures are stored in the frame buffer for predicting the future pictures in the input video signal.

In VVC, a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs). The leaf nodes of a coding tree correspond to the coding units (CUs). A coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order. A bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors (MVs) and reference indices to predict the sample values of each block. A predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block. An intra (I) slice is decoded using intra prediction only.

For each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices and reference picture list usage index, and additional information are used for inter-predicted sample generation. The motion parameter can be signalled in an explicit or implicit manner. When a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC. The merge mode can be applied to any inter-predicted CU. The alternative to merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage flag and other needed information are signalled explicitly per each CU.

SUMMARY

The following summary is illustrative only and is not intended to be limiting in any way. That is, the following summary is provided to introduce concepts, highlights, benefits and advantages of the novel and non-obvious techniques described herein. Select and not all implementations are further described below in the detailed description. Thus, the following summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.

Some embodiments of the disclosure provide a method of using cascaded vector and cross-component prediction to encode or decode video pictures. A video coder receives data to be encoded or decoded as a current block of pixels of a current picture of a video, wherein the current block comprises a first-color block (e.g., luma) and a second-color block (e.g., Cb or Cr). The video coder generates a reconstruction for the first-color block. The video coder selects a CCM information from a CCM candidate list, the CCM information being propagated from a source block that is located by a cascaded vector. The cascaded vector is derived from a base vector that is a motion vector or block vector of the current block. The video coder applies the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block. The video coder uses the generated cross-component prediction to encode or decode the current block.

The base vector may be associated with (e.g., retrieved from) one of a set of pre-defined positions of the current block. The base vector may be a motion vector or a block vector that is obtained from a candidate list for a coding tool that is used to encode the current block. The coding tool may be a merge mode for inter-prediction, merge mode for intra-block copy (IBC), advanced motion vector prediction (AMVP), merge mode with motion vector difference (MMVD), intra template matching (IntraTMP), template matching merge mode, bilateral matching merge mode, combined inter and intra prediction with template matching (CIIP-TM) mode, affine merge mode, etc.

The cascaded vector starts as the base vector and is then further derived by recursively updating the cascaded vector based on a motion vector or block vector of a current reference block and using the updated cascaded vector to locate a further reference block as the next current reference block. In some embodiments, when a cascaded vector is most recently updated by a block vector, a CCM information for generating the cross-component prediction is not located by the cascaded vector. The motion vector or the block vector of the current reference block may be retrieved from at least one of a set of pre-defined positions of the current reference block. In some embodiments, the number of cascaded vectors allowed to be derived based on the current reference block (at a particular trace depth) is constrained by a particular limit (e.g., 2). In some embodiments, the number of cascaded vectors allowed to be derived from a same base vector is constrained by a particular limit.
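The recursive update described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function name `derive_cascaded_vectors`, the `vector_at` lookup (which returns the vector stored at a position, or `None`), and the treatment of all positions as lying in one coordinate plane are hypothetical simplifications and are not taken from the disclosure. Reference picture handling and the block-vector restriction are omitted.

```python
from typing import Callable, List, Optional, Tuple

Vec = Tuple[int, int]


def derive_cascaded_vectors(
    start: Vec,
    base_vector: Vec,
    vector_at: Callable[[Vec], Optional[Vec]],
    max_depth: int = 2,  # assumed limit on the trace depth
) -> List[Vec]:
    """Chain stored vectors, starting from the base vector of the current block."""
    cascaded: List[Vec] = []
    current = base_vector
    for _ in range(max_depth):
        # Locate the current reference block with the vector accumulated so far.
        ref_pos = (start[0] + current[0], start[1] + current[1])
        step = vector_at(ref_pos)
        if step is None:
            break  # no motion vector or block vector stored at this position
        # Update the cascaded vector with the vector of the reference block.
        current = (current[0] + step[0], current[1] + step[1])
        cascaded.append(current)
    return cascaded
```

For instance, with a hypothetical vector field `{(4, 0): (2, 1), (6, 1): (-1, 3)}` and a base vector `(4, 0)` at position `(0, 0)`, the sketch traces two cascaded vectors before the depth limit stops it.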

In some embodiments, the CCM information located by the cascaded vector is inserted into the candidate list immediately after a CCM information that is located by the base vector. In some embodiments, the CCM information located by one or more cascaded vectors for the current block are included in the CCM candidate list as a group of candidates, e.g., after history-based candidates.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of the present disclosure. The drawings illustrate implementations of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. It is appreciable that the drawings are not necessarily to scale, as some components may be shown out of proportion to their size in an actual implementation in order to clearly illustrate the concept of the present disclosure.

FIG. 1 shows the predefined search areas for intra template matching (IntraTMP) mode.

FIG. 2 illustrates the decoder operations of inter convolutional cross-component prediction model (InterCCCM).

FIG. 3 shows luma samples in relation to the derived chroma sample for a cross-component filter.

FIG. 4 illustrates examples of CCM information propagation based on block vectors.

FIG. 5 illustrates examples of CCM information propagation based on motion vectors.

FIG. 6 conceptually illustrates derivation of auto-relocated block vector.

FIG. 7 conceptually illustrates the positions to be checked for deriving auto-relocated block vector.

FIG. 8 shows an example derivation of a candidate for chained MV prediction (CMVP) .

FIG. 9 shows possible sources of a motion vector or a block vector for CMVP.

FIG. 10 illustrates derivation of a cascaded vector.

FIG. 11 illustrates an example video encoder that may implement cascaded vectors and cross-component prediction.

FIG. 12 illustrates portions of the video encoder that implement cross-component prediction using CCMs that are located by cascaded vectors.

FIG. 13 conceptually illustrates a process that encodes pixel blocks by cross-component prediction based on CCM information propagated by cascaded vectors.

FIG. 14 illustrates an example video decoder that may implement cascaded vectors and cross-component prediction.

FIG. 15 illustrates portions of the video decoder that implement cross-component prediction using CCMs that are located by cascaded vectors.

FIG. 16 conceptually illustrates a process that decodes pixel blocks by cross-component prediction based on CCM information propagated by cascaded vectors.

FIG. 17 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. Any variations, derivatives and/or extensions based on teachings described herein are within the protective scope of the present disclosure. In some instances, well-known methods, procedures, components, and/or circuitry pertaining to one or more example implementations disclosed herein may be described at a relatively high level without detail, in order to avoid unnecessarily obscuring aspects of teachings of the present disclosure.

I. Prediction by Candidate List

A. Intra Block Copy (IBC)

Intra block copy (IBC) is a tool implemented as a block level coding mode, for which block matching (BM) is performed at the encoder to find the optimal block vector (or motion vector) for each CU. Here, a block vector is used to indicate the displacement from the current block to a reference block, which is already reconstructed inside the current picture. The luma block vector of an IBC-coded CU is in integer precision. The chroma block vector is rounded to integer precision as well. When combined with AMVR, the IBC mode can switch between 1-pel and 4-pel motion vector precisions. An IBC-coded CU is treated as the third prediction mode other than intra or inter prediction modes. The IBC mode is applicable to the CUs with both width and height smaller than or equal to 64 luma samples.

At the encoder side, hash-based motion estimation is performed for IBC. The encoder performs RD check for blocks with either width or height no larger than 16 luma samples. For non-merge mode, the block vector search is performed using hash-based search first. If the hash search does not return a valid candidate, a block matching based local search will be performed.

In the hash-based search, hash key matching (32-bit CRC) between the current block and a reference block is extended to all allowed block sizes. The hash key calculation for every position in the current picture is based on 4×4 subblocks. For the current block of a larger size, a hash key is determined to match that of the reference block when all the hash keys of all 4×4 subblocks match the hash keys in the corresponding reference locations. If hash keys of multiple reference blocks are found to match that of the current block, the block vector costs of each matched reference are calculated and the one with the minimum cost is selected.
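The subblock-wise hash comparison can be sketched in Python as follows. This is a minimal illustration, not the encoder's actual implementation: the helper names `subblock_hash` and `blocks_match`, the picture layout `pic[y][x]`, the 16-bit little-endian serialization of samples, and the use of `zlib.crc32` as the 32-bit CRC are all assumptions made for the example.

```python
import zlib
from typing import List

Picture = List[List[int]]  # pic[y][x], non-negative integer sample values


def subblock_hash(pic: Picture, x: int, y: int) -> int:
    """Return a 32-bit CRC over the 4x4 subblock whose top-left sample is (x, y)."""
    data = b"".join(
        pic[y + dy][x + dx].to_bytes(2, "little")  # assumed 16-bit serialization
        for dy in range(4)
        for dx in range(4)
    )
    return zlib.crc32(data)


def blocks_match(pic: Picture, cur_x: int, cur_y: int,
                 ref_x: int, ref_y: int, width: int, height: int) -> bool:
    """A larger block matches only if every 4x4 subblock hash matches."""
    for dy in range(0, height, 4):
        for dx in range(0, width, 4):
            if subblock_hash(pic, cur_x + dx, cur_y + dy) != \
                    subblock_hash(pic, ref_x + dx, ref_y + dy):
                return False
    return True
```

A practical encoder would precompute the subblock hashes once per picture position rather than recomputing them for each comparison; the sketch recomputes them only to stay short.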

In block matching (BM) search, the search range is set to cover both the previous and current CTUs. At CU level, IBC mode is signalled with a flag and it can be signaled as IBC AMVP mode or IBC skip/merge mode as follows:
- IBC skip/merge mode: a merge candidate index is used to indicate which of the block vectors in the list from neighboring candidate IBC coded blocks is used to predict the current block. The merge list consists of spatial, HMVP, and pairwise candidates.
- IBC AMVP mode: block vector difference is coded in the same way as a motion vector difference. The block vector prediction method uses two candidates as predictors, one from left neighbor and one from above neighbor (if IBC coded). When either neighbor is not available, a default block vector will be used as a predictor. A flag is signaled to indicate the block vector predictor index.

B. Intra Template Matching (IntraTMP)

Intra template matching prediction (IntraTMP) is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame or picture, whose L-shaped template matches the current template. For a predefined search range, the encoder searches for the most similar template to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side. The prediction signal is generated by matching the L-shaped causal neighbor of the current block with another block in a predefined search area. FIG. 1 shows the predefined search areas for intra template matching (IntraTMP) mode. As illustrated, the predefined search areas for a current block 100 include: R1: current CTU; R2: top-left CTU; R3: above CTU; R4: left CTU. The search may be conducted by comparing the neighboring template of the current block with the neighboring template of a matching block in the predefined search areas.

Sum of absolute differences (SAD) may be used as a cost function. Within each region, the decoder searches for the template that has least SAD with respect to the current one and uses its corresponding block as a prediction block. The dimensions of all regions (SearchRange_w, SearchRange_h) are set proportional to the block dimension (BlkW, BlkH) to have a fixed number of SAD comparisons per pixel. That is:
SearchRange_w = a * BlkW
SearchRange_h = a * BlkH

where ‘a’ is a constant that controls the gain/complexity trade-off. In some embodiments, ‘a’ is equal to 5.

To speed up the template matching process, the search range of all search regions may be subsampled by a factor of 2. This reduces the number of template matching searches by a factor of 4. After finding the best match, a refinement process is performed. The refinement is done via a second template matching search around the best match with a reduced range. The reduced range is defined as min (BlkW, BlkH) /2.
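The search-range arithmetic and the SAD-based template selection can be illustrated with a small Python sketch. The function names, the flattened-list representation of a template, and the use of integer division for the refinement range are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def intra_tmp_search_params(blk_w: int, blk_h: int, a: int = 5) -> Tuple[int, int, int]:
    """Return (SearchRange_w, SearchRange_h, refinement range) for a block."""
    search_w = a * blk_w                # SearchRange_w = a * BlkW
    search_h = a * blk_h                # SearchRange_h = a * BlkH
    refine = min(blk_w, blk_h) // 2     # reduced range used for the refinement search
    return search_w, search_h, refine


def sad(t1: Sequence[int], t2: Sequence[int]) -> int:
    """Sum of absolute differences between two templates of equal length."""
    return sum(abs(p - q) for p, q in zip(t1, t2))


def best_template(current: Sequence[int], candidates: List[Sequence[int]]) -> int:
    """Return the index of the candidate template with the least SAD."""
    return min(range(len(candidates)), key=lambda i: sad(current, candidates[i]))
```

For a 16x8 block with a = 5, the sketch gives a search range of 80x40 and a refinement range of 4.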

The Intra template matching tool may be enabled for CUs with size less than or equal to 64 in width and height. This maximum CU size for Intra template matching is configurable. The Intra template matching prediction mode is signaled at CU level through a dedicated flag when decoder-side intra mode derivation (DIMD) is not used for current CU.

C. Merge Mode with Motion Vector Difference (MMVD)

Regular merge mode uses implicitly derived motion information for prediction samples generation of the current CU. Merge Mode with Motion Vector Difference (MMVD) is a coding tool in which the derived motion information under merge mode, as a base motion, is further refined by a motion vector difference MVD. MMVD also extends the list of candidates for merge mode by adding additional MMVD candidates based on predefined offsets (also referred to as MMVD offsets).

II. Cross Component Coding

A. Cross-Component Linear Model (CCLM)

To reduce the cross-component redundancy, VVC uses a cross-component linear model (CCLM) prediction mode for which the chroma samples are predicted based on the reconstructed luma samples of the same CU by using a linear model as follows:

pred_C(i, j) = α · rec_L′(i, j) + β    (1)

where pred_C(i, j) represents the predicted chroma samples in a CU and rec_L′(i, j) represents the downsampled reconstructed luma samples of the same CU. The CCLM parameters (α and β) are derived with at most four neighbouring chroma samples and their corresponding down-sampled luma samples. The above template and left template can be used together to calculate the linear model coefficients, called LM_LA mode; they can also be used alternatively in the other two LM modes, called LM_A and LM_L modes. In LM_A mode, only the above template is used to calculate the linear model coefficients. In LM_L mode, only the left template is used to calculate the linear model coefficients.

B. Convolutional Cross-Component Intra Prediction Model (CCCM)

In some embodiments, convolutional cross-component model (CCCM) is applied to predict chroma samples from reconstructed luma samples in a similar fashion as done by CCLM. Similar to CCLM, the reconstructed luma samples are down-sampled to match the lower resolution chroma grid when chroma sub-sampling is used, and left or top and left reference samples are used as templates for model derivation.

CCCM uses a 7-tap convolution filter. The convolutional 7-tap filter consists of a 5-tap plus-sign-shaped spatial component, a nonlinear term and a bias term. The input to the spatial 5-tap component of the filter consists of a center (C) luma sample which is collocated with the chroma sample to be predicted and its above/north (N), below/south (S), left/west (W) and right/east (E) neighbors.

A nonlinear term P is represented as the square of the center luma sample C, scaled to the sample value range of the content:
P = (C*C + midVal) >> bitDepth

The bias term B represents a scalar offset between the input and output (similarly to the offset term in CCLM) and is set to middle chroma value (512 for 10-bit content) . Output of the filter is calculated as a convolution between the filter coefficients c_i and the input values and clipped to the range of valid chroma samples:
predChromaVal = c₀C + c₁N + c₂S + c₃E + c₄W + c₅P + c₆B
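The 7-tap computation can be sketched in Python as below. Two points are assumptions rather than statements of the source: midVal is taken to be the same middle value as the bias term, i.e. 1 << (bitDepth - 1), and the coefficients are plain floating-point numbers, whereas a real codec would use fixed-point coefficients. The function name `cccm_predict` is hypothetical.

```python
from typing import Sequence


def cccm_predict(c: int, n: int, s: int, e: int, w: int,
                 coeffs: Sequence[float], bit_depth: int = 10) -> int:
    """Predict one chroma sample from the plus-shaped luma neighbourhood."""
    mid_val = 1 << (bit_depth - 1)          # assumed: 512 for 10-bit content
    p = (c * c + mid_val) >> bit_depth      # nonlinear term P
    b = mid_val                             # bias term B (middle chroma value)
    taps = (c, n, s, e, w, p, b)            # order matches c0..c6 in the formula
    value = sum(k * t for k, t in zip(coeffs, taps))
    max_val = (1 << bit_depth) - 1
    return min(max(int(round(value)), 0), max_val)  # clip to valid chroma range
```

Setting only c₀ to 1 passes the center luma sample through unchanged, which is a quick sanity check on the tap ordering.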

C. Gradient Linear Model (GLM)

For YUV 4:2:0 color format, a gradient linear model (GLM) method can be used to predict the chroma samples from luma sample gradients. Two modes are supported: a two-parameter GLM mode and a three-parameter GLM mode. Compared with the CCLM, instead of down-sampled luma values, the two-parameter GLM utilizes luma sample gradients to derive the linear model. Specifically, when the two-parameter GLM is applied, the input to the CCLM process, i.e., the down-sampled luma samples L, are replaced by luma sample gradients G. The other parts of the CCLM (e.g., parameter derivation, prediction sample linear transform) are kept unchanged:
C = α·G + β

In the three-parameter GLM, a chroma sample can be predicted based on both the luma sample gradients and down-sampled luma values with different parameters. The model parameters of the three-parameter GLM are derived from 6 rows and columns of adjacent samples by the LDL decomposition based MSE minimization method as used in the CCCM:
C = α₀·G + α₁·L + α₂·β

D. InterCCCM

InterCCCM applies the CCCM method for predicting chroma samples from reconstructed luma samples when the CU uses inter prediction or intra block copy (IBC) . FIG. 2 illustrates the decoder operations of InterCCCM. The cross-component filters 210 are derived using the prediction blocks of luma and chroma. The derived filters are applied to the reconstructed luma block and blended with the prediction blocks of chroma to produce the final chroma prediction blocks. In the blending process the filtered reconstructed luma blocks use blending weight of 0.75 and chroma prediction blocks use blending weight of 0.25.
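The blending step can be written with integer arithmetic, as in the following Python sketch. The weights 0.75 and 0.25 come from the description above; expressing them as (3·f + c + 2) >> 2 with a rounding offset of 2 is an assumption, as is the function name `blend_inter_cccm`.

```python
from typing import List


def blend_inter_cccm(filtered_luma: List[List[int]],
                     chroma_pred: List[List[int]]) -> List[List[int]]:
    """Blend filtered reconstructed luma (weight 0.75) with chroma prediction (0.25)."""
    return [
        [(3 * f + c + 2) >> 2 for f, c in zip(f_row, c_row)]  # assumed rounding offset
        for f_row, c_row in zip(filtered_luma, chroma_pred)
    ]
```

For a filtered value of 100 and a chroma prediction of 60, the blended sample is 90.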

The cross-component filter of InterCCCM is an 8-tap filter that consists of 6 spatial luma samples, a nonlinear term, and a bias term. FIG. 3 shows luma samples in relation to the derived chroma sample for a cross-component filter. As illustrated, the spatial luma samples (L0, …, L5) are obtained from the luma grid selecting the 6 luma samples closest to the chroma position C without down sampling. The predicted chroma value may be obtained as:
predChromaVal = c₀L0 + c₁L1 + c₂L2 + c₃L3 + c₄L4 + c₅L5 + c₆nonlinear((L0 + L3 + 1) >> 1) + c₇B,

where nonlinear is CCCM’s nonlinear operator and B is bias. The filter coefficients are derived using a division-free Gaussian elimination method and the necessary offsets are applied to samples prior to filter derivation. The offsets for division-free Gaussian elimination method may be obtained using a four-point average of the luma and chroma prediction blocks, where the four points correspond to the top-left, top-right, bottom-left and bottom-right corners of the blocks. For filter coefficient derivation at most 256 chroma samples are used.
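A Python sketch of the 8-tap evaluation is given below for illustration. It assumes that the nonlinear operator is the CCCM form (x·x + midVal) >> bitDepth with midVal = 1 << (bitDepth - 1), that B is the same middle value, and that the coefficients are floating-point numbers. The coefficient derivation by division-free Gaussian elimination is not shown, and the function names are hypothetical.

```python
from typing import Sequence


def nonlinear(x: int, bit_depth: int = 10) -> int:
    """CCCM-style nonlinear operator (assumed form)."""
    mid_val = 1 << (bit_depth - 1)
    return (x * x + mid_val) >> bit_depth


def inter_cccm_predict(luma: Sequence[int], coeffs: Sequence[float],
                       bit_depth: int = 10) -> int:
    """Evaluate the 8-tap filter on the six luma samples L0..L5."""
    if len(luma) != 6 or len(coeffs) != 8:
        raise ValueError("expected 6 luma samples and 8 coefficients")
    bias = 1 << (bit_depth - 1)                           # assumed bias value B
    nl = nonlinear((luma[0] + luma[3] + 1) >> 1, bit_depth)
    taps = list(luma) + [nl, bias]                        # order matches c0..c7
    value = sum(k * t for k, t in zip(coeffs, taps))
    max_val = (1 << bit_depth) - 1
    return min(max(int(round(value)), 0), max_val)
```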

Usage of the mode may be signalled with a CABAC coded TU level flag. One new CABAC context may be included to support this. The InterCCCM flag is only signalled if the TU’s luma Cbf is non-zero and the CU’s predMode is either MODE_INTER or MODE_IBC. The encoder may perform a rate-distortion (RD) decision in the transform selection loop for the chroma components when luma Cbf is non-zero and the CU’s predMode is either MODE_INTER or MODE_IBC.

E. Cross-Component Prediction Merge Mode (Non-Local CCP mode)

In some embodiments, for chroma coding, a flag may be signalled to indicate whether CCP mode (including the CCLM, CCCM, GLM and their variants) or non-CCP mode (chroma intra prediction mode, fusion of chroma intra prediction mode) is used. If the CCP mode is selected, one more flag is signalled to indicate how to derive the CCP type and parameters, i.e., either from a CCP merge candidate list or signalled/derived on-the-fly. A CCP merge candidate list may be constructed from the spatial adjacent, temporal, spatial non-adjacent, history-based, or shifted temporal candidates. After including these candidates, default models are further included to fill the remaining empty positions in the merge list. In order to remove redundant CCP models in the list, a pruning operation is applied. After constructing the list, the CCP models in the list are reordered depending on the SAD costs, which are obtained using the neighbouring template of the current block.
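The list construction just described (collect candidates by category, prune duplicates, fill with default models, reorder by template cost) can be sketched in Python as follows. The function name, the `max_size` parameter, the equality-based pruning test, and the `template_cost` callback standing in for the SAD cost on the neighbouring template are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Iterable, List


def build_ccp_merge_list(
    candidate_groups: Iterable[Iterable[Hashable]],
    defaults: Iterable[Hashable],
    max_size: int,
    template_cost: Callable[[Hashable], float],
) -> List[Hashable]:
    """Build a pruned CCP merge list and reorder it by ascending template cost."""
    merged: List[Hashable] = []
    ordered = [m for group in candidate_groups for m in group] + list(defaults)
    for model in ordered:
        if len(merged) == max_size:
            break
        if model not in merged:      # pruning: skip redundant models
            merged.append(model)
    return sorted(merged, key=template_cost)
```

Because `sorted` is stable, candidates with equal cost keep their insertion order, so earlier categories win ties in this sketch.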

For spatial adjacent and non-adjacent candidates, the positions and inclusion order of the spatial adjacent and non-adjacent candidates are the same as those defined for regular inter merge prediction candidates.

Temporal candidates are selected from the collocated picture. The position and inclusion order of the temporal candidates are the same as those defined for regular inter merge prediction candidates. Shifted temporal candidates are also selected from the collocated picture. The position of temporal candidates is shifted by a selected motion vector which is derived from motion vectors of neighboring blocks.

For history-based candidates, a history-based table is maintained to include the recently used CCP models, and the table is reset at the beginning of each CTU row. If the current list is not full after including spatial adjacent and non-adjacent candidates, the CCP models in the history-based table are added into the list.

For default candidates, CCLM candidates with default scaling parameters are considered, only when the list is not full after including the spatial adjacent, spatial non-adjacent, or history-based candidates. If the current list has no candidates with the single model CCLM mode, the default scaling parameters are {0, 1/8, -1/8, 2/8, -2/8, 3/8, -3/8, 4/8, -4/8, 5/8, -5/8, 6/8} . Otherwise, the default scaling parameters are {0, the scaling parameter of the first CCLM candidate + {1/8, -1/8, 2/8, -2/8, 3/8, -3/8, 4/8, -4/8, 5/8, -5/8, 6/8} } .
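The two default parameter sets can be generated programmatically, as in this Python sketch. The function name `default_scaling_parameters` is hypothetical, and exact fractions are used only to keep the values readable.

```python
from fractions import Fraction
from typing import List, Optional


def default_scaling_parameters(first_cclm_scale: Optional[Fraction] = None) -> List[Fraction]:
    """Return the default CCLM scaling parameters in inclusion order."""
    offsets: List[Fraction] = []
    for k in range(1, 6):
        offsets += [Fraction(k, 8), Fraction(-k, 8)]   # 1/8, -1/8, ..., 5/8, -5/8
    offsets.append(Fraction(6, 8))                     # final entry 6/8
    if first_cclm_scale is None:
        # No single-model CCLM candidate in the list: offsets are used directly.
        return [Fraction(0)] + offsets
    # Otherwise the offsets are added to the first CCLM candidate's scaling parameter.
    return [Fraction(0)] + [first_cclm_scale + o for o in offsets]
```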

In some embodiments, a flag is signaled to indicate whether the CCP merge mode is applied or not. If CCP merge mode is applied, an index is signaled to indicate which candidate model is used by the current block. In addition, CCP merge mode is not allowed for the current chroma coding block when the current CU is coded by intra sub-partitions (ISP) with single tree, or the current chroma coding block size is less than or equal to 16.

F. CCP Merge for Chroma Inter-Blocks

The cross-component prediction merge mode described above may be extended to chroma inter coding. The CCP models including CCLM, MMLM, CCCM, GLM, chroma fusion, CCP merge modes, and interCCCM are stored and inherited for subsequently coded chroma intra and inter blocks. Similar to the CCP merge for chroma intra blocks, a flag is signaled to indicate whether a chroma inter block is coded using this mode. If the CCP merge mode is used, a CCP merge list is constructed in a similar way as that for chroma intra blocks except that additional shifted temporal candidates and on-the-fly derived candidates are included in the CCP merge list. The additional shifted temporal candidates are derived from the collocated picture. The positions of these candidates are the same as those defined for regular inter merge prediction candidates with a shift obtained from the motion vector of the current block. The on-the-fly derived candidates are only used for low delay pictures and are obtained using the neighboring reconstructed samples of the current block. At most one of the on-the-fly derived candidates, which include single/multi-model CCCM and single/multi-model CCLM, is added to the CCP merge list.

After the CCP merge list is constructed, the candidate with the lowest template cost is selected for the chroma inter block. The chroma inter block is then predicted in the same way as that of inter CCCM. That is, the motion compensation predicted samples are blended with the cross-component predicted samples to form the final prediction.

G. Inheriting CCM Information

As described above, cross-component model (CCM) information from neighboring blocks can be inherited/referenced by the current chroma block. The CCM information includes, but is not limited to, prediction mode (e.g., CCLM, MMLM, CCCM), model index for indicating which model shape is used in convolutional model, classification threshold for multi-model, down-sampling filter flag (whether to do down-sampling), down-sampling filtering index, number of neighboring lines used to derive model, types of templates used to derive model, post-filtering flag or model parameters.
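One way to picture the stored CCM information is as a record with one field per item listed above. The Python sketch below is illustrative only: the class name `CcmInfo`, the field names, types, and default values are assumptions, and the check tying the classification threshold to multi-model use is one possible consistency rule rather than a requirement stated here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CcmInfo:
    """Hypothetical container for inheritable cross-component model information."""
    prediction_mode: str                               # e.g. "CCLM", "MMLM", "CCCM"
    model_params: List[float] = field(default_factory=list)
    model_index: Optional[int] = None                  # which model shape is used
    multi_model: bool = False
    classification_threshold: Optional[int] = None     # used for multi-model only
    downsampling_flag: bool = True                     # whether to do down-sampling
    downsampling_filter_index: Optional[int] = None
    num_neighbor_lines: Optional[int] = None           # lines used to derive the model
    template_type: Optional[str] = None                # e.g. "top", "left", "left+top"
    post_filter_flag: bool = False

    def __post_init__(self) -> None:
        # Assumed rule: a multi-model entry must carry its classification threshold.
        if self.multi_model and self.classification_threshold is None:
            raise ValueError("multi-model CCM information needs a classification threshold")
```

A candidate list entry would then hold one such record, which the current block can apply directly or refine before use.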

In some embodiments, CCLM model can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a CCLM model. In some embodiments, CCLM model with non-linear term can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a CCLM model with non-linear term. In some embodiments, CCCM model can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a CCCM model. The luma offset and the chroma offset used for adjusting the input of CCCM model can also be stored in the CCM information. In some embodiments, CCCM model with different convolution filter shape can be inherited. In addition to model parameters and prediction mode, a CCCM mode index can also be stored in CCM information for indicating which convolution filter shape is used for the inherited CCCM model. For example, a CCCM model with different convolution filter shape can only contain the spatial terms in horizontal direction. For another example, a CCCM model with different convolution filter shape can only contain the spatial terms in vertical direction. For another example, a CCCM model with different convolution filter shape can only contain the spatial terms in diagonal direction. For another example, a CCCM model with different convolution filter shape can only contain the spatial terms in anti-diagonal direction. For another example, a CCCM model with different convolution filter shape can contain the X-shape spatial terms.

In some embodiments, CCCM model using non-downsampled samples can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a CCCM model using non-downsampled samples. In some embodiments, CCCM model with multiple down-sampling filters can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a CCCM model with multiple down-sampling filters, and a model index can be also stored in CCM information for indicating which variant of CCCM model with multiple down-sampling filters is inherited.

In some embodiments, a mixed CCCM model having various terms (e.g., spatial term, gradient term, location term, non-linear term and bias term) can be inherited. The gradient term can be calculated in either the downsampled domain or the non-downsampled domain. The location term can be calculated with respect to the top-left coordinates of the current block or the picture. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a mixed CCCM model consisting of various terms. If there are multiple types of mixed CCCM models, a model index can also be stored in the CCM information for indicating which type of mixed CCCM model is inherited. For example, gradient and location based CCCM (GL-CCCM) is a mixed CCCM model which consists of one spatial term in center position, two gradient terms for horizontal direction and vertical direction, two location terms X and Y for the relative horizontal location and relative vertical location, one non-linear term and one bias term. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a GL-CCCM model.

In some embodiments, a GLM model can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a GLM model, and a down-sampling filtering index can also be stored in the CCM information for indicating which gradient down-sampling filter is used for the inherited GLM model. In some embodiments, a GLM model with luma term can be inherited. In addition to storing model parameters, a prediction mode can be stored in the CCM information for indicating that the inherited model is a GLM model with luma term, and a down-sampling filtering index can also be stored in the CCM information for indicating which gradient down-sampling filter is used for the inherited GLM model with luma term.

In some embodiments, any kind of cross-component multi-model can be inherited. In addition to storing model parameters and prediction mode, a multi-model on/off flag can be stored in the CCM information for indicating whether the inherited CCM model is a multi-model or not. If the multi-model on/off flag is true, the multi-model classification threshold is also stored in the CCM information.

In some embodiments, the CCM information can include information to indicate how the inherited model is derived. For example, the CCM information can include the number of neighboring lines used to derive the cross-component model and/or the types of templates used to derive the model. For example, a set of templates can be used to derive a CCCM model. The set of templates includes templates with different positions, sizes, and shapes. The CCM information can store the index of the template on which the derivation of the inherited CCCM model was based. For example, the inherited CCCM model can be derived based on a top-only template, a left-only template, or a left-and-top template. For another example, the inherited CCCM model can be derived based on a 6-line template or a 2-line template.

In some embodiments, a post-filter flag can be stored in the CCM information. This information describes how the inherited model is used in the block the inherited model is from. If the post-filter flag is on, this indicates that a filter is applied to the prediction of the block the inherited model is from. In some embodiments, the inherited model parameters can be refined and then used for the current block and/or saved/referenced for the following coding blocks.

In some embodiments, a propagation flag can be stored in the CCM information. This information describes whether the CCM information is copied from a reference block as described in Section II. H “Propagated CCM information” below. As described in that section, if a block is a CCP coded block, the CCM information of the current block is derived and stored, and the propagation flag of the current block’s CCM information is set to false. If a block is not CCP coded, the CCM information of the current block is derived by copying the CCM information from a reference block, and the propagation flag of the current block’s CCM information is set to true.

In some embodiments, the CCM information may include information to indicate the source of the propagated CCM information. For example, as described in Section II. H below, the reference block from which the CCM information is copied can be a block located by the block vector of the current block, a block located by the motion vector of the current block, or a collocated block of the current block. In some embodiments, the CCM information may include information to indicate the trace depth of cascaded vector based on which the CCM information is derived as described in Section IV “Cascaded vector cross-component models” below.
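
As one way to picture the stored record, the fields described in this section could be grouped as in the following minimal Python sketch. The class name `CCMInfo`, the field names, and the string labels used for modes and sources are assumptions made for illustration only; the disclosure does not define a storage layout.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class CCMInfo:
    """Hypothetical container for stored cross-component model information."""
    prediction_mode: str                  # e.g. "CCLM", "CCCM", "GLM" (labels assumed)
    parameters: List[float]               # model parameters
    model_index: Optional[int] = None     # variant index, when several variants exist
    multi_model: bool = False             # multi-model on/off flag
    classification_threshold: Optional[int] = None  # used only when multi_model is True
    post_filter: bool = False             # post-filter flag
    propagated: bool = False              # propagation flag
    source: Optional[str] = None          # e.g. "block_vector", "motion_vector", "collocated"
    trace_depth: int = 0                  # trace depth of the cascaded vector, if any

    def propagate(self, source: str) -> "CCMInfo":
        """Return a copy marked as propagated from a reference block."""
        return replace(self, parameters=list(self.parameters),
                       propagated=True, source=source)


own = CCMInfo("CCLM", [0.5, 16.0])        # derived by a CCP coded block
copied = own.propagate("block_vector")    # copied by a non-CCP coded block
```

The copy keeps the model fields unchanged and only updates the bookkeeping fields, which mirrors the distinction drawn above between derived and copied CCM information.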

H. Propagated CCM information

In some embodiments, after encoding/decoding a block, the cross-component model (CCM) information of the current block is derived and stored for the current block. The stored CCM information can be referenced by the following coding blocks. The following coding blocks may inherit CCM information from the current block. The definition of CCM information is described in Section II. G above “Inheriting CCM information” . The stored CCM information can be inherited as but not limited to the following types of candidates as described in Section II. E “Cross-Component Prediction Merge Mode (Non-Local CCP mode) ” : spatial candidates, non-adjacent candidates, temporal candidates, history-based candidates.

In some embodiments, if the current block is cross-component prediction (CCP) coded, the cross-component model used by the current block can be stored and referenced by the following coding blocks. A block is CCP coded when it uses a cross-component model to generate its prediction. The block may use a cross-component model inherited from a neighboring block, a cross-component model derived based on neighboring luma and chroma predicted/reconstructed sample values (e.g., CCLM, MMLM, CCCM, InterCCCM), a cross-component model used in chroma fusion (in which the chroma prediction is formed by adding one or more hypotheses of cross-component prediction to one or more existing hypotheses of non-cross-component prediction), or any combination of the above.

In some embodiments, if the current block is not cross-component prediction (CCP) coded (i.e., the block does not use cross-component model, such as the inherited cross-component model, the self-derived cross-component model, cross-component model used in chroma fusion which means the chroma prediction is based on adding one or more hypotheses of cross-component prediction to one or more existing hypotheses of prediction of non-cross-component prediction, or any combination of the above) , and there are block vectors available in the current block, (e.g., when the current luma block is coded in IBC or IntraTMP mode, when the collocated luma block is coded in IBC or IntraTMP mode) , the CCM information of the current block can be derived by copying the CCM information of the reference block located by the block vector. FIG. 4 illustrates examples of CCM information propagation based on block vectors. In the figure, Blocks A, E, G (drawn in dash lines) are coded in cross-component mode (e.g., CCLM, MMLM, GLM, CCCM, Chroma Fusion) . Block B is not CCP coded and there are block vectors available at block B. The reference block A is located by the block vector. The CCM information of the reference block A, which uses cross-component model, is copied and stored for block B. (The reference block A may therefore be referred to as the source block of the CCM information. )

In some embodiments, if a reference block located by the block vector is also not CCP coded, but there is CCM information stored for the reference block, the CCM information of the current block can be derived by copying the CCM information stored in the reference block. That is, even when the reference block is not CCP coded, as long as it has valid stored CCM information, the stored CCM information can be referenced by the current block.

For example, as shown in FIG. 4, the current block C has block vector available, and its reference block B, which is not CCP coded, has CCM information stored. The CCM information of block B is copied and stored for block C. Since the CCM information stored in block B was copied from block A, the CCM information stored for block C is originally from block A (i.e., the CCM information of block A is propagated to block C) . By only accessing block B, block C can retrieve CCM information originally from block A. In some embodiments, if the reference block located by the block vector is not CCP coded and does not have CCM information stored, no CCM information is stored for the current block.
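
The propagation chain from block A to block B to block C can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions that are not part of the disclosure: blocks are identified by a single position, a block vector lands exactly on the position of a previously stored block, and the function name `store_ccm` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]


def store_ccm(storage: Dict[Pos, dict], pos: Pos,
              own_model: Optional[dict], block_vector: Optional[Pos]) -> None:
    """Store CCM information for the block at `pos` after it is coded."""
    if own_model is not None:
        # CCP coded block: store its own model, propagation flag is false.
        storage[pos] = {"model": own_model, "propagated": False}
        return
    if block_vector is None:
        return  # no vector available, nothing to copy
    # Locate the reference block with the block vector.
    ref = (pos[0] + block_vector[0], pos[1] + block_vector[1])
    ref_info = storage.get(ref)
    if ref_info is not None:
        # Copy whatever is stored there, even if the reference block
        # itself was not CCP coded; propagation flag is true.
        storage[pos] = {"model": ref_info["model"], "propagated": True}
    # Otherwise no CCM information is stored for this block.


storage: Dict[Pos, dict] = {}
store_ccm(storage, (0, 0), {"mode": "CCLM", "params": [0.5, 16]}, None)  # block A
store_ccm(storage, (16, 0), None, (-16, 0))    # block B copies from A
store_ccm(storage, (32, 8), None, (-16, -8))   # block C copies from B
store_ccm(storage, (48, 0), None, (100, 100))  # reference has nothing stored
```

Block C only reads the entry stored for block B, yet ends up with the model that originated at block A.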

In some embodiments, the block vector used to derive the reference block is the block vector at the center of the collocated luma block. In some embodiments, the block vector used to derive the reference block is the block vector at the top-left corner of the collocated luma block. In some embodiments, the block vector used to derive the reference block is the block vector at a pre-defined position corresponding to the current block. For example, the pre-defined position can be the center, top-left, top-right, bottom-left, or bottom-right of the current block.

In some embodiments, when the current block has multiple block vectors available (e.g., the block vector can be bi-directional, the block can have multiple IntraTMP block vectors, or the current chroma block is collocated with multiple luma blocks and more than one of the luma blocks have block vectors) , to derive the CCM information of the current block, if only one of the reference blocks located by the block vectors has CCM information, the CCM information from the reference block which has CCM information is copied to and stored in the current block.

For some embodiments, when the current block has multiple block vectors, and more than one of the reference blocks located by the block vectors has CCM information, one of the reference blocks is selected based on a set of pre-defined rules. In some embodiments, the reference block which is CCP coded is selected. In some embodiments, the reference block whose distance to the current block is the smallest is selected. The CCM information of the selected reference block is copied to and stored for the current block. The distance between the reference block and the current block, located at (x_r, y_r) and (x_c, y_c) respectively, can be computed by a distance measure (for example, the Euclidean, Manhattan, Minkowski, or Chebyshev distance, as listed in Section IV. B), where (x_r, y_r) and (x_c, y_c) can be the top-left, top-right, bottom-left, bottom-right, or center positions of the reference block and the current block. In some embodiments, the reference block which has the smallest horizontal distance, |x_r-x_c|, is selected. In some embodiments, the reference block which has the smallest vertical distance, |y_r-y_c|, is selected. In some embodiments, the rules described previously can be combined, and not all the rules described previously need to be applied. For example, the reference block which is CCP coded is selected. If there is more than one CCP coded reference block, then the block which has the shortest distance to the current block among the CCP coded reference blocks is selected. If there is more than one CCP coded reference block whose distance to the current block is the smallest, the reference block which has the smallest horizontal distance, |x_r-x_c|, is selected.
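
The combined rule in the example above (CCP coded first, then shortest distance, then smallest horizontal distance) can be expressed as a lexicographic sort key. The following is a minimal Python sketch; the type `RefBlock`, the function `select_reference`, and the use of the Euclidean distance as the distance measure are assumptions made for illustration.

```python
import math
from typing import List, NamedTuple, Tuple


class RefBlock(NamedTuple):
    name: str
    pos: Tuple[int, int]   # (x_r, y_r), e.g. the top-left position
    ccp_coded: bool


def select_reference(cur: Tuple[int, int], refs: List[RefBlock]) -> RefBlock:
    """Pick one reference block among those that have CCM information."""
    def key(r: RefBlock):
        dx = abs(r.pos[0] - cur[0])      # horizontal distance |x_r - x_c|
        dy = abs(r.pos[1] - cur[1])      # vertical distance |y_r - y_c|
        # False sorts before True, so CCP coded blocks come first.
        return (not r.ccp_coded, math.hypot(dx, dy), dx)
    return min(refs, key=key)


refs = [
    RefBlock("P", (32, 64), ccp_coded=False),  # closest, but not CCP coded
    RefBlock("Q", (64, 24), ccp_coded=True),   # distance 40, horizontal distance 0
    RefBlock("R", (24, 64), ccp_coded=True),   # distance 40, horizontal distance 40
]
chosen = select_reference((64, 64), refs)      # Q wins the final tie-break
```

Dropping or reordering entries of the key tuple gives the other rule combinations mentioned above.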

In some embodiments, if the current block is not CCP coded and there are motion vectors available in the current block (e.g. the current luma block is inter-coded) , the CCM information of the current block can be derived by copying the CCM information of its reference block in a reference picture, located by the motion vectors of the current block. FIG. 5 illustrates examples of CCM information propagation based on motion vectors. In the figure, Blocks A, E, G (drawn in dash lines) are coded in cross-component mode (e.g., CCLM, MMLM, GLM, CCCM) . As illustrated, block B is not CCP coded and there are motion vectors available at block B. The reference block A is located by the motion vector. The CCM information of the reference block A, which uses cross-component model, is copied and stored for block B.

In some embodiments, if the reference block located by the motion vector is also not CCP coded, but there is CCM information stored for the reference block, the CCM information of the current block can be derived by copying the CCM information stored in the reference block. That is, even when the reference block is not CCP coded, as long as it has valid stored CCM information, the stored CCM information can be referenced by the current block. For example, as shown in FIG. 5, the current block C has motion vector available, and its reference block B, which is not CCP coded, has CCM information stored. The CCM information of block B is copied and stored in block C. Since the CCM information stored in block B was copied from block A, the CCM information stored in block C is originally from block A (i.e., the CCM information of block A is propagated to block C) . By only accessing block B, block C can retrieve the CCM information originally from block A. For some embodiments, if the reference block located by the motion vector is not CCP coded and does not have CCM information stored, no CCM information is stored for the current block.

In some embodiments, when the current block is inter-coded with bi-directional prediction, to derive the CCM information of the current block, if only one of the reference blocks located by the motion vectors has CCM information, the CCM information from the reference block which has CCM information is copied to and stored for the current block.

In some embodiments, when the current block is inter-coded with bi-directional prediction, and both reference blocks located by the motion vectors have stored CCM information, one of the reference blocks is selected based on a set of pre-defined rules. The CCM information of the selected reference block is then copied and stored in the current block. In some embodiments, the reference block which is CCP coded is selected. In some embodiments, the reference block whose reference picture (i.e., the picture which the reference block is in) has the smaller POC distance to the current picture is selected. In some embodiments, the reference block whose reference picture has the smaller QP difference from the current picture is selected. In some embodiments, the reference block whose reference picture has the smaller QP value is selected. For some embodiments, the reference block whose reference picture has the larger QP value is selected. In some embodiments, the CCM information of each of the two reference blocks is applied to the reconstructed luma samples of the template of the current block to generate a prediction of the chroma samples of the template. The distortion between each prediction and the reconstructed chroma samples of the template is computed, and the reference block associated with the smaller distortion is selected.

In some embodiments, the rules described previously can be combined, and not all the rules described previously need to be applied. For example, the reference block which is CCP coded is selected. If both blocks are CCP coded, then the block whose reference picture has the smaller POC distance to the current picture is selected. If both blocks are CCP coded and have the same POC distance to the current picture, the reference block whose reference picture has the smaller QP difference from the current picture is selected. If both blocks are CCP coded, have the same POC distance to the current picture, and have the same QP difference from the current picture, then the reference block whose reference picture has the smaller QP value is selected.
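
The cascade of tie-breaks in this example can be sketched in Python as below. The names `RefCandidate` and `select_bipred_reference` are hypothetical, and the behavior when every criterion ties (the L0 reference block is kept) is an assumption, since the text does not specify it.

```python
from typing import NamedTuple


class RefCandidate(NamedTuple):
    name: str
    ccp_coded: bool
    poc: int   # picture order count of the reference picture
    qp: int    # QP of the reference picture


def select_bipred_reference(cur_poc: int, cur_qp: int,
                            ref_l0: RefCandidate,
                            ref_l1: RefCandidate) -> RefCandidate:
    """Apply the rules in order: CCP coded, POC distance, QP difference, QP value."""
    def key(r: RefCandidate):
        return (not r.ccp_coded,        # CCP coded blocks first
                abs(r.poc - cur_poc),   # smaller POC distance
                abs(r.qp - cur_qp),     # smaller QP difference
                r.qp)                   # smaller QP value
    # min() returns the first argument on a full tie (assumed: L0).
    return min((ref_l0, ref_l1), key=key)


l0 = RefCandidate("L0", ccp_coded=True, poc=4, qp=30)
l1 = RefCandidate("L1", ccp_coded=True, poc=12, qp=35)
chosen = select_bipred_reference(cur_poc=8, cur_qp=32, ref_l0=l0, ref_l1=l1)
```

Here both reference pictures are at POC distance 4, so the QP difference (2 versus 3) decides in favor of the L0 reference block.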

In some embodiments, if the current block is not CCP coded and the current slice/picture is a non-intra slice/picture, the CCM information of the current block can be derived by copying the CCM information of its collocated block in a collocated picture. In some embodiments, when the current block is not CCP coded, the CCM information derivation process is performed after the encoding/decoding of the current picture. In some embodiments, when multiple types of propagated CCM information are available for the current block, the propagated CCM information to be stored for the current block is determined based on a set of pre-defined rules. For example, if both the reference block located by the motion vectors and the reference block which is the current block’s collocated block in the collocated picture have valid CCM information, the CCM information of the reference block located by the motion vectors is copied and stored for the current block.

III. Cascaded Vector Prediction

A. Auto Relocated Block Vector Prediction (AR-BVP)

Auto-relocated block vector prediction (AR-BVP) is part of IBC merge/AMVP candidate list construction. FIG. 6 conceptually illustrates derivation of an auto-relocated block vector. As illustrated, for a current picture 600, a guiding block vector $BV_{0,1}$ associated with the current block $B_0$ points to a reference block $B_1$. If $B_1$ has a BV, denoted as $BV_{1,2}$, pointing to a reference block $B_2$, then $BV_{0,2}$, given by $BV_{0,2} = BV_{0,1} + BV_{1,2}$, is defined as the AR-BVP guided by $BV_{0,1}$. Similarly, $BV_{0,n+1}$ can be derived by

$BV_{0,n+1} = BV_{0,n} + BV_{n,n+1} = BV_{0,1} + BV_{1,2} + \dots + BV_{n-1,n} + BV_{n,n+1}$.

In some embodiments, the length of the AR-BVP trace path is 1 (i.e., n=1). In some embodiments, the length of the AR-BVP trace path is 2 (i.e., n=2). In some embodiments, there is no constraint for the length of the AR-BVP trace path.

FIG. 7 conceptually illustrates the positions to be checked for deriving an auto-relocated block vector. As illustrated, when deriving $BV_{n,n+1}$ guided by $BV_{0,n}$, all five positions of $B_n$, including the top-left (“LT”), top-right (“RT”), center (“Ctr”), bottom-left (“LB”), and bottom-right (“RB”) positions, may be checked to find $BV_{n,n+1}$. In some embodiments, the initial guiding block vector $BV_{0,1}$ is set to be an existing BVP already in the IBC merge/AMVP candidate list. In some embodiments, the AR-BVP candidates may be inserted after historical BVP candidates. The IBC merge/AMVP candidate list size may be kept unchanged.
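
The accumulation of block vectors along the trace path can be sketched as follows. This Python sketch makes simplifying assumptions that are not in the disclosure: only one position per reference block is checked (rather than the five positions of FIG. 7), stored block vectors are looked up by exact position in a dictionary, and the names `derive_ar_bvp` and `bv_store` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vec = Tuple[int, int]


def derive_ar_bvp(cur_pos: Vec, guiding_bv: Vec, bv_store: Dict[Vec, Vec],
                  max_len: Optional[int] = None) -> List[Vec]:
    """Return BV_{0,2}, BV_{0,3}, ... guided by BV_{0,1}.

    max_len limits the trace path length n; None means no constraint.
    """
    results: List[Vec] = []
    acc = guiding_bv                       # BV_{0,n}, starting with n = 1
    visited = set()                        # guard against cyclic test data
    while max_len is None or len(results) < max_len:
        ref = (cur_pos[0] + acc[0], cur_pos[1] + acc[1])   # position of B_n
        nxt = bv_store.get(ref)            # BV_{n,n+1}, if B_n has a BV
        if nxt is None or ref in visited:
            break
        visited.add(ref)
        acc = (acc[0] + nxt[0], acc[1] + nxt[1])           # BV_{0,n+1}
        results.append(acc)
    return results


bv_store = {(48, 64): (-8, -8), (40, 56): (-20, 0)}
ar_bvps = derive_ar_bvp((64, 64), (-16, 0), bv_store)
# ar_bvps == [(-24, -8), (-44, -8)]
```

With `max_len=1` only the first entry, corresponding to $BV_{0,2}$, would be produced.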

B. Chained Motion Vector Prediction (CMVP)

In some embodiments, a chained MV prediction (CMVP) is included in the inter merge candidate list construction. FIG. 8 shows an example derivation of a candidate for chained MV prediction (CMVP). As illustrated, CMVP candidates can be derived as the sum of the recursively traced MVs and BVs based on the pre-derived MVs for the inter merge candidate list. The figure illustrates a CMVP candidate that is a motion vector $MVL0_{k/m}$ locating a block in a reference picture $RefPicL0_{k/m}$. The motion vector and the reference picture are determined according to:

$MVL0_{k/m} = MVL0_{k(0)} + BV_{k(0)} + MVL0_{k(1)} + MVL0_{k(2)} + \dots + MVL0_{k(m)}$,
$RefPicL0_{k/m} = RefPicL0_{k(m)}$.

More generally, for a CMVP candidate, a set of motion vectors $MV_{k/m}$ in a reference picture $RefPic_{k/m}$ can be derived by

$MV_{k/m} = MV_{k(0)} + BV_{k(0)} + MV_{k(1)} + MV_{k(2)} + \dots + MV_{k(m)}$,
$RefPic_{k/m} = RefPic_{k(m)}$,

where k indicates the merge index and m indicates the trace depth of the CMVP candidate.

FIG. 9 shows possible sources of a motion vector or a block vector for CMVP. The figure shows the referencing source and destination when tracing MVs for CMVP. As illustrated, when deriving $MV_{k/m}$, $MV_{k(0)}$ (also referred to as the base vector of the CMVP) is found by checking the existence of MVs or BVs in MV/BV storage corresponding to all five positions of the current block (i.e., the Ctr, LT, RT, LB, and RB of the current block). In the example, an MV is found in the MV/BV storage corresponding to the center position of the current block, and that MV is used as $MVL0_{k(0)}$, the base vector of the CMVP.

When the pre-derived merge candidates targeted for CMVP candidates have two MVs, an MV_k/m is derived for each list (i.e., L0 and L1) and each trace depth. Up to two MVs can be derived for each list and each trace depth, and the MV set is sequentially inserted into the inter merge candidate list. The traceable reference pictures are only within the reference picture list. CMVP candidates may be inserted after HMVP candidates for the regular merge and TM merge. When deriving CMVP candidates, hpelIfIdx, bcwIdx, licFlag, and mhpFlag may not be inherited. CMVP candidates may not be derived when the TMVP is disabled.

IV. Cascaded vector cross-component models

A. CCM propagated by Cascaded Vectors

In some embodiments, if the current encoding/decoding block has a motion vector or a block vector available, cascaded vectors can be derived based on the motion vector or the block vector of the current block. The motion vector or the block vector can be used as the base vector. The CCM information located by the cascaded vector can be inherited by the current block. The definition of CCM information is provided in Section II. G.

In some embodiments, when referencing the cross-component model (CCM) information from neighboring blocks, if the neighboring block has motion vector or block vector available, cascaded vectors can be derived based on the motion vector or the block vector of the neighboring block. The motion vector or the block vector can be used as the base vector. CCM information of the blocks indicated by the cascaded vectors can also be inherited by the current block. The neighboring blocks can be but not limited to the following types of candidates as described in Section II. E for spatial candidates, non-adjacent candidates, temporal candidates, and history-based candidates. The neighboring block can be a CU/CB, PU, TU/TB or a corresponding block with the same size of the current block.

In some embodiments, if the current encoding/decoding block has a candidate list in which the candidates contain motion vector and/or block vector information, the cascaded vector can also be derived based on the candidates in the candidate list. The motion vector or the block vector information of the candidate can be used as the base vector. For example, the candidate list can be an IntraTMP candidate list or an MMVD candidate list. In some embodiments, when the current block is coded in a merge mode, the candidate list is the merge candidate list. For example, if the current block is coded in inter merge mode, the candidate list can be the inter merge list. For another example, if the current block is coded in IBC merge mode, the candidate list can be the IBC merge candidate list. For other examples, the current block can be coded in template matching merge mode, bilateral matching merge mode, CIIP-TM mode, or affine merge mode, and the candidate list is the corresponding merge candidate list used in each mode. In some embodiments, the candidate list can be an MVP list. For example, if the current block is coded in AMVP mode, the candidate list is the MVP list of AMVP mode. Using the motion vectors or block vectors in the candidate list of the current block increases the number and diversity of the CCM candidates.

The cascaded vector is derived as the sum of the recursively traced motion vectors and block vectors based on a base vector. A cascaded vector can be derived according to the following steps:
1) Start from a base vector. A base vector can be obtained through the methods described above (e.g., the motion vector or the block vector of the current block can be used as the base vector).
2) For the i-th recursion:
a) Locate a block as the current reference block based on a cascaded vector CV (i-1).
b) Retrieve a vector from the current reference block. This vector can be an L0 motion vector, an L1 motion vector or a block vector. Denote this vector as V (i).
c) Derive one more cascaded vector as CV (i) = CV (i-1) + V (i) = CV (0) + V (1) + V (2) + … + V (i).

Steps 2 (a) through 2 (c) are repeated for each recursion. For the first recursion, CV (0) is the base vector.
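
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a cascaded vector is treated as a plain 2-D displacement from the current block (the reference picture each vector points into is not tracked), one vector at most is retrieved per reference block, and the names `derive_cascaded_vectors` and `retrieve_vector` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Vec = Tuple[int, int]


def derive_cascaded_vectors(base_vector: Vec,
                            retrieve_vector: Callable[[Vec], Optional[Vec]],
                            max_depth: int) -> List[Vec]:
    """Return [CV(0), CV(1), ...] following Step 1 and Steps 2(a)-2(c).

    retrieve_vector(cv) returns the vector stored for the block located
    by cv, or None when that block has no motion or block vector.
    """
    cascaded = [base_vector]              # Step 1: CV(0) is the base vector
    for _ in range(max_depth):
        cv_prev = cascaded[-1]            # Step 2(a): CV(i-1) locates the block
        v_i = retrieve_vector(cv_prev)    # Step 2(b): retrieve V(i)
        if v_i is None:
            break                         # nothing to trace further
        # Step 2(c): CV(i) = CV(i-1) + V(i)
        cascaded.append((cv_prev[0] + v_i[0], cv_prev[1] + v_i[1]))
    return cascaded


# Hypothetical vector storage, keyed by displacement from the current block.
vectors_at = {(4, -2): (1, 1), (5, -1): (-3, 0)}
cvs = derive_cascaded_vectors((4, -2), vectors_at.get, max_depth=5)
# cvs == [(4, -2), (5, -1), (2, -1)]
```

Each entry of the returned list can then be used to locate a source block from which CCM information is retrieved.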

FIG. 10 illustrates derivation of a cascaded vector. The figure illustrates a cascaded vector MVL0_m, whose base vector (denoted as MVL0 (0) in the figure) is the L0 motion vector of the current block 1010. The base vector MVL0 (0) indicates a block 1020, which has a block vector BV (0) , and the L0 motion vector of the block 1030 indicated by BV (0) is denoted as MVL0 (1) and so on. The cascaded vector MVL0_m is calculated as:

MVL0_m = MVL0 (0) + BV (0) + MVL0 (1) +…+ MVL0 (m)

And the reference picture of MVL0_m is: RefPicL0_m = RefPicL0 (m). Here m is the trace depth, that is, the number of reference pictures traced back. Cascaded vectors can be the sum of L0 motion vectors, L1 motion vectors, and/or block vectors. In some embodiments, the vector retrieved in Step 2 (b) is the L0 motion vector of the block if the block is inter-coded. In some embodiments, the vector retrieved in Step 2 (b) is the L1 motion vector of the block if the block is inter-coded. In some embodiments, the vector retrieved in Step 2 (b) is the block vector of the block.

In some embodiments, the vector retrieved in Step 2 (b) can only be a motion vector. In some embodiments, the vector retrieved in Step 2 (b) can be a motion vector or a block vector. In some embodiments, in Step 2 (c) , a new cascaded vector CV (i) is only created if V (i) retrieved in Step 2 (b) is a motion vector. In other words, if V (i) retrieved in Step 2 (b) is not a motion vector (for example, V (i) retrieved in Step 2 (b) is a block vector) , a new cascaded vector CV (i) is not created.

In some embodiments, the cascaded vector CV (i) may be used to retrieve CCM information from the reference block that is located by CV (i) (such a reference block may be referred to as the source block of the CCM information). In some embodiments, if CV (i) is used for the next recursion, then CV (i) is not used to locate a reference block from which to retrieve CCM information. (For example, when V (i) retrieved in Step 2 (b) is not a motion vector, e.g., V (i) is a block vector, CV (i) is used only for the next recursion and is not used to locate a reference block from which to retrieve CCM information.)

In some embodiments, for each recursion, multiple new cascaded vectors can be derived at Step 2 (c) if multiple motion vectors and/or multiple block vectors can be retrieved from the block in Step 2 (b). Each of the newly derived cascaded vectors can be used as the vector in Step 2 (a) in the next recursion. For example, assume the block indicated by MVL0 (1) is bi-predicted and has two motion vectors MVL0 (2) and MVL1 (2). Two new cascaded vectors can be derived as MV_1 + MVL0 (2) and MV_1 + MVL1 (2), where MV_1 = MVL0 (0) + BV (0) + MVL0 (1). Each new cascaded vector can be fed into the next recursion as the cascaded vector in Step 2 (a). Hence two branches of recursion can start.

In some embodiments, the multiple motion vectors and/or block vectors in Step 2 (b) can also be retrieved by checking some pre-defined positions (of the located block, or the current reference block), similar to the five pre-defined positions of the current block (the center, top-left, top-right, bottom-left, and bottom-right positions described above). The trace depth m can be a finite value, for example, m = 1. That is, the maximum value of m is a finite number, and there is a limit to the number of reference pictures referenced back. The trace depth m can also be infinite. That is, all trace depth values are allowed, and there is no limit to the number of reference pictures referenced back. The trace depth m can be pre-defined. That is, there can be a pre-defined threshold for the maximum trace depth. A set of cascaded vectors can be derived for different values of m. (A set of cascaded vectors can be derived for each trace depth.)

In some embodiments, the CCM information located by a set of cascaded vectors (that are derived from the same base vector) is inserted into a (CCP merge) candidate list together as a group. In some embodiments, the CCM information located by the cascaded vectors is inserted into the candidate list after the history-based candidates. In some embodiments, the inclusion order of the CCM information located by the cascaded vectors is the same as that of the position that retrieves the base vector (i.e., the CCM information located by the cascaded vector is inserted into the candidate list right after the CCM information retrieved at the position that retrieved the base vector, or the CCM information located by the cascaded vector is inserted into the candidate list right after the CCM information located by the base vector).

In some embodiments, there is a limit to the number of cascaded vectors allowed to be derived for each trace depth m (i.e., based on the current reference block of Step 2 (b)). For example, no more than two cascaded vectors are allowed to be derived for each trace depth m. In some embodiments, there is a limit to the number of cascaded vectors allowed to be derived for each base vector.
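
Branching recursion with a cap on the number of cascaded vectors can be sketched as below. In this Python sketch the cap is applied to the total number of cascaded vectors at each trace depth, which is one possible reading of the limit described above; that reading, the names `derive_branching` and `retrieve_vectors`, and the dictionary used as vector storage are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Vec = Tuple[int, int]


def derive_branching(base_vector: Vec,
                     retrieve_vectors: Callable[[Vec], List[Vec]],
                     max_depth: int,
                     max_per_depth: int = 2) -> List[List[Vec]]:
    """Return levels, where levels[d] lists the cascaded vectors at depth d."""
    levels: List[List[Vec]] = [[base_vector]]
    for _ in range(max_depth):
        new_level: List[Vec] = []
        for cv in levels[-1]:                 # every branch from the last depth
            for v in retrieve_vectors(cv):    # e.g. L0 MV and L1 MV of a bi-predicted block
                if len(new_level) < max_per_depth:
                    new_level.append((cv[0] + v[0], cv[1] + v[1]))
        if not new_level:
            break                             # no branch could be extended
        levels.append(new_level)
    return levels


store: Dict[Vec, List[Vec]] = {
    (2, 0): [(1, 0), (0, 1)],   # bi-predicted block: two vectors, two branches
    (3, 0): [(1, 1), (2, 2)],
    (2, 1): [(5, 5)],
}
levels = derive_branching((2, 0), lambda cv: store.get(cv, []), max_depth=3)
# levels == [[(2, 0)], [(3, 0), (2, 1)], [(4, 1), (5, 2)]]
```

At depth 2 the vector that would be derived from the second branch is dropped because the cap of two has already been reached.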

B. Multiple Vector-propagated CCMs

As described in Section II. H above “Propagated CCM information” , after encoding/decoding a block, the cross-component model (CCM) information of the current block is derived and stored for the current block. In some embodiments, more than one CCM information can be stored for the current block.

In some embodiments, if the current block has a motion vector and/or a block vector available and the current block is cross-component prediction (CCP) coded, in addition to storing the cross-component model used by the current block, the CCM information of the reference block located by the cascaded vector can also be stored in the current block for future referencing. Note that the base vector can also be seen as a cascaded vector. If the current block is not CCP coded, the CCM information of the reference block located by the cascaded vector can also be stored in the current block for future referencing.

In some embodiments, the CCM information of the collocated block in the collocated picture can also be stored. In some embodiments, if the reference block, located by cascaded vector, block vector, motion vector or collocated position as described above, has multiple CCM information stored, all the CCM information stored in the reference block can be stored for the current block.

In some embodiments, the maximum number of CCM information allowed to be stored in one block can be pre-defined. In some embodiments, if the available number of CCM information exceeds the maximum allowed number or allowed buffer size, the priority of the CCM information to be stored can be pre-defined.

When selecting the CCM information to be stored, inherited or applied to the current block, the selection priority can depend on the temporal reference picture distance, quantization parameter, spatial distance, or trace distance. For example, if the current block is CCP coded, the current CCM information has higher or the highest selection priority. For another example, if the CCM information is retrieved based on a cascaded vector, the priority can be determined based on the trace depth of the cascaded vector. The shorter the trace depth is, the higher the selection priority is for the CCM information.
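
A priority-based selection under a storage limit can be sketched as follows. This Python sketch combines two of the example rules above (the block's own CCM information first, then shorter trace depth first); the names `StoredCCM` and `select_to_store` and the specific combination of rules are assumptions made for illustration.

```python
from typing import List, NamedTuple


class StoredCCM(NamedTuple):
    label: str
    is_own: bool        # True for the model used by a CCP coded current block
    trace_depth: int    # trace depth of the cascaded vector that located it


def select_to_store(candidates: List[StoredCCM], max_stored: int) -> List[StoredCCM]:
    """Keep at most max_stored entries, highest priority first."""
    ranked = sorted(candidates, key=lambda c: (not c.is_own, c.trace_depth))
    return ranked[:max_stored]


candidates = [
    StoredCCM("depth2", False, 2),
    StoredCCM("own", True, 0),
    StoredCCM("base", False, 0),
    StoredCCM("depth1", False, 1),
]
kept = select_to_store(candidates, max_stored=3)
# kept labels: "own", "base", "depth1"
```

Because `sorted` is stable, candidates with equal priority keep their original order.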

For still another example, in some embodiments, the CCM information corresponding to the reference block whose reference picture (i.e., the picture which the reference block is in) has the smaller POC distance to the current picture has higher priority. For still another example, the CCM information corresponding to the reference block whose reference picture has the smaller QP difference from the current picture has higher priority. For still another example, the CCM information corresponding to the reference block whose reference picture has the smaller QP value has higher priority. For still another example, the CCM information corresponding to the reference block whose reference picture has the larger QP value has higher priority.

For still another example, if the reference block is located at the same picture as the current block, the CCM information corresponding to the reference block whose distance to the current block is the smallest has higher priority. The distance between the reference block and the current block, located at (x_r, y_r) and (x_c, y_c) respectively, can be computed by Euclidean distance, Manhattan Distance, Minkowski distance, or Chebyshev distance. (x_r, y_r) and (x_c, y_c) can be the top-left, top-right, bottom-left, bottom-right, or center positions of the reference block and the current block.
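
For reference, the four distance measures named above can be written as short Python functions using their standard definitions. The function names and the choice of passing positions as (x, y) tuples are illustrative assumptions.

```python
import math
from typing import Tuple

Pos = Tuple[int, int]


def euclidean(r: Pos, c: Pos) -> float:
    """Straight-line distance between (x_r, y_r) and (x_c, y_c)."""
    return math.hypot(r[0] - c[0], r[1] - c[1])


def manhattan(r: Pos, c: Pos) -> int:
    """Sum of the horizontal and vertical distances."""
    return abs(r[0] - c[0]) + abs(r[1] - c[1])


def chebyshev(r: Pos, c: Pos) -> int:
    """Larger of the horizontal and vertical distances."""
    return max(abs(r[0] - c[0]), abs(r[1] - c[1]))


def minkowski(r: Pos, c: Pos, p: float) -> float:
    """Order-p distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return (abs(r[0] - c[0]) ** p + abs(r[1] - c[1]) ** p) ** (1.0 / p)


ref, cur = (3, 4), (0, 0)
distances = (euclidean(ref, cur), manhattan(ref, cur), chebyshev(ref, cur))
# distances == (5.0, 7, 4)
```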

For still another example, if the reference block is located at the same picture as the current block, the CCM information corresponding to the reference block which has the smallest horizontal distance, |x_r –x_c|, has higher priority. For still another example, if the reference block is located at the same picture as the current block, the CCM information corresponding to the reference block which has the smallest vertical distance, |y_r -y_c|, has higher priority.

For some embodiments, the rules described in the above examples for assigning CCM selection priorities can be combined, and not all the rules described previously need to be applied. Any of the foregoing proposed methods can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in an inter/intra/prediction/IBC/transform/quantization module of an encoder, and/or an inter/intra/prediction/IBC/transform/quantization module of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the inter/intra/prediction/transform/quantization module of the encoder and/or the inter/intra/prediction/IBC/transform/quantization module of the decoder, so as to provide the information needed by the inter/intra/prediction/IBC/transform/quantization module.
V. Example Video Encoder

FIG. 11 illustrates an example video encoder 1100 that may implement cascaded vectors and cross-component prediction. As illustrated, the video encoder 1100 receives an input video signal from a video source 1105 and encodes the signal into a bitstream 1195. The video encoder 1100 has several components or modules for encoding the signal from the video source 1105, at least including some components selected from a transform module 1110, a quantization module 1111, an inverse quantization module 1114, an inverse transform module 1115, an intra-picture estimation module 1124, an intra-prediction module 1125, a motion compensation module 1130, a motion estimation module 1135, an in-loop filter 1145, a reconstructed picture buffer 1150, a MV buffer 1165, a MV prediction module 1175, and an entropy encoder 1190. The motion compensation module 1130 and the motion estimation module 1135 are part of an inter-prediction module 1140. The intra-prediction module 1125 and the intra-picture estimation module 1124 are part of a current picture prediction module 1120, which uses current picture reconstructed samples as reference samples for prediction of the current block.

In some embodiments, the modules 1110–1190 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 1110–1190 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 1110–1190 are illustrated as being separate modules, some of the modules can be combined into a single module.

The video source 1105 provides a raw video signal that presents pixel data of each video frame without compression. A subtractor 1108 computes the difference between the raw video pixel data of the video source 1105 and the predicted pixel data 1113 from the motion compensation module 1130 or intra-prediction module 1125 as prediction residual 1109. The transform module 1110 converts the difference (or the residual pixel data or residual signal 1109) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT). The quantization module 1111 quantizes the transform coefficients into quantized data (or quantized coefficients) 1112, which is encoded into the bitstream 1195 by the entropy encoder 1190.

The inverse quantization module 1114 de-quantizes the quantized data (or quantized coefficients) 1112 to obtain transform coefficients 1118, and the inverse transform module 1115 performs inverse transform on the transform coefficients 1118 to produce reconstructed residual 1119. The reconstructed residual 1119 is added with the predicted pixel data 1113 to produce reconstructed pixel data 1117. In some embodiments, the reconstructed pixel data 1117 is temporarily stored in a line buffer 1127 (or intra prediction buffer) for intra-picture prediction and spatial MV prediction. The reconstructed pixels are filtered by the in-loop filter 1145 and stored in the reconstructed picture buffer 1150. In some embodiments, the reconstructed picture buffer 1150 is a storage external to the video encoder 1100. In some embodiments, the reconstructed picture buffer 1150 is a storage internal to the video encoder 1100.
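The forward path (subtract, transform, quantize) and the reconstruction path (de-quantize, inverse transform, add the prediction) can be sketched on a single row of samples as below. The sketch assumes a one-dimensional orthonormal DCT-II and a uniform scalar quantizer with a hypothetical step size `step`; an actual encoder would typically use two-dimensional integer transforms and QP-dependent scaling, and all function names here are illustrative only.

```python
import math


def _scale(k, n):
    # Orthonormal DCT scaling factor for frequency index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal 1-D DCT-II (stands in for the transform module)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() (stands in for the inverse transform module)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def quantize(coeffs, step):
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    return [level * step for level in levels]


def encode_and_reconstruct(raw, pred, step):
    # Subtractor: prediction residual = raw samples - predicted samples.
    residual = [r - p for r, p in zip(raw, pred)]
    # Transform and quantization produce the data that is entropy coded.
    levels = quantize(dct(residual), step)
    # Reconstruction loop mirrors what a decoder would compute.
    recon_residual = idct(dequantize(levels, step))
    recon = [p + r for p, r in zip(pred, recon_residual)]
    return levels, recon


raw = [52, 55, 61, 66]
pred = [50, 50, 60, 60]
levels, recon = encode_and_reconstruct(raw, pred, step=1.0)
```

Because the encoder reconstructs from the quantized levels rather than from the original residual, its reference samples match those available at a decoder.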

The intra-picture estimation module 1124 performs intra-prediction based on the reconstructed pixel data 1117 to produce intra-prediction data. The intra-prediction data is provided to the entropy encoder 1190 to be encoded into the bitstream 1195. The intra-prediction data is also used by the intra-prediction module 1125 to produce the predicted pixel data 1113.

The motion estimation module 1135 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 1150. These MVs are provided to the motion compensation module 1130 to produce predicted pixel data.

Instead of encoding the complete actual MVs in the bitstream, the video encoder 1100 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 1195.

The MV prediction module 1175 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 1175 retrieves reference MVs of previous video frames from the MV buffer 1165. The video encoder 1100 stores the MVs generated for the current video frame in the MV buffer 1165 as reference MVs for generating predicted MVs.

The MV prediction module 1175 uses the reference MVs to create the predicted MVs. The predicted MVs can be computed by spatial MV prediction or temporal MV prediction. The difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (residual motion data) is encoded into the bitstream 1195 by the entropy encoder 1190.
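The differential coding of motion vectors described above can be illustrated with a short round trip. The component-wise median predictor used here is an assumption made only for the sketch (the disclosure does not fix how the predicted MV is formed), and the function names are hypothetical.

```python
def median_mv_predictor(reference_mvs):
    """Hypothetical predictor: component-wise median of the reference MVs."""
    xs = sorted(mv[0] for mv in reference_mvs)
    ys = sorted(mv[1] for mv in reference_mvs)
    mid = len(reference_mvs) // 2
    return (xs[mid], ys[mid])


def encode_residual_motion(mc_mv, predicted_mv):
    # Encoder side: only the difference is written to the bitstream.
    return (mc_mv[0] - predicted_mv[0], mc_mv[1] - predicted_mv[1])


def decode_mc_mv(residual_motion, predicted_mv):
    # Decoder side: the same predictor plus the residual restores the MC MV.
    return (residual_motion[0] + predicted_mv[0], residual_motion[1] + predicted_mv[1])


reference_mvs = [(4, -2), (6, 0), (5, 3)]   # e.g. taken from an MV buffer
predicted = median_mv_predictor(reference_mvs)
mc_mv = (7, -1)
residual = encode_residual_motion(mc_mv, predicted)
restored = decode_mc_mv(residual, predicted)
```

The scheme works only if the encoder and the decoder derive the same predicted MV from the same stored reference MVs.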

The entropy encoder 1190 encodes various parameters and data into the bitstream 1195 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding. The entropy encoder 1190 encodes various header elements, flags, along with the quantized transform coefficients 1112, and the residual motion data as syntax elements into the bitstream 1195. The bitstream 1195 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.

The in-loop filter 1145 performs filtering or smoothing operations on the reconstructed pixel data 1117 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 1145 include deblock filter (DBF), sample adaptive offset (SAO), and/or adaptive loop filter (ALF). In some embodiments, luma mapping chroma scaling (LMCS) is performed before the loop filters.

FIG. 12 illustrates portions of the video encoder 1100 that implement cross-component prediction using CCMs that are located by cascaded vectors. The luma component of the current block may be coded by inter-prediction 1140 and/or intra-picture prediction 1125 (e.g., IBC) to generate a reconstruction 1215 for the luma component. A cross-component model 1210 is applied to the luma reconstruction 1215 to generate a cross-component predictor 1225 for a chroma component. The cross-component predictor 1225 is included in the predicted pixel data 1113.

The cross-component model 1210 may be generated by a model constructor 1205 based on samples (in and/or around the current block and/or a reference block) retrieved from the reconstructed picture buffer 1150 and/or the line buffer 1127. Section I above describes several types of cross-component models that may be used as the cross-component model 1210, such as InterCCCM as described by reference to FIG. 2 above.
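For illustration only, the roles of a model constructor and of applying a cross-component model to a luma reconstruction can be sketched with the simplest possible model, a two-parameter linear mapping from luma to chroma fitted by least squares. This is an assumption of the sketch and not the InterCCCM or any other specific model of Section I; the function names are hypothetical, and the luma samples are assumed to be already aligned with the chroma sampling grid.

```python
def fit_linear_ccm(luma_samples, chroma_samples):
    """Least-squares fit of chroma ~ a * luma + b from co-located samples."""
    n = len(luma_samples)
    if n == 0 or n != len(chroma_samples):
        raise ValueError("need equally many luma and chroma samples")
    mean_l = sum(luma_samples) / n
    mean_c = sum(chroma_samples) / n
    var_l = sum((l - mean_l) ** 2 for l in luma_samples)
    if var_l == 0:
        return 0.0, mean_c  # flat luma: fall back to the mean chroma value
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma_samples, chroma_samples))
    a = cov / var_l
    return a, mean_c - a * mean_l


def apply_ccm(model, luma_recon, bit_depth=8):
    """Apply the (a, b) model to a 2-D luma reconstruction, with clipping."""
    a, b = model
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(int(round(a * l + b)), 0), max_val) for l in row]
        for row in luma_recon
    ]


# Samples the model constructor might gather around a block (made-up values).
model = fit_linear_ccm([10, 20, 30, 40], [25, 45, 65, 85])
chroma_pred = apply_ccm(model, [[10, 100], [130, 0]])
```

Once fitted, the pair `(a, b)` is the kind of compact parameter set that could be stored and later inherited by another block instead of being derived again.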

The cross-component model 1210 may also be provided by a CCM merge module 1230, which selects cross-component model (CCM) information or other cross-component prediction (CCP) information that is inherited from one of a plurality of candidates in a CCP merge candidate list. These candidates may include spatial candidates, temporal candidates, historical candidates, default candidates, etc. CCM information is retrieved from a CCM storage 1235 by the CCM merge module 1230.
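One way such a candidate list could be assembled is sketched below. The insertion order of the candidate types, the pruning of duplicates, and the list size `max_size` are all assumptions of this sketch, and the candidate values are made-up `(a, b)` model parameters.

```python
def build_ccp_merge_list(spatial, temporal, historical, default, max_size=6):
    """Concatenate candidate groups in a fixed order, pruning duplicates."""
    merged = []
    for group in (spatial, temporal, historical, default):
        for cand in group:
            if cand is None or cand in merged:
                continue  # skip unavailable or duplicated CCM information
            merged.append(cand)
            if len(merged) == max_size:
                return merged
    return merged


merge_list = build_ccp_merge_list(
    spatial=[(2.0, 5.0), None, (1.5, 0.0)],
    temporal=[(2.0, 5.0), (0.5, 64.0)],
    historical=[(1.5, 0.0)],
    default=[(1.0, 0.0)],
    max_size=4,
)
selected = merge_list[1]  # an index signalled or derived identically on both sides
```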

The CCM storage 1235 stores CCM information and corresponding support data structures. The CCM storage 1235 represents (or is implemented by) any form of storage that is used to store CCM information which includes cross-component models generated by the model constructor 1205. The data structures stored within the CCM storage 1235 are used to associate CCM information with blocks in different pictures.

A CCM propagation module 1240 operates to propagate CCM information between blocks and/or between pictures. In some embodiments, for a position or a block that does not have CCM information, the CCM propagation module 1240 may follow the block’s BV or MV to locate the CCM information of a reference block. The CCM propagation module 1240 may then propagate the CCM/CCP information by copying the CCM information of the reference block to the block. The CCM propagation module 1240 may also propagate the CCM information to the CCM merge candidate list 1230 for coding the current block.
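The copy-on-reference behavior described above can be sketched with a small dictionary-backed store. The class `CcmStorage`, the storage granularity `GRID`, the keying by POC and grid position, and the use of integer-sample vectors are hypothetical simplifications introduced only for this illustration.

```python
GRID = 4  # hypothetical storage granularity, in samples


class CcmStorage:
    """Associates CCM information with grid positions in different pictures."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(poc, x, y):
        return (poc, x // GRID, y // GRID)

    def put(self, poc, x, y, ccm):
        self._store[self._key(poc, x, y)] = ccm

    def get(self, poc, x, y):
        return self._store.get(self._key(poc, x, y))


def propagate(storage, cur_poc, x, y, vector, ref_poc):
    """Copy CCM information from the block that (x, y) + vector points to.

    For a block vector, ref_poc equals cur_poc; for a motion vector, ref_poc
    identifies the reference picture. Returns the CCM information now
    associated with the position, or None if nothing could be inherited.
    """
    existing = storage.get(cur_poc, x, y)
    if existing is not None:
        return existing  # the position already has CCM information
    inherited = storage.get(ref_poc, x + vector[0], y + vector[1])
    if inherited is not None:
        storage.put(cur_poc, x, y, inherited)
    return inherited


storage = CcmStorage()
storage.put(8, 16, 16, (2.0, 5.0))                 # model stored in picture 8
got = propagate(storage, 9, 32, 24, (-16, -8), 8)  # block in picture 9, MV to picture 8
missing = propagate(storage, 9, 64, 64, (4, 4), 8)
```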

In some embodiments, the CCM propagation module 1240 may use one or more cascaded vectors provided by a vector cascading module 1250 to locate the CCM information to be propagated. The vector cascading module 1250 may perform step 1 and recursive steps 2(a)-2(c) described above to create the cascaded vectors. The base vector of the cascaded vector may be taken from a candidate list of a coding tool (e.g., IBC, IntraTMP, MMVD, AMVP, etc.) that is used to encode the current block. Multiple cascaded vectors may be generated off a same base vector, and at each tracing depth the vector cascading module 1250 may follow a different motion vector or block vector to create different cascaded vectors. Numerical limits may be imposed on the number of cascaded vectors that can be generated off the same base vector and/or at each tracing depth.

FIG. 13 conceptually illustrates a process 1300 that encodes pixel blocks by cross-component prediction based on CCM information propagated by cascaded vectors. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the encoder 1100 performs the process 1300 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the encoder 1100 performs the process 1300.

The encoder receives (at block 1310) data to be encoded as a current block of pixels in a current picture. The current block includes a first-color block (e.g., for luma component) and a second-color block (e.g., for Cr or Cb component).

The encoder generates (at block 1320) a reconstruction for the first-color block. The reconstruction for the first-color block may be based on non-intra-prediction (e.g., inter-prediction or IBC).

The encoder selects (at block 1330) cross-component model (CCM) information from a CCM candidate list. The CCM information is propagated from a source block that is located by a cascaded vector. The cascaded vector is derived from a base vector, which is a motion vector or a block vector obtained from a candidate list of a coding tool that is used to encode the current block. The coding tool may be a merge mode for inter-prediction, merge mode for intra-block copy (IBC), advanced motion vector prediction (AMVP), merge mode with motion vector difference (MMVD), intra template matching (IntraTMP), template matching merge mode, bilateral matching merge mode, CIIP-TM mode, affine merge mode, etc.

The cascaded vector starts at the base vector and is then further derived by recursively updating the cascaded vector based on a motion vector or a block vector of a current reference block and using the updated cascaded vector to locate a further reference block as the next current reference block. In some embodiments, a cascaded vector that is most recently updated by a block vector is not used to locate CCM information for generating the cross-component prediction. The motion vector or the block vector of the current reference block may be retrieved from at least one of a set of pre-defined positions of the current reference block. In some embodiments, the number of cascaded vectors allowed to be derived based on the current reference block (at a particular trace depth) is constrained by a particular limit (e.g., 2). In some embodiments, the number of cascaded vectors allowed to be derived from a same base vector is constrained by a particular limit.
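The recursive derivation and its limits can be sketched as a bounded breadth-first trace. Several points are assumptions of this sketch rather than statements of the disclosure: the update is modelled as a vector sum, the motion field is a dictionary keyed by picture and exact position (a real implementation would map positions to a block grid and to the pre-defined positions), the base vector itself is kept at depth 0 and does not count toward the limits, and the block-vector exclusion is applied only to derived vectors. The names `Step`, `Cascaded`, `trace`, and `usable_for_ccm` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    vector: tuple  # (dx, dy) in integer samples
    ref_pic: int   # picture the vector points into
    kind: str      # "mv" (motion vector) or "bv" (block vector)


@dataclass(frozen=True)
class Cascaded:
    vector: tuple
    pic: int
    depth: int
    last_kind: str


def trace(field, pos, base, max_depth=3, per_block_limit=2, per_base_limit=4):
    """Derive cascaded vectors for the block at pos, starting from base."""
    out = [Cascaded(base.vector, base.ref_pic, 0, base.kind)]
    frontier = list(out)
    derived = 0
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for cv in frontier:
            # Locate the current reference block with the cascaded vector.
            ref_pos = (pos[0] + cv.vector[0], pos[1] + cv.vector[1])
            steps = field.get((cv.pic, ref_pos), [])
            for step in steps[:per_block_limit]:   # limit per reference block
                if derived >= per_base_limit:      # limit per base vector
                    return out
                updated = (cv.vector[0] + step.vector[0], cv.vector[1] + step.vector[1])
                new = Cascaded(updated, step.ref_pic, depth, step.kind)
                next_frontier.append(new)
                out.append(new)
                derived += 1
        frontier = next_frontier
    return out


def usable_for_ccm(cv):
    # Embodiment in which a vector last updated by a block vector is skipped.
    return cv.depth == 0 or cv.last_kind != "bv"


field = {
    (7, (12, 8)): [Step((-4, 0), 6, "mv"), Step((2, 2), 7, "bv"), Step((9, 9), 5, "mv")],
    (6, (8, 8)): [Step((1, -3), 5, "mv")],
}
traced = trace(field, pos=(16, 16), base=Step((-4, -8), 7, "mv"))
usable = [cv.vector for cv in traced if usable_for_ccm(cv)]
```

In this example the third vector stored at the first reference block is never followed because of the per-block limit of 2, and the cascaded vector last updated by the block vector is traced but filtered out before CCM lookup.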

The encoder applies (at block 1340) the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block.

The encoder uses (at block 1350) the generated cross-component prediction to generate prediction residual for encoding the current block.
VI. Example Video Decoder

In some embodiments, an encoder may signal (or generate) one or more syntax elements in a bitstream, such that a decoder may parse said one or more syntax elements from the bitstream.

FIG. 14 illustrates an example video decoder 1400 that may implement cascaded vectors and cross-component prediction. As illustrated, the video decoder 1400 is an image-decoding or video-decoding circuit that receives a bitstream 1495 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 1400 has several components or modules for decoding the bitstream 1495, including some components selected from an inverse quantization module 1414, an inverse transform module 1415, an intra-prediction module 1425, a motion compensation module 1430, an in-loop filter 1445, a decoded picture buffer 1450, a MV buffer 1465, a MV prediction module 1475, and a parser 1490. The motion compensation module 1430 is part of an inter-prediction module 1440. The intra-prediction module 1425 is part of a current picture prediction module 1420, which uses current picture reconstructed samples as reference samples for prediction of the current block.

In some embodiments, the modules 1414–1490 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 1414–1490 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 1414–1490 are illustrated as being separate modules, some of the modules can be combined into a single module.

The parser 1490 (or entropy decoder) receives the bitstream 1495 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard. The parsed syntax elements include various header elements and flags, as well as quantized data (or quantized coefficients) 1412. The parser 1490 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.

The inverse quantization module 1414 de-quantizes the quantized data (or quantized coefficients) 1412 to obtain transform coefficients 1418, and the inverse transform module 1415 performs inverse transform on the transform coefficients 1418 to produce reconstructed residual signal 1419. The reconstructed residual signal 1419 is added with predicted pixel data 1413 from the intra-prediction module 1425 or the motion compensation module 1430 to produce decoded pixel data 1417. The decoded pixel data is filtered by the in-loop filter 1445 and stored in the decoded picture buffer 1450. In some embodiments, the decoded picture buffer 1450 is a storage external to the video decoder 1400. In some embodiments, the decoded picture buffer 1450 is a storage internal to the video decoder 1400.

The intra-prediction module 1425 receives intra-prediction data from the bitstream 1495 and, according to the intra-prediction data, produces the predicted pixel data 1413 from the decoded pixel data 1417 stored in the decoded picture buffer 1450. In some embodiments, the decoded pixel data 1417 is also stored in a line buffer 1427 (or intra prediction buffer) for intra-picture prediction and spatial MV prediction.

In some embodiments, the content of the decoded picture buffer 1450 is used for display. A display device 1405 either retrieves the content of the decoded picture buffer 1450 for display directly, or retrieves the content of the decoded picture buffer to a display buffer. In some embodiments, the display device receives pixel values from the decoded picture buffer 1450 through a pixel transport.

The motion compensation module 1430 produces predicted pixel data 1413 from the decoded pixel data 1417 stored in the decoded picture buffer 1450 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 1495 with predicted MVs received from the MV prediction module 1475.

The MV prediction module 1475 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 1475 retrieves the reference MVs of previous video frames from the MV buffer 1465. The video decoder 1400 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 1465 as reference MVs for producing predicted MVs.

The in-loop filter 1445 performs filtering or smoothing operations on the decoded pixel data 1417 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 1445 include deblock filter (DBF), sample adaptive offset (SAO), and/or adaptive loop filter (ALF). In some embodiments, luma mapping chroma scaling (LMCS) is performed before the loop filters.

FIG. 15 illustrates portions of the video decoder 1400 that implement cross-component prediction using CCMs that are located by cascaded vectors. The luma component of the current block may be coded by inter-prediction 1440 and/or intra-picture prediction 1425 (e.g., IBC) to generate a reconstruction 1515 for the luma component. A cross-component model 1510 is applied to the luma reconstruction 1515 to generate a cross-component predictor 1525 for a chroma component. The cross-component predictor 1525 is included in the predicted pixel data 1413.

The cross-component model 1510 may be generated by a model constructor 1505 based on samples (in and/or around the current block and/or a reference block) retrieved from the decoded picture buffer 1450 and/or the line buffer 1427. Section I above describes several types of cross-component models that may be used as the cross-component model 1510, such as InterCCCM as described by reference to FIG. 2 above.

The cross-component model 1510 may also be provided by a CCM merge module 1530, which selects cross-component model (CCM) information or other cross-component prediction (CCP) information that is inherited from one of a plurality of candidates in a CCP merge candidate list. These candidates may include spatial candidates, temporal candidates, historical candidates, default candidates, etc. CCM information is retrieved from a CCM storage 1535 by the CCM merge module 1530.

The CCM storage 1535 stores CCM information and corresponding support data structures. The CCM storage 1535 represents (or is implemented by) any form of storage that is used to store CCM information which includes cross-component models generated by the model constructor 1505. The data structures stored within the CCM storage 1535 are used to associate CCM information with blocks in different pictures.

A CCM propagation module 1540 operates to propagate CCM information between blocks and/or between pictures. In some embodiments, for a position or a block that does not have CCM information, the CCM propagation module 1540 may follow the block’s BV or MV to locate the CCM information of a reference block. The CCM propagation module 1540 may then propagate the CCM/CCP information by copying the CCM information of the reference block to the block. The CCM propagation module 1540 may also propagate the CCM information to the CCM merge candidate list 1530 for coding the current block.

In some embodiments, the CCM propagation module 1540 may use one or more cascaded vectors provided by a vector cascading module 1550 to locate the CCM information to be propagated. The vector cascading module 1550 may perform step 1 and recursive steps 2(a)-2(c) described above to create the cascaded vectors. The base vector of the cascaded vector may be taken from a candidate list of a coding tool (e.g., IBC, IntraTMP, MMVD, AMVP, etc.) that is used to decode the current block. Multiple cascaded vectors may be generated off a same base vector, and at each tracing depth the vector cascading module 1550 may follow a different motion vector or block vector to create different cascaded vectors. Numerical limits may be imposed on the number of cascaded vectors that can be generated off the same base vector and/or at each tracing depth.

FIG. 16 conceptually illustrates a process 1600 that decodes pixel blocks by cross-component prediction based on CCM information propagated by cascaded vectors. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the decoder 1400 performs the process 1600 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the decoder 1400 performs the process 1600.

The decoder receives (at block 1610) data to be decoded as a current block of pixels in a current picture. The current block includes a first-color block (e.g., for luma component) and a second-color block (e.g., for Cr or Cb component).

The decoder generates (at block 1620) a reconstruction for the first-color block. The reconstruction for the first-color block may be based on non-intra-prediction (e.g., inter-prediction or IBC).

The decoder selects (at block 1630) cross-component model (CCM) information from a CCM candidate list. The CCM information is propagated from a source block that is located by a cascaded vector. The cascaded vector is derived from a base vector, which is a motion vector or a block vector obtained from a candidate list of a coding tool that is used to decode the current block. The coding tool may be a merge mode for inter-prediction, merge mode for intra-block copy (IBC), advanced motion vector prediction (AMVP), merge mode with motion vector difference (MMVD), intra template matching (IntraTMP), template matching merge mode, bilateral matching merge mode, CIIP-TM mode, affine merge mode, etc.

The cascaded vector starts at the base vector and is then further derived by recursively updating the cascaded vector based on a motion vector or a block vector of a current reference block and using the updated cascaded vector to locate a further reference block as the next current reference block. In some embodiments, a cascaded vector that is most recently updated by a block vector is not used to locate CCM information for generating the cross-component prediction. The motion vector or the block vector of the current reference block may be retrieved from at least one of a set of pre-defined positions of the current reference block. In some embodiments, the number of cascaded vectors allowed to be derived based on the current reference block (at a particular trace depth) is constrained by a particular limit (e.g., 2). In some embodiments, the number of cascaded vectors allowed to be derived from a same base vector is constrained by a particular limit.

The decoder applies (at block 1640) the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block.

The decoder uses (at block 1650) the generated cross-component prediction with corresponding prediction residual to reconstruct the current block. The decoder may then provide the reconstructed current block for display as part of the reconstructed current picture.
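The final reconstruction step can be illustrated as a sample-wise addition of the cross-component prediction and the decoded prediction residual. Clipping to the valid sample range and the `bit_depth` parameter are assumptions of this sketch (the process description does not spell them out), and the function name is hypothetical.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the residual to the prediction and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


chroma_prediction = [[100, 250], [3, 128]]   # e.g. a cross-component prediction
chroma_residual = [[5, 10], [-7, 0]]         # decoded prediction residual
chroma_recon = reconstruct_block(chroma_prediction, chroma_residual)
```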
VII. Example Electronic System

Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more computational or processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.

In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.

FIG. 17 conceptually illustrates an electronic system 1700 with which some embodiments of the present disclosure are implemented. The electronic system 1700 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc. ) , phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1700 includes a bus 1705, processing unit (s) 1710, a graphics-processing unit (GPU) 1715, a system memory 1720, a network 1725, a read-only memory 1730, a permanent storage device 1735, input devices 1740, and output devices 1745.

The bus 1705 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1700. For instance, the bus 1705 communicatively connects the processing unit (s) 1710 with the GPU 1715, the read-only memory 1730, the system memory 1720, and the permanent storage device 1735.

From these various memory units, the processing unit (s) 1710 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure. The processing unit (s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1715. The GPU 1715 can offload various computations or complement the image processing provided by the processing unit (s) 1710.

The read-only-memory (ROM) 1730 stores static data and instructions that are used by the processing unit (s) 1710 and other modules of the electronic system. The permanent storage device 1735, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1700 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1735.

Other embodiments use a removable storage device (such as a floppy disk, flash memory device, etc., and its corresponding disk drive) as the permanent storage device. Like the permanent storage device 1735, the system memory 1720 is a read-and-write memory device. However, unlike storage device 1735, the system memory 1720 is a volatile read-and-write memory, such as a random access memory. The system memory 1720 stores some of the instructions and data that the processor uses at runtime. In some embodiments, processes in accordance with the present disclosure are stored in the system memory 1720, the permanent storage device 1735, and/or the read-only memory 1730. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit(s) 1710 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 1705 also connects to the input and output devices 1740 and 1745. The input devices 1740 enable the user to communicate information and select commands to the electronic system. The input devices 1740 include alphanumeric keyboards and pointing devices (also called “cursor control devices” ) , cameras (e.g., webcams) , microphones or similar devices for receiving voice commands, etc. The output devices 1745 display images generated by the electronic system or otherwise output data. The output devices 1745 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD) , as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.

Finally, as shown in FIG. 17, bus 1705 also couples electronic system 1700 to a network 1725 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 1700 may be used in conjunction with the present disclosure.

Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media) . Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM) , recordable compact discs (CD-R) , rewritable compact discs (CD-RW) , read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM) , a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc. ) , flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc. ) , magnetic and/or solid state hard drives, read-only and recordable discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessors or multi-core processors that execute software, many of the above-described features and applications are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.

As used in this specification and any claims of this application, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” and “displaying” mean displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

While the present disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the present disclosure can be embodied in other specific forms without departing from the spirit of the present disclosure. In addition, a number of the figures (including FIG. 13 and FIG. 16) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the present disclosure is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
Additional Notes

The herein-described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely examples, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being "operably connected", or "operably coupled", to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being "operably couplable", to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.

Further, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

Moreover, it will be understood by those skilled in the art that, in general, terms used herein, and especially in the appended claims, e.g., bodies of the appended claims, are generally intended as “open” terms, e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an," e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more;” the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number, e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations. Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

From the foregoing, it will be appreciated that various implementations of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various implementations disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims

1. A video coding method comprising:

receiving data to be encoded or decoded as a current block of pixels of a current picture of a video, wherein the current block comprises a first-color block and a second-color block;

generating a reconstruction for the first-color block;

selecting a cross-component model (CCM) information from a CCM candidate list, the CCM information being propagated from a source block that is located by a cascaded vector, wherein the cascaded vector is derived from a base vector obtained from a candidate list of a non-intra-prediction coding tool that is used to encode or decode the current block;

applying the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block; and

using the generated cross-component prediction to encode or decode the current block.
2. The video coding method of claim 1, wherein the reconstruction for the first-color block is based on non-intra-prediction.
3. The video coding method of claim 1, wherein the coding tool is a merge mode for inter-prediction.
4. The video coding method of claim 1, wherein the coding tool is a merge mode for intra-block copy (IBC) that encodes or decodes the current block by referencing the current picture.
5. The video coding method of claim 1, wherein the coding tool is advanced motion vector prediction (AMVP) mode and the candidate list is a list of motion vector predictors (MVP).
6. The video coding method of claim 1, wherein the cascaded vector starts from the base vector, and is further derived by recursively updating the cascaded vector based on a motion vector or a block vector of a current reference block and using the updated cascaded vector to locate a further reference block as the next current reference block.
7. The video coding method of claim 6, wherein when a cascaded vector is most recently updated by a block vector, a CCM information for generating the cross-component prediction is not located by the cascaded vector.
8. The video coding method of claim 6, wherein the motion vector or the block vector of the current reference block is retrieved from at least one of a set of pre-defined positions of the current reference block.
9. The video coding method of claim 6, wherein the number of cascaded vectors allowed to be derived based on the current reference block is constrained by a particular limit.
10. The video coding method of claim 6, wherein the number of cascaded vectors allowed to be derived from the base vector is constrained by a particular limit.
11. The video coding method of claim 1, wherein CCM information located by cascaded vectors for the current block are included in the CCM candidate list as a group of candidates.
12. The video coding method of claim 1, wherein the CCM information located by the cascaded vector is inserted into the candidate list immediately after a CCM information that is located by the base vector.
13. An electronic apparatus comprising:

a video coder circuit configured to perform operations comprising:

receiving data to be encoded or decoded as a current block of pixels of a current picture of a video, wherein the current block comprises a first-color block and a second-color block;

generating a reconstruction for the first-color block;

selecting a cross-component model (CCM) information from a CCM candidate list, the CCM information being propagated from a source block that is located by a cascaded vector, wherein the cascaded vector is derived from a base vector obtained from a candidate list of a non-intra-prediction coding tool that is used to encode or decode the current block;

applying the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block; and

using the generated cross-component prediction to encode or decode the current block.
14. A video decoding method comprising:

receiving data to be encoded or decoded as a current block of pixels of a current picture of a video, wherein the current block comprises a first-color block and a second-color block;

generating a reconstruction for the first-color block;

selecting a cross-component model (CCM) information from a CCM candidate list, the CCM information being propagated from a source block that is located by a cascaded vector, wherein the cascaded vector is derived from a base vector obtained from a candidate list of a non-intra-prediction coding tool that is used to decode the current block;

applying the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block; and

using the generated cross-component prediction to reconstruct the current block.
15. A video encoding method comprising:

receiving data to be encoded or decoded as a current block of pixels of a current picture of a video, wherein the current block comprises a first-color block and a second-color block;

generating a reconstruction for the first-color block;

selecting a cross-component model (CCM) information from a CCM candidate list, the CCM information being propagated from a source block that is located by a cascaded vector, wherein the cascaded vector is derived from a base vector obtained from a candidate list of a non-intra-prediction coding tool that is used to encode the current block;

applying the selected CCM information to the reconstruction of the first-color block to generate a cross-component prediction of the second-color block; and

using the generated cross-component prediction to encode the current block.
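
For illustration only, the cascaded-vector derivation and CCM candidate ordering recited in the claims above may be sketched as follows. This is a minimal sketch under stated assumptions and is not a normative implementation: the names BlockInfo, derive_ccm_candidates, apply_ccm, and lookup are hypothetical; the cascaded vector is assumed to be updated by vector addition; all blocks are assumed to lie in one coordinate plane (reference-picture handling is omitted); and the CCM information is assumed to be a linear model with a scale and an offset, which the claims do not require.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vec = Tuple[int, int]
Ccm = Tuple[int, int]  # ASSUMPTION: CCM is a linear model (scale, offset)


@dataclass
class BlockInfo:
    """Hypothetical record of what the coder stored for a coded block."""
    vector: Optional[Vec]   # motion vector or block vector, if the block has one
    is_block_vector: bool   # True when `vector` is an IBC block vector
    ccm: Optional[Ccm]      # CCM information stored with the block, if any


def derive_ccm_candidates(pos: Vec, base_vector: Vec,
                          lookup: Callable[[Vec], Optional[BlockInfo]],
                          max_cascades: int) -> List[Ccm]:
    """Collect CCM information located by the base vector, then by cascaded vectors.

    The candidate located by the base vector comes first and the cascaded
    candidates follow immediately after it. `max_cascades` limits how many
    cascaded vectors may be derived from the base vector.
    """
    candidates: List[Ccm] = []
    cascaded = base_vector
    updated_by_bv = False  # the base vector itself is not an "update"
    for _ in range(max_cascades + 1):
        ref = lookup((pos[0] + cascaded[0], pos[1] + cascaded[1]))
        if ref is None:
            break
        # One reading of the block-vector restriction: skip the CCM located
        # by a cascaded vector whose most recent update was a block vector.
        if ref.ccm is not None and not updated_by_bv:
            candidates.append(ref.ccm)
        if ref.vector is None:
            break
        # ASSUMPTION: the update accumulates the reference block's vector.
        cascaded = (cascaded[0] + ref.vector[0], cascaded[1] + ref.vector[1])
        updated_by_bv = ref.is_block_vector
    return candidates


def apply_ccm(first_color_rec: List[List[int]], ccm: Ccm) -> List[List[int]]:
    """Predict second-color samples from first-color reconstruction (assumed linear)."""
    scale, offset = ccm
    return [[scale * s + offset for s in row] for row in first_color_rec]
```

In this sketch, lookup could be as simple as the get method of a dictionary that maps block positions to BlockInfo records. A real coder would also have to address color-format subsampling, fixed-point model parameters, and the pre-defined positions from which a reference block's vector is retrieved, none of which are modeled here.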