WO2026012280A1 - Method and apparatus of inter estimation region for decoder-side derived inter-prediction mode and interCCP merge mode in video coding
Abstract
A method and apparatus for deriving a target inter prediction mode according to decoder-side derived inter-prediction mode derivation, or for deriving an interCCP merge mode, are disclosed. According to this method, an inter estimation region of the current picture is determined and partitioned into one or more blocks. The target inter prediction mode or the interCCP merge mode is derived for each block in the inter estimation region by using corresponding reconstruction samples in a vertical and/or horizontal direction, neighbouring information, or both, adjacent to the inter estimation region, or by using only available reconstruction samples or neighbouring information adjacent to the inter estimation region and adjacent to said each of said one or more blocks.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention is a non-provisional application of, and claims priority to, U.S. Provisional Patent Application No. 63/670,235, filed on July 12, 2024. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
The present invention relates to video coding systems. In particular, the present invention relates to schemes to facilitate parallel processing for decoder-side derived inter-prediction mode derivation or interCCP merge mode by avoiding or reducing multiple accesses of neighbouring reconstruction samples.
BACKGROUND AND RELATED ART
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources, including 3-dimensional (3D) video signals.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously encoded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, are provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
As shown in Fig. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to a series of processing. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, deblocking filter (DF) , Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H. 264 or VVC.
The decoder, as shown in Fig. 1B, can use some of the functional blocks as the encoder. For example, the decoder can reuse Inverse Quantization 124 and Inverse Transform 126; however, Transform 118 and Quantization 120 are not needed at the decoder. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) . The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
Intra Mode Coding with 67 Intra Prediction Modes
To capture the arbitrary edge directions presented in natural video, the number of directional intra modes in VVC is extended from 33, as used in HEVC, to 65. The new directional modes not in HEVC are depicted as dotted arrows in Fig. 2, and the planar and DC modes remain the same. These denser directional intra prediction modes apply for all block sizes and for both luma and chroma intra predictions.
In VVC, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for the non-square blocks.
In HEVC, every intra-coded block has a square shape and the length of each of its sides is a power of 2. Thus, no division operations are required to generate an intra-predictor using the DC mode. In VVC, blocks can have a rectangular shape, which necessitates a division operation per block in the general case. To avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks.
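To make the shift-based averaging concrete, the following is a minimal sketch, assuming the reference arrays already hold the reconstructed neighbouring samples; the function name and data layout are illustrative and not part of any standard API.

#include <vector>
// Minimal sketch: DC value of a non-square block using only the longer
// side. Because the side length is a power of two, the division reduces
// to a shift (square blocks, which average both sides, are not shown).
int dcValueNonSquare(const std::vector<int>& aboveRef,  // W samples
                     const std::vector<int>& leftRef,   // H samples
                     int W, int H)
{
    const std::vector<int>& longerSide = (W >= H) ? aboveRef : leftRef;
    const int len = (W >= H) ? W : H;       // power of two by construction
    int shift = 0;
    while ((1 << shift) < len) ++shift;     // log2(len)
    int sum = 0;
    for (int i = 0; i < len; ++i)
        sum += longerSide[i];
    return (sum + (len >> 1)) >> shift;     // rounded average, no division
}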
Wide-Angle Intra Prediction for Non-Square Blocks
Conventional angular intra prediction directions are defined from 45 degrees to -135 degrees in clockwise direction. In VVC, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for non-square blocks. The replaced modes are signalled using the original mode indexes, which are remapped to the indexes of wide angular modes after parsing. The total number of intra prediction modes is unchanged, i.e., 67, and the intra mode coding method is unchanged.
To support these prediction directions, the top reference with length 2W+1, and the left reference with length 2H+1, are defined as shown in Fig. 3A and Fig. 3B respectively.
The number of replaced modes in wide-angular direction mode depends on the aspect ratio of a block. The replaced intra prediction modes are illustrated in Table 1.
Table 1 - Intra prediction modes replaced by wide-angular modes
As shown in Fig. 4, two vertically-adjacent predicted samples (samples 410 and 412) may use two non-adjacent reference samples (samples 420 and 422) in the case of wide-angle intra prediction. Hence, a low-pass reference sample filter and side smoothing are applied to the wide-angle prediction to reduce the negative effect of the increased gap Δpα. A wide-angle mode may represent a non-fractional offset; there are 8 wide-angle modes satisfying this condition, namely [-14, -12, -10, -6, 72, 76, 78, 80]. When a block is predicted by these modes, the samples in the reference buffer are directly copied without applying any interpolation. With this modification, the number of samples that need to be smoothed is reduced. Besides, it aligns the design of non-fractional modes between the conventional prediction modes and the wide-angle modes.
In VVC, the 4:2:2 and 4:4:4 chroma formats are supported, as well as 4:2:0. The chroma derived mode (DM) derivation table for the 4:2:2 chroma format was initially ported from HEVC, extending the number of entries from 35 to 67 to align with the extension of intra prediction modes. Since the HEVC specification does not support prediction angles below -135° and above 45°, luma intra prediction modes ranging from 2 to 5 are mapped to 2. Therefore, the chroma DM derivation table for the 4:2:2 chroma format is updated by replacing some values of the entries of the mapping table to convert the prediction angle more precisely for chroma blocks.
Intra Prediction in Enhanced Compression Model (ECM)
Decoder Side Intra Mode Derivation (DIMD)
When DIMD is applied, up to five intra modes are derived from the reconstructed neighbouring samples, and those five predictors are combined with the planar mode predictor with the weights derived from the histogram of gradients as described in JVET-O0449 (Mohsen Abdoli, et al., "Non-CE3: Decoder-side Intra Mode Derivation with Prediction Fusion Using Planar", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 15th Meeting: Gothenburg, SE, 3–12 July 2019, Document JVET-O0449). The division operations in weight derivation are performed utilizing the same lookup table (LUT) based integerization scheme used by the CCLM. For example, the division operation in the orientation calculation,
Orient = Gy/Gx
is computed by the following LUT-based scheme:
x = Floor(Log2(Gx))
normDiff = ((Gx << 4) >> x) & 15
x += (3 + (normDiff != 0) ? 1 : 0)
Orient = (Gy * (DivSigTable[normDiff] | 8) + (1 << (x-1))) >> x,
where
DivSigTable[16] = {0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0}.
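As a hedged illustration only, the scheme above can be implemented as follows, assuming Gx > 0 and reading the increment line as adding 3 plus one extra when normDiff is nonzero; this is a sketch, not a normative implementation.

static const int DivSigTable[16] = {0, 7, 6, 5, 5, 4, 4, 3,
                                    3, 2, 2, 1, 1, 1, 1, 0};
// LUT-based substitute for Orient = Gy/Gx (sketch; Gx > 0 assumed).
int orientLut(int Gy, int Gx)
{
    int x = 0;
    while ((Gx >> (x + 1)) != 0) ++x;       // Floor(Log2(Gx))
    int normDiff = ((Gx << 4) >> x) & 15;
    x += 3 + (normDiff != 0 ? 1 : 0);
    return (Gy * (DivSigTable[normDiff] | 8) + (1 << (x - 1))) >> x;
}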
For a block of size W×H, the weight for each of the five derived modes is modified if the magnitude of the above histogram or the left histogram is more than twice that of the other. In this case, the weights are location dependent and computed as follows.
If the above histogram is twice the left, then:
If the left histogram is twice the above, then:
where wDimdi is the unmodified uniform weight of the DIMD mode selected as in JVET-O0449, and Δi is pre-defined and set to 10.
Derived intra modes are included in the primary list of intra most probable modes (MPM), so the DIMD process is performed before the MPM list is constructed. The primary derived intra mode of a DIMD block is stored with the block and is used for MPM list construction of the neighbouring blocks.
Finally, note that the region of neighbouring reconstructed samples used for computing the histogram of gradients is modified compared to the JVET-O0449 method, depending on reconstructed sample availability. The region of decoded reference samples of the current W×H luma CB is extended towards the above-right side if available, up to W additional columns, and towards the bottom-left side if available, up to H additional rows.
DIMD Chroma Mode
The DIMD chroma mode uses the DIMD derivation method to derive the chroma intra prediction mode of the current block based on the neighbouring reconstructed Y, Cb and Cr samples in the second neighbouring row and column as shown in Figs. 5A-C for Y, Cb and Cr components (Fig. 5A, Fig. 5B and Fig. 5C) respectively. Specifically, a horizontal gradient and a vertical gradient are calculated for each collocated reconstructed luma sample of the current chroma block, as well as the reconstructed Cb and Cr samples, to build a HoG. Then the intra prediction mode with the largest histogram amplitude values is used for performing chroma intra prediction of the current chroma block.
When the intra prediction mode derived from the DIMD chroma mode is the same as the intra prediction mode derived from the DM mode, the intra prediction mode with the second largest histogram amplitude value is used as the DIMD chroma mode. A CU level flag is signalled to indicate whether the proposed DIMD chroma mode is applied.
Finally, the luma region of reconstructed samples used for computing the histogram of gradients for the chroma DIMD mode is modified compared to JVET-O0449. For a W×H pair of chroma CBs to predict, to build the histogram of gradients associated with the collocated luma CB, the pairs of a vertical gradient and a horizontal gradient are extracted from the second and third lines in this luma CB instead of being extracted from the regular set of DIMD decoded reference samples around this luma CB.
JVET-AG0141 AHG 12: Occurrence-Based Intra Coding (OBIC)
The Occurrence-Based Intra Coding (OBIC) method derives the intra prediction modes of the current block based on the sample-wise occurrence of the intra modes in the spatial neighbourhood of the block. For this, adjacent and non-adjacent spatial neighbouring blocks are checked and the intra prediction modes of the blocks are collected into an occurrence histogram. The occurrence histogram consists of the intra modes and their sample-wise occurrences. The occurrence values are calculated based on the number of samples that are coded in a certain intra prediction mode in that neighbourhood. For example, if a uiWidth × uiHeight block is coded with an IPM mode, the occurrence of the mode in that particular block is calculated as:
Histogram[IPM] += uiWidth * uiHeight;
where uiWidth and uiHeight are the width and height of the spatial neighbouring block.
The occurrences of the existing modes from the spatial neighbourhood blocks are aggregated into the histogram. Fig. 6 shows the non-adjacent spatial neighbouring blocks that are used in OBIC mode’s histogram generation. An example of the histogram of occurrences of IPM modes in the spatial neighbourhood of a CU is shown in Fig. 7.
Up to 5 angular modes with the highest occurrence along with the planar mode are selected from the histogram and used for final prediction by blending the prediction of the selected modes.
Some blocks, listed below, use more than one intra mode for prediction. In such cases, all the intra modes of such blocks are selected and used when creating the OBIC histogram (a sketch of the histogram construction follows the two lists below):
· DIMD: up to 5 angular modes
· TIMD: up to 2 modes
· SGPM: 2 modes
· OBIC: up to 5 angular modes
Moreover, the intra modes of following blocks are not considered when creating the histogram of OBIC mode:
· MIP block
· IntraTMP block
· IBC block
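The histogram construction described above can be sketched as follows; the NeighbourBlock structure and its fields are assumptions made for illustration and do not correspond to any ECM data structure.

#include <map>
#include <vector>
enum class CodeMode { Angular, DIMD, TIMD, SGPM, OBIC, MIP, IntraTMP, IBC };
struct NeighbourBlock {
    CodeMode mode;
    int width, height;
    std::vector<int> intraModes;   // all intra modes used by the block
};
std::map<int, long long> buildObicHistogram(const std::vector<NeighbourBlock>& nbs)
{
    std::map<int, long long> hist;   // intra mode -> sample-wise occurrence
    for (const auto& nb : nbs) {
        // MIP, IntraTMP and IBC blocks do not contribute to the histogram.
        if (nb.mode == CodeMode::MIP || nb.mode == CodeMode::IntraTMP ||
            nb.mode == CodeMode::IBC)
            continue;
        // Multi-mode blocks (DIMD/TIMD/SGPM/OBIC) contribute all their modes.
        for (int ipm : nb.intraModes)
            hist[ipm] += (long long)nb.width * nb.height;
    }
    return hist;
}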
The blending weights are calculated similarly to the DIMD mode, but instead of using gradient values from the template, the occurrence values are used for OBIC. Moreover, the planar mode's weight is also decided similarly to the DIMD mode.
The OBIC mode is only used in luma blocks.
Bitstream Signalling
Usage of the mode is signalled with a CABAC-coded PU-level flag. The OBIC mode is used as a sub-mode of DIMD and its flag is signalled after the DIMD flag.
Moreover, the OBIC mode accounts for one additional RD check at the encoder side.
JVET-AG0084 DIMD Merge Mode
DIMD merge mode includes a step of merging the HoG of neighbouring blocks to derive DIMD information. The DIMD information in the surroundings can be used to derive the merged HoG (MHoG). The MHoG from up to 13 CUs is used to derive intra prediction modes and weights, as in conventional DIMD. The directional modes and the respective weights corresponding to the five highest amplitudes in the MHoG are selected, and the corresponding predictors are blended as in conventional DIMD. In JVET-AF0120 (S. Blasi, I. Zupancic, J. Lainema, "EE2-2.1 DIMD merge," JVET-AF0120, October 2023), a reduced-storage version of the DIMD merge is proposed, where the five highest amplitudes of the histograms are stored and averaged.
In JVET-AF0106 (J. Huo, J. Fan, Z. Zhang, Y. Ma, F. Yang, M. Li, “EE2-related: Non-adjacent spatial candidates for DIMD merge, ” JVET-AF0106, October 2023) , the DIMD merge mode is extended to include up to 31 surrounding CUs to derive the MHoG. Indeed, on top of the original 13 CUs, non-adjacent spatial candidates are considered as in Fig. 6.
A new flag, signalled just after the DIMD flag, indicates whether the DIMD merge mode is used.
DIMD Merge Mode List
This contribution proposes to create a DIMD merge list from neighbouring blocks’ DIMD information, which includes:
· DIMD information from spatial neighbours (as in Fig. 8) ,
· DIMD information from non-adjacent neighbours (as in Fig. 6) ,
· DIMD information derived from the MHoG.
Two redundancy checks (pruning stage) are applied: one comparing the DIMD merge candidates, and one comparing the DIMD information derived from the current block. Thus, a DIMD merge candidate is added to the DIMD merge list when the associated DIMD information is different from the existing information in DIMD merge candidates and the current block DIMD information.
Two additional flags are added conditionally to the DIMD flag, i.e., the DIMD Merge is considered as a sub-mode of DIMD. The two flags are:
· The DIMD merge mode flag
· The DIMD merge mode index representative of the DIMD merge candidate.
DIMD Merge Candidates’ Evaluation
DIMD merge is only available as an option if the current block has at least one neighbour coded with DIMD or DIMD merge modes using the same method as proposed in JVET-AE0071 (S. Blasi, I. Zupancic, J. Lainema, “AHG12 -Decoder-side Intra Mode Derivation Merge, ” JVET-AE0071, July 2023) . Besides, the neighbouring positions or neighbouring blocks are extended to include part of the non-adjacent candidates.
When the DIMD merge is available, the DIMD merge candidate list is derived both at the encoder and decoder sides. On the encoder side, each candidate is evaluated using a Hadamard pass, and then in the RDO loop if the associated Hadamard-based cost is competitive.
A DIMD merge candidate may be discarded if the associated cost is not competitive compared to other modes.
JVET-AG0078 AHG12: Intra-Prediction Using Merged Histogram of Gradients
This contribution proposes to add a new intra prediction mode, referred to as Merged Intra Mode Derivation (MIMD), based on the computation of a Merged Histogram of Gradients (MHoG). Similar to DIMD, up to five MIMD modes are derived from the MHoG and are then blended together. The derivation of the modes and blending weights follows the same process as deriving DIMD modes and blending weights from the HoG. Differently from DIMD, however, the MHoG is not computed by directly analysing the template samples, but rather based on information extracted from neighbouring blocks.
In particular, N neighbouring blocks are considered. A neighbouring block is considered if it is encoded with at least one directional intra-prediction mode. In case the neighbouring block i is encoded using DIMD or MIMD, its HoG or MHoG is directly considered as Hi, where Hi(m) refers to the amplitude of directional mode m in the HoG, m can take values from 0 to M, and M is the maximum number of intra-prediction modes. A normalisation process can be used when considering Hi.
In case the neighbouring block i is instead encoded using a non-DIMD directional intra-prediction mode m, an HoG Hi is derived for that neighbouring block, where Hi(k) = 0 for k = 0, 1, …, M, k ≠ m, and Hi(m) = A, where the value A depends on the size of the current block. For neighbouring blocks encoded using SGPM or TIMD, where more than one directional intra prediction mode may be available, both directional modes can be considered in the derivation of Hi.
Then, the MHoG can be computed using all the HoGs extracted from available neighbouring blocks as:
MHoG(m) = Σi Hi(m), for m = 0, 1, …, M,
where the sum runs over the N available neighbouring blocks.
Finally, the MHoG is used to compute MIMD modes and weights. The directional modes and their weights corresponding to the five highest amplitudes in the MHoG are selected as directional modes and weights for MIMD.
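A minimal sketch of this merge-and-select step follows; the amplitude layout (one vector of M+1 amplitudes per available neighbouring block, with more than five modes) is an assumption for illustration.

#include <algorithm>
#include <vector>
// Sum the per-neighbour HoGs into the MHoG, then keep the five directional
// modes with the highest merged amplitudes (numModes > 5 assumed).
std::vector<int> deriveMimdModes(const std::vector<std::vector<long long>>& H,
                                 int numModes)
{
    std::vector<long long> mhog(numModes, 0);
    for (const auto& Hi : H)               // MHoG(m) = sum over i of Hi(m)
        for (int m = 0; m < numModes; ++m)
            mhog[m] += Hi[m];
    std::vector<int> modes(numModes);
    for (int m = 0; m < numModes; ++m) modes[m] = m;
    std::partial_sort(modes.begin(), modes.begin() + 5, modes.end(),
                      [&](int a, int b) { return mhog[a] > mhog[b]; });
    modes.resize(5);                       // five highest amplitudes
    return modes;
}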
Integration in ECM
MIMD is signalled as a sub-mode of DIMD. In order to speed up the encoding process, MIMD is only signalled for blocks with an area larger than 4x4 samples. Also, MIMD is only signalled for blocks that have one immediate neighbour, above or on the left, that is encoded using DIMD or MIMD. Under these conditions, MIMD is signalled with a CABAC-coded CU-level flag. One new CABAC context was included to support coding of the MIMD flag. As a further complexity optimisation, the planar prediction used within the DIMD and MIMD blending process was modified to make use of PDPC. This allows the intra-prediction blocks computed during the SATD stage at the encoder side to be reused to form the DIMD and MIMD final predictors.
JVET-Q0185 AHG16: On Merge Estimation Region for VVC
MER is included in HEVC. MERs are non-overlapping square regions, and MER is used at the encoder side to estimate the costs of merging candidates in parallel for different CUs in one MER. When MER is applied, a spatial merging candidate can be added into the merging candidate list only when the current CU and the neighbouring CU are in different MERs. MER is commonly applied in low-cost real-time HEVC encoders in many practical products. Therefore, it is desirable to make the MER feature available in VVC with minor normative changes and proper encoder-only constraints. Details are described in the following sections.
It is proposed to directly apply the square MER of HEVC to VVC. The operations and syntax of the proposed MER for VVC are basically the same as those in HEVC: a spatial merging candidate can be added into the merging candidate list only when the current CU and the neighbouring CU are in different MERs. Moreover, MER for VVC is extended to consider not only spatial merging candidates but also subblock-based merging candidates, including the subblock-based temporal motion vector prediction (SbTMVP) merging candidate, luma affine control point motion vectors from a neighbouring block (a.k.a. inherited affine merging candidates), and constructed affine control point motion vector merging candidates (a.k.a. constructed affine merging candidates).
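The MER availability rule reduces to a coordinate comparison; a minimal sketch for square MERs of size 2^log2MerSize follows (the function name is illustrative).

// A spatial merging candidate is usable only when the candidate position
// lies in a different MER than the current CU (sketch for square MERs).
bool spatialCandidateAllowed(int curX, int curY,   // current CU position
                             int nbX, int nbY,     // candidate position
                             int log2MerSize)
{
    return (curX >> log2MerSize) != (nbX >> log2MerSize) ||
           (curY >> log2MerSize) != (nbY >> log2MerSize);
}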
MER is an important feature for a commercial hardware encoder because it can effectively reduce bubble cycles. The bubble cycles correspond to the waiting time for the RDO stage results of previous CUs needed before starting the predictor stage of the current CU in merge mode. The corresponding circuit has to wait for some neighbouring data in order to generate the current predictor, and this waiting time can be regarded as bubbles for the circuit since its input and output during that time are garbage values and not useful. Fig. 9 shows the predictor stage 910 and the rate-distortion optimization (RDO) stage 920 in an encoder. The predictor stage includes intra and inter prediction; the RDO stage includes the rate and distortion calculations and the final mode decision. Without MER, before starting the predictor stage of the current CU in merge mode, the predictor stage needs to wait for the RDO stage results of previous CUs, because the MVs of the previous CUs can be spatial neighbours of the current CU. By using MER, such waiting time can be eliminated inside the MER. Accordingly, the merge mode decision of all CUs inside the MER region can be performed at the same time without waiting for each other, so the bubble cycles can be saved. In one example, in a hardware encoder architecture for VVC, without MER the bubble cycles are 2.38 times the actual processing (non-bubble) cycles; with a 32x32 MER region, the bubble cycles are largely reduced to only 0.27 times the actual processing cycles. Moreover, MER is also a commonly used tool for commercial real-time hardware encoders. In our survey of several HEVC hardware encoders from several major companies, MER is activated in most cases.
In order to guarantee parallel processing of merge modes inside an MER, in addition to the HEVC-based MER, two encoder-only constraints are needed for VVC, detailed as follows.
Firstly, as a general rule for MER, any CU that is not smaller than the MER size has to contain one or more complete MERs, and any CU that is smaller than the MER size has to be located within one MER. To satisfy this rule, a new encoder-only constraint for binary tree (BT) split and ternary tree (TT) split is applied in inter slices. The detailed split constraint is as follows (a sketch of these checks is given after the list):
· When the current slice is an inter slice, and the current CU width is larger than the MER width or the current CU height is larger than the MER height, the following applies:
○ When the CU height is smaller than or equal to the MER height, disallow horizontal BT split
○ When the CU width is smaller than or equal to the MER width, disallow vertical BT split
○ When the CU height is smaller than or equal to 2 times the MER height, disallow horizontal TT split
○ When the CU width is smaller than or equal to 2 times the MER width, disallow vertical TT split.
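As referenced above, a minimal sketch of these encoder-only split checks follows, assuming an inter slice and sizes in luma samples; the structure and function names are illustrative.

struct SplitAllowed { bool horBT, verBT, horTT, verTT; };
// Apply the BT/TT constraints only when the CU exceeds the MER in at
// least one dimension, as stated in the list above.
SplitAllowed merSplitConstraint(int cuW, int cuH, int merW, int merH)
{
    SplitAllowed s{true, true, true, true};
    if (cuW > merW || cuH > merH) {
        if (cuH <= merH)     s.horBT = false;
        if (cuW <= merW)     s.verBT = false;
        if (cuH <= 2 * merH) s.horTT = false;
        if (cuW <= 2 * merW) s.verTT = false;
    }
    return s;
}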
Secondly, a new encoder-only constraint for history-based motion vector prediction (HMVP) merging candidates is applied. In this constraint, for any CU that is contained within one MER, the HMVP candidates and merging candidates after HMVP in the merging candidate list (i.e., pairwise average merging candidate and zero candidates) are not used. With the first encoder-only constraint, the second encoder-only constraint is equivalent to not using HMVP candidates and merging candidates after HMVP in the merging candidate list when the current CU width is smaller than MER width or the current CU height is smaller than MER height. The constraint is used to break the dependency caused by updating HMVP table between different CUs in one MER in order to allow parallel processing. The second constraint affects 4 merge modes including merge with motion vector difference (MMVD) mode, non-MMVD regular merge mode, combined inter/intra prediction (CIIP) mode, and triangular partitioning mode (TPM) .
ECM 11 Template Matching (TM)
Template matching is a decoder-side MV derivation method to refine the motion information of the current CU by finding the closest match between a template (i.e., the top and/or left neighbouring blocks of the current CU) in the current picture and a template of the same size in a reference picture. As illustrated in Fig. 10, a better MV (i.e., a refined MV) is searched around the initial motion of the current CU within a [-8, +8]-pel search range. The template matching method in JVET-J0021 is used with the following modifications: the search step size is determined based on the AMVR mode, and TM can be cascaded with the bilateral matching process in merge modes.
In AMVP mode, an MVP candidate is determined based on the template matching error, selecting the one which reaches the minimum difference between the current block template and the reference block template; TM is then performed only for this particular MVP candidate for MV refinement. TM refines this MVP candidate, starting from full-pel MVD precision (or 4-pel for 4-pel AMVR mode) within a [-8, +8]-pel search range by using an iterative 16-point diamond search. The AMVP candidate may be further refined by using a cross search with full-pel MVD precision (or 4-pel for 4-pel AMVR mode), followed sequentially by half-pel and quarter-pel ones depending on the AMVR mode as specified in Table 2. This search process ensures that the MVP candidate keeps the same MV precision as indicated by the AMVR mode after the TM process. In the search process, if the difference between the previous minimum cost and the current minimum cost in an iteration is less than a threshold equal to the area of the block, the search process terminates.
Table 2. Search patterns of AMVR and merge mode with AMVR.
In merge mode, a similar search method is applied to the merge candidate indicated by the merge index. As shown in Table 2, TM may be performed all the way down to 1/8-pel MVD precision, or skip those beyond half-pel MVD precision, depending on whether the alternative interpolation filter (which is used when AMVR is in half-pel mode) is used according to the merged motion information. Besides, when TM mode is enabled, template matching may work as an independent process or as an extra MV refinement process between the block-based and subblock-based bilateral matching (BM) methods, depending on whether BM can be enabled or not according to its enabling condition check.
When TM is applied to bi-predictive blocks, an iterative process is used. Specifically, the initial motion vectors of L0 and L1 are firstly refined, and TM costs Cost0 and Cost1 are calculated for L0 and L1, respectively. When Cost0 is larger than Cost1, the refined motion vector of L1 (MV'1) is used to derive a further refined motion vector of L0 (MV'0); then MV'1 is further refined using MV'0. Similarly, when Cost0 is not larger than Cost1, the refined motion vector of L0 (MV'0) is used to derive a further refined motion vector of L1 (MV'1), and MV'0 is further refined using MV'1. Besides, TM for bi-prediction is enabled when the DMVR condition is satisfied.
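To make the termination rule of the AMVP refinement concrete, the following simplified sketch uses a 4-point cross instead of the 16-point diamond of the actual search, and the template cost function is an assumed placeholder hook; it illustrates the stopping criterion only.

struct Mv { int x, y; };
// Refine an MV by local search, stopping once the improvement in the
// minimum template cost drops below the block area (see above).
Mv refineMvSketch(Mv mv, int blockW, int blockH, long long (*tmCost)(Mv))
{
    const long long threshold = (long long)blockW * blockH;
    long long best = tmCost(mv);
    for (;;) {
        long long prev = best;
        Mv cand[4] = {{mv.x + 1, mv.y}, {mv.x - 1, mv.y},
                      {mv.x, mv.y + 1}, {mv.x, mv.y - 1}};
        Mv bestMv = mv;
        for (const Mv& c : cand) {
            long long cost = tmCost(c);
            if (cost < best) { best = cost; bestMv = c; }
        }
        mv = bestMv;
        if (prev - best < threshold)   // improvement below block area: stop
            break;
    }
    return mv;
}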
In the present invention, techniques to facilitate parallel processing for decoder-side derived inter-prediction mode derivation or interCCP merge mode by avoiding or reducing multiple accesses of neighbouring reconstruction samples are disclosed.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for video coding are disclosed. According to this method, input data associated with a current block are received, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side. An inter estimation region from the current picture is determined, wherein the inter estimation region is partitioned into one or more blocks. A target inter prediction mode according to decoder-side derived inter-prediction mode derivation, or an interCCP merge mode, is derived for each of said one or more blocks by using corresponding reconstruction samples in a vertical and/or horizontal direction, neighbouring information, or both, adjacent to the inter estimation region, or by using only available reconstruction samples or the neighbouring information adjacent to the inter estimation region and adjacent to said each of said one or more blocks, wherein any reconstructed samples inside the inter estimation region are excluded from said deriving the target inter prediction mode or said deriving the interCCP merge mode. Said each of said one or more blocks is encoded or decoded according to the target inter prediction mode or the interCCP merge mode.
In one embodiment, when a spatial neighbouring block for a target block of said one or more blocks is located within the inter estimation region, corresponding vertical and/or horizontal samples or the neighbouring information adjacent to the inter estimation region are used to derive the target inter prediction mode or the interCCP merge mode.
In one embodiment, when a spatial neighbouring block for a target block of said one or more blocks is located within the inter estimation region, the spatial neighbouring block is excluded from deriving the target inter prediction mode or the interCCP merge mode.
In one embodiment, only the available reconstruction samples or the neighbouring information adjacent to the inter estimation region, and adjacent to a target block of said one or more blocks are used to derive the target inter prediction mode or the interCCP merge mode for the target block of said one or more blocks.
In one embodiment, corresponding vertical and/or horizontal samples or the neighbouring information adjacent to the inter estimation region are used to derive the target inter prediction mode or the interCCP merge mode.
In one embodiment, adjacent neighbouring positions of the inter estimation region, and non-adjacent neighbouring positions are used for generating the neighbouring information. In one embodiment, one or more inside blocks within the inter estimation region are used as a centre unit for the non-adjacent neighbouring positions and block width and block height of said one or more inside blocks are used to determine non-adjacent horizontal and vertical locations respectively.
In one embodiment, the neighbouring information used to generate the interCCP merge mode comprise sample value, template predictor, reconstruction samples, coded CU information, or a combination thereof.
In one embodiment, one or more inter lists for the interCCP merge mode are generated according to the neighbouring information for individual inter estimation region, and said one or more blocks inside the inter estimation region use said one or more inter lists to avoid or reduce multiple accesses of neighbouring reconstruction samples.
In one embodiment, when the interCCP merge mode is applied to a target block of said one or more blocks, prediction of the target block is formed by cross-component prediction or a combination of motion compensation prediction and the cross-component prediction. In one embodiment, the cross-component prediction is generated using one or more cross-component models selected from a cross-component candidate list (CCCL) containing one or more cross-component models. In one embodiment, the CCCL comprises one or more spatial adjacent and/or non-adjacent candidates, one or more temporal candidates, one or more history candidates, one or more default candidates, or a combination thereof.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing.
Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
Fig. 2 shows the intra prediction modes as adopted by the VVC video coding standard.
Figs. 3A-B illustrate examples of wide-angle intra prediction for a block with width larger than height (Fig. 3A) and a block with height larger than width (Fig. 3B).
Fig. 4 illustrates an example of two vertically-adjacent predicted samples using two non-adjacent reference samples in the case of wide-angle intra prediction.
Figs. 5A-C illustrate an example of the DIMD chroma mode using the DIMD derivation method to derive the chroma intra prediction mode of the current block based on the neighbouring reconstructed Y (Fig. 5A) , Cb (Fig. 5B) and Cr (Fig. 5C) samples in the second neighbouring row and column.
Fig. 6 illustrates an example of non-adjacent spatial neighbouring candidates for OBIC mode and DIMD mode.
Fig. 7 illustrates an example of the histogram of occurrences of IPM modes in the spatial neighbourhood of a CU.
Fig. 8 illustrates an example of spatial neighbours for deriving DIMD information.
Fig. 9 illustrates an example of hardware encoder architecture.
Fig. 10 illustrates an example of template matching performed on a search area around the initial MV to refine the MV.
Figs. 11A-B illustrate an example of neighbouring reconstruction usage in inter estimation region or the inter shared region according to one embodiment of the present invention.
Figs. 12A-B illustrate another example of neighbouring reconstruction usage in inter estimation region or the inter shared region according to one embodiment of the present invention.
Fig. 13 illustrates an example of reconstruction sample or neighbouring information usage of inter shared region or inter estimation region.
Fig. 14 illustrates a flowchart of an exemplary video coding system that uses a scheme to facilitate parallel processing for decoder-side derived inter-prediction mode derivation or interCCP merge mode by avoiding or reducing multiple accesses of neighbouring reconstruction samples according to an embodiment of the present invention.
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
In this disclosure, several new methods related to decoder-side derived inter-prediction mode and interCCP merge modes are disclosed.
The decoder-side derived inter-prediction mode can be template-matching related modes, boundary matching, or bilateral matching related modes. The interCCP merge modes can be CCP modes, CCP models, or CCP related modes. In the existing design, reconstruction samples can be frequently accessed because of template matching or other decoder-side derived inter-prediction modes, which is not friendly to parallel processing in hardware. To benefit parallel processing of the decoder-side derived inter-prediction mode and the interCCP merge mode, an inter estimation region is proposed. Inside the inter estimation region, when computing the decoder-side derived inter-prediction mode or the interCCP merge mode, blocks cannot use the neighbouring reconstruction samples within the inter estimation region. Blocks may only use the reconstruction samples outside the inter estimation region or reconstruction samples adjacent to the inter estimation region. Therefore, the reconstruction samples for the decoder-side derived inter-prediction mode or the interCCP merge mode can remain unchanged or can be changed less frequently, and parallel processing becomes more feasible.
Inter Estimation Region for Decoder-Side Derived Inter-Prediction Mode
An inter estimation region is pre-defined. The inter estimation region can be square or non-square, and there can be one or more blocks inside the inter estimation region. Inside the region, when computing the decoder-side derived inter-prediction mode, blocks cannot use the neighbouring reconstruction samples or neighbouring information inside the inter estimation region. Blocks may only use the reconstruction samples or neighbouring information outside the inter estimation region or reconstruction samples adjacent to the region. The neighbouring information, such as sample values, template predictors, reconstruction samples, and coded CU information, may be used to generate the decoder-side derived inter-prediction mode. One or multiple inter lists for the decoder-side derived inter-prediction mode can be generated inside the inter estimation region.
Example 1: Inter estimation region & neighbouring reconstruction samples or neighbouring information usage of blocks inside the inter estimation region
In one embodiment, as shown in Fig. 11A and Fig. 11B, one or more blocks are inside an inter estimation region. When utilizing neighbouring reconstruction samples or neighbouring information for decoder-side derived inter-prediction mode, block A can use the reconstruction samples at AL1, A1, AR1, L1 and LB1 as shown in Fig. 11A. But block B can only use reconstruction samples located at AL2, AR2, L2 and LB2 as shown in Fig. 11B since A2 is located inside the inter estimation region.
In another embodiment, as shown in Fig. 12A and Fig. 12B, one or more blocks are inside an inter estimation region. When utilizing neighbouring reconstruction samples or neighbouring information for the decoder-side derived inter-prediction mode, block A can use the reconstruction samples at AL1, A1, AR1, L1 and LB1 as shown in Fig. 12A. But block B can only use reconstruction samples located at AL2, AR2, L2, LB2 and A1 as shown in Fig. 12B, since A2 is located inside the inter estimation region and A1 consists of reconstruction samples adjacent to the region.
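A minimal sketch of this usability rule under the Fig. 12A-B variant follows; the Rect type and the projection-to-adjacent-row behaviour are illustrative assumptions consistent with the description, not a normative design.

struct Rect { int x, y, w, h; };   // top-left position plus size
bool insideRegion(const Rect& r, int px, int py)
{
    return px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h;
}
// Position actually read for an above-neighbour sample: if (px, py) falls
// inside the inter estimation region, project it to the reconstruction row
// adjacent to the region top (left neighbours are handled symmetrically).
void aboveNeighbourPosition(const Rect& region, int px, int& py)
{
    if (insideRegion(region, px, py))
        py = region.y - 1;
}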
Example 1-1. Reconstruction samples or neighbouring information usage of blocks inside inter estimation region
In one embodiment, in an inter estimation region, one or more blocks utilize the corresponding vertical and horizontal reconstruction samples or neighbouring information adjacent to the inter estimation region to perform decoder-side derived inter-prediction mode derivation, such as template matching. For example, as shown in Fig. 13, for block A inside the inter estimation region, reconstruction samples (a) and (c) are utilized to perform template matching. For another example, for block B inside the inter estimation region, reconstruction samples (b) and (c) are utilized to perform template matching.
In another embodiment, in an inter estimation region, one or more blocks utilize only available reconstruction samples or neighbouring information adjacent to the inter estimation region and adjacent to the current block to perform decoder-side derived inter-prediction mode derivation, such as template matching. For example, as shown in Fig. 13, for block B inside the inter estimation region, reconstruction samples (b) are utilized to perform template matching, and reconstruction samples (c) are not utilized since they are not adjacent to the current block B.
In another embodiment, in an inter estimation region, one or more blocks utilize the corresponding vertical and horizontal reconstruction samples or neighbouring information adjacent to the inter estimation region to perform decoder-side derived inter-prediction mode derivation, for example, boundary matching. For example, as shown in Fig. 13, for block D inside the inter estimation region, reconstruction samples (b) and (d) are utilized to perform boundary matching.
In another embodiment, one or more blocks utilize only available reconstruction samples or neighbouring information adjacent to the inter estimation region and adjacent to the current block to perform decoder-side derived inter-prediction mode derivation, for example, boundary matching. For example, as shown in Fig. 13, for block C inside the inter estimation region, reconstruction samples (d) are utilized to perform boundary matching, and reconstruction samples (a) are not utilized since they are not adjacent to the current block C.
In another embodiment, adjacent neighbouring positions of an inter estimation region are considered when generating neighbouring information. Besides, non-adjacent neighbouring positions may also be taken into consideration during the derivation.
In another embodiment, the inter estimation region is used as a centre unit for non-adjacent neighbouring positions when generating neighbouring information. For instance, non-adjacent neighbouring positions may treat blocks A and B as a unit and use (block width A + block width B) and (block height A + block height B) to determine the non-adjacent horizontal and vertical locations.
In another embodiment, one or more blocks inside the inter estimation region are used as a centre unit for non-adjacent neighbouring positions when generating neighbouring information. For instance, non-adjacent neighbouring positions may treat block A as a unit and use block width A and block height A to determine the non-adjacent horizontal and vertical locations.
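A hedged sketch of deriving non-adjacent positions from such a centre unit follows; the exact set and number of positions are design choices suggested here for illustration, not mandated by the description.

#include <utility>
#include <vector>
// Generate candidate non-adjacent positions around a centre unit located
// at (cx, cy) with size (unitW, unitH): offsets step in multiples of the
// unit width/height, echoing the distance pattern of Fig. 6.
std::vector<std::pair<int, int>> nonAdjacentPositions(int cx, int cy,
                                                      int unitW, int unitH,
                                                      int rings)
{
    std::vector<std::pair<int, int>> pos;
    for (int k = 1; k <= rings; ++k) {
        pos.emplace_back(cx - k * unitW, cy);              // left
        pos.emplace_back(cx, cy - k * unitH);              // above
        pos.emplace_back(cx - k * unitW, cy - k * unitH);  // above-left
    }
    return pos;
}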
Example 1-2: History buffer of decoder-side derived inter-prediction mode and history buffer update
In another embodiment, decoder-side derived inter-prediction mode is stored in a history buffer during encoding and decoding. During encoding and decoding, the history buffer is also used in an inter estimation region to derive the neighbouring information for decoder-side derived inter-prediction mode. The history buffer can be a first-in-first-out or last-in-first-out buffer.
In another embodiment, the history buffer of decoder-side derived inter-prediction mode is not updated in the current inter estimation region during encoding or decoding. The history buffer is updated after the current inter estimation region is encoded or decoded.
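The deferred update can be sketched as follows, assuming a FIFO history buffer; the staging mechanism and the capacity are illustrative assumptions.

#include <deque>
struct DerivedMode { int modeId; };
// Modes derived inside the current inter estimation region are staged and
// committed only after the whole region is coded, so every block in the
// region reads an identical history buffer.
class ModeHistory {
public:
    void stage(const DerivedMode& m) { pending_.push_back(m); }
    void commitRegion()   // call once per region, after it is fully coded
    {
        for (const auto& m : pending_) {
            if (fifo_.size() == kMaxSize) fifo_.pop_front();
            fifo_.push_back(m);
        }
        pending_.clear();
    }
    const std::deque<DerivedMode>& entries() const { return fifo_; }
private:
    static constexpr size_t kMaxSize = 8;   // illustrative capacity
    std::deque<DerivedMode> fifo_;
    std::deque<DerivedMode> pending_;
};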
In another embodiment, the history buffer of the decoder-side derived inter-prediction mode is reset per region, per CTU, per multiple CTUs, per slice, per multiple slices, or per frame.
In another embodiment, the history buffer for decoder-side derived inter-prediction mode will be updated only for the first M blocks or the last N blocks during encoding or decoding inside an inter estimation region, where M and N are integers larger than or equal to 0. For instance, as shown in Fig. 13, the history buffer for decoder-side derived inter-prediction mode will be updated after encoding or decoding block D. The updated history buffer for decoder-side derived inter-prediction mode is used for the following inter estimation regions.
In another embodiment, the history buffer for the decoder-side derived inter-prediction mode will be updated only for some specific blocks during encoding or decoding inside an inter estimation region. For example, the specific blocks may correspond to the block with the largest area inside an inter estimation region or the most frequently occurring block inside an inter estimation region.
Inter Estimation Region for interCCP Merge Mode
An inter estimation region is pre-defined and can be square or non-square, and there can be one or more blocks inside the inter estimation region. The interCCP merge mode can be CCP modes, CCP models, or CCP related modes. Inside the region, when deriving the interCCP merge mode, blocks cannot use the neighbouring reconstruction samples or neighbouring information inside the inter estimation region. Blocks may only use the reconstruction samples or neighbouring information outside the inter estimation region or reconstruction samples adjacent to the region. Some neighbouring information, such as sample values, template predictors, reconstruction samples, or coded CU information, may be used to generate the interCCP merge mode. One or more inter lists for the interCCP mode will be generated inside the inter estimation region.
When interCCP merge mode is applied to the current block, the prediction of the current block is formed by cross-component prediction or a combination of motion compensation prediction and cross-component prediction. The cross-component prediction is generated using one or more cross-component models selected from a cross-component candidate list (CCCL) containing one or more cross-component models.
In one embodiment, CCCL includes one or more spatial adjacent and/or non-adjacent candidates which reference the cross-component models from the spatial adjacent and/or non-adjacent neighbouring coded blocks, one or more temporal candidates which reference the cross-component models from the coded blocks in the previous coded picture, one or more history candidates which reference the cross-component models from a history buffer, one or more default candidates which derive models using the existing candidates in CCCL or using any pre-defined method for the current block, or any subset or extension of the above-mentioned candidates.
An inter estimation region is used to build/derive the CCCL.
Example 1: Inter estimation region and neighbouring information usage of blocks inside inter estimation region
In one embodiment, as shown in Fig. 11A and Fig. 11B, one or more blocks are inside an inter estimation region. When utilizing neighbouring information for interCCP merge mode, block A can use information (for example, any cross-component information such as cross-component models) located at AL1, A1, AR1, L1 and LB1 as shown in Fig. 11A. But block B can only use information located at AL2, AR2, L2 and LB2 as shown in Fig. 11B since A2 is located inside the inter estimation region.
In another embodiment, as shown in Fig. 12A and Fig. 12B, one or more blocks are inside an inter estimation region. When utilizing neighbouring information for interCCP merge mode, block A can use information located at AL1, A1, AR1, L1 and LB1 as shown in Fig. 12A. But block B can only use information located at AL2, AR2, L2, LB2 and A1 as shown in Fig. 12B since A2 is located inside the inter estimation region and A1 is adjacent to the inter estimation region.
In another embodiment, the design of inter estimation region used for building CCCL is unified with the design of inter estimation region used for other inter prediction tools.
Reconstruction Cost Calculation in Inter Estimation Region
In one embodiment, for reconstruction cost calculation, such as template cost or boundary cost, some or all reconstruction samples adjacent to the inter estimation region are used. As shown in Fig. 13, reconstruction samples (a), (b), (c) and (d) can be used for block A. In another example, for block B, only reconstruction samples (b) and (c) are used.
In another embodiment, different weightings for the reconstruction samples adjacent to the inter estimation region are considered. For example, the reconstruction lines closer to the region boundary are given larger weightings.
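A minimal sketch of such a weighted cost follows; the SAD distortion and the 3/2/1 line weights are illustrative assumptions, with line 0 being the reconstruction line adjacent to the region boundary.

#include <cstdlib>
#include <vector>
// Weighted SAD over up to three reconstruction lines adjacent to the
// inter estimation region; lines closer to the boundary weigh more.
long long weightedBoundaryCost(const std::vector<std::vector<int>>& recLines,
                               const std::vector<std::vector<int>>& predLines)
{
    static const int kWeight[3] = {3, 2, 1};
    long long cost = 0;
    for (size_t line = 0; line < recLines.size() && line < 3; ++line)
        for (size_t i = 0; i < recLines[line].size(); ++i)
            cost += (long long)kWeight[line] *
                    std::abs(recLines[line][i] - predLines[line][i]);
    return cost;
}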
Any of the foregoing proposed methods of target inter prediction mode derivation according to decoder-side derived inter-prediction mode derivation or interCCP merge mode derivation can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in predictor derivation module of an encoder, and/or a predictor derivation module of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the predictor derivation module of the encoder and/or the predictor derivation module of the decoder, so as to provide the information needed by the predictor derivation module.
The proposed methods of target inter prediction mode derivation according to decoder-side derived inter-prediction mode derivation or interCCP merge mode derivation as described above can be implemented in an encoder side or a decoder side. For example, any of the proposed methods can be implemented in an Intra prediction module (e.g. Intra Pred. 150 in Fig. 1B) in a decoder or an Intra prediction module in an encoder (e.g. Intra Pred. 110 in Fig. 1A) . Any of the proposed methods can also be implemented as a circuit coupled to the intra coding module at the decoder or the encoder. However, the decoder or encoder may also use additional processing unit to implement the required processing. While the Intra prediction units (e.g. unit 110 in Fig. 1A and unit 150 in Fig. 1B) are shown as individual processing units, they may correspond to executable software or firmware codes stored on a media, such as hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array) ) .
Fig. 14 illustrates a flowchart of an exemplary video coding system that uses a scheme to facilitate parallel processing for decoder-side derived inter-prediction mode derivation or interCCP merge mode by avoiding or reducing multiple accesses of neighbouring reconstruction samples according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, input data associated with a current block are received in step 1410, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current block to be decoded at a decoder side. An inter estimation region from the current picture is determined in step 1420, wherein the inter estimation region is partitioned into one or more blocks. A target inter prediction mode according to decoder-side derived inter-prediction mode derivation, or an interCCP merge mode, is derived for each of said one or more blocks in step 1430 by using corresponding reconstruction samples in a vertical and/or horizontal direction, neighbouring information, or both, adjacent to the inter estimation region, or by using only available reconstruction samples or the neighbouring information adjacent to the inter estimation region and adjacent to said each of said one or more blocks, wherein any reconstructed samples inside the inter estimation region are excluded from said deriving the target inter prediction mode or said deriving the interCCP merge mode. Said each of said one or more blocks is encoded or decoded according to the target inter prediction mode or the interCCP merge mode in step 1440.
The flowchart shown is intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without such specific details.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims (13)
- A method of video coding, the method comprising:
receiving input data associated with a current picture, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current picture to be decoded at a decoder side;
determining an inter estimation region from the current picture, wherein the inter estimation region is partitioned into one or more blocks;
deriving a target inter prediction mode according to decoder-side derived inter-prediction mode derivation or deriving interCCP merge mode for each of said one or more blocks by using corresponding reconstruction samples in a vertical and/or horizontal direction, neighbouring information, or both adjacent to the inter estimation region, or by using only available reconstruction samples or the neighbouring information adjacent to the inter estimation region, and adjacent to said each of said one or more blocks, wherein any reconstructed samples inside the inter estimation region are excluded from said deriving the target inter prediction mode or said deriving the interCCP merge mode; and
encoding or decoding said each of said one or more blocks according to the target inter prediction mode or the interCCP merge mode.
- The method of Claim 1, wherein when a spatial neighbouring block for a target block of said one or more blocks is located within the inter estimation region, corresponding vertical and/or horizontal samples or the neighbouring information adjacent to the inter estimation region are used to derive the target inter prediction mode or the interCCP merge mode.
- The method of Claim 1, wherein when a spatial neighbouring block for a target block of said one or more blocks is located within the inter estimation region, the spatial neighbouring block is excluded from deriving the target inter prediction mode or the interCCP merge mode.
- The method of Claim 1, wherein only the available reconstruction samples or the neighbouring information adjacent to the inter estimation region, and adjacent to a target block of said one or more blocks are used to derive the target inter prediction mode or the interCCP merge mode for the target block of said one or more blocks.
- The method of Claim 1, wherein corresponding vertical and/or horizontal samples or the neighbouring information adjacent to the inter estimation region are used to derive the target inter prediction mode or the interCCP merge mode.
- The method of Claim 1, wherein adjacent neighbouring positions of the inter estimation region and non-adjacent neighbouring positions are used for generating the neighbouring information.
- The method of Claim 6, wherein one or more inside blocks within the inter estimation region are used as a centre unit for the non-adjacent neighbouring positions and block width and block height of said one or more inside blocks are used to determine non-adjacent horizontal and vertical locations respectively.
- The method of Claim 1, wherein the neighbouring information used to generate the interCCP merge mode comprises sample value, template predictor, reconstruction samples, coded CU information, or a combination thereof.
- The method of Claim 1, wherein one or more inter lists for the interCCP merge mode are generated according to the neighbouring information for an individual inter estimation region, and said one or more blocks inside the inter estimation region use said one or more inter lists to avoid or reduce multiple accesses of neighbouring reconstruction samples.
- The method of Claim 1, wherein when the interCCP merge mode is applied to a target block of said one or more blocks, prediction of the target block is formed by cross-component prediction or a combination of motion compensation prediction and the cross-component prediction.
- The method of Claim 10, wherein the cross-component prediction is generated using one or more cross-component models selected from a cross-component candidate list (CCCL) containing one or more cross-component models.
- The method of Claim 11, wherein the CCCL comprises one or more spatial adjacent and/or non-adjacent candidates, one or more temporal candidates, one or more history candidates, one or more default candidates, or a combination thereof.
- An apparatus for video coding, the apparatus comprising one or more electronic devices or processors arranged to:
receive input data associated with a current picture, wherein the input data comprise pixel data to be encoded at an encoder side or data associated with the current picture to be decoded at a decoder side;
determine an inter estimation region from the current picture, wherein the inter estimation region is partitioned into one or more blocks;
derive a target inter prediction mode according to decoder-side derived inter-prediction mode derivation or derive an interCCP merge mode for each of said one or more blocks by using corresponding reconstruction samples in a vertical and/or horizontal direction, neighbouring information, or both adjacent to the inter estimation region, or by using only available reconstruction samples or the neighbouring information adjacent to the inter estimation region, and adjacent to said each of said one or more blocks, wherein any reconstructed samples inside the inter estimation region are excluded from said deriving the target inter prediction mode or said deriving the interCCP merge mode; and
encode or decode said each of said one or more blocks according to the target inter prediction mode or the interCCP merge mode.
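As a further illustration of Claims 6 and 7 above, the sketch below (Python, in the same illustrative style as the earlier sketch) generates non-adjacent neighbouring positions around a centre unit, spacing horizontal candidates by multiples of the block width and vertical candidates by multiples of the block height. The specific offset pattern and the number of rings are assumptions made for illustration; the claims do not fix a particular formula.

```python
def non_adjacent_positions(cx, cy, bw, bh, rings=2):
    """Claims 6-7 (illustrative assumption): from a centre unit at
    (cx, cy) with width bw and height bh, place non-adjacent candidate
    positions at horizontal distances that are multiples of bw and
    vertical distances that are multiples of bh."""
    positions = []
    for r in range(1, rings + 1):
        positions.append((cx - r * bw, cy))           # left, spaced by block width
        positions.append((cx, cy - r * bh))           # above, spaced by block height
        positions.append((cx - r * bw, cy - r * bh))  # above-left diagonal
    return positions

# Example: an 8x8 centre unit at (32, 32) yields candidates two rings out.
print(non_adjacent_positions(32, 32, 8, 8))
```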
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463670235P | 2024-07-12 | 2024-07-12 | |
| US63/670235 | 2024-07-12 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2026012280A1 (en) | 2026-01-15 |
Family
ID: 98385842
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2025/107008 (WO2026012280A1, pending) | Method and apparatus of inter estimation region for decoder-side derived inter-prediction mode and interccp merge mode in video coding | 2024-07-12 | 2025-07-04 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2026012280A1 (en) |