
WO2025227030A1 - Method, apparatus, and medium for video processing - Google Patents

Method, apparatus, and medium for video processing

Info

Publication number
WO2025227030A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
motion information
list
coding
current video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2025/026362
Other languages
French (fr)
Inventor
Weijia Zhu
Yuwen He
Li Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ByteDance Inc
Original Assignee
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ByteDance Inc filed Critical ByteDance Inc
Publication of WO2025227030A1 publication Critical patent/WO2025227030A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/523Motion estimation or motion compensation with sub-pixel accuracy
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/53Multi-resolution motion estimation; Hierarchical motion estimation

Definitions

  • Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to a non-local motion candidate (NLMC) coding mode.
  • NLMC non-local motion candidate
  • Video compression technologies such as motion picture expert group (MPEG)-2, MPEG-4, international telecommunication union telecommunication standardization sector (ITU-T) H.263, ITU-T H.264/MPEG-4 Part 10 advanced video coding (AVC), ITU-T H.265 high efficiency video coding (HEVC) standard, versatile video coding (VVC) standard, have been proposed for video encoding/decoding.
  • MPEG motion picture expert group
  • ITU-T international telecommunication union telecommunication standardization sector
  • AVC advanced video coding
  • HEVC high efficiency video coding
  • VVC versatile video coding
  • coding efficiency of video coding techniques is generally expected to be further improved.
  • Embodiments of the present disclosure provide a solution for video processing.
  • a method for video processing comprises: determining, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information for the current video block, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and performing the conversion based on the at least one prediction block.
  • the method in accordance with the first aspect of the present disclosure enables coding of video blocks with the NLMC coding mode, so that the coding performance of the video can be improved.
  • an apparatus for video processing comprises a processor and a non-transitory memory with instructions thereon.
  • the instructions upon execution by the processor cause the processor to perform a method in accordance with the first aspect of the present disclosure.
  • a non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first aspect of the present disclosure.
  • the non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing.
  • the method comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and generating the bitstream based on the at least one prediction block.
  • NLMC non-local motion candidate
  • a method for storing a bitstream of a video comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; generating the bitstream based on the at least one prediction block; and storing the bitstream in a non-transitory computer-readable recording medium.
  • NLMC non-local motion candidate
  • FIG. 1 illustrates a block diagram of an example video coding system in accordance with some embodiments of the present disclosure
  • FIG. 2 illustrates a block diagram of an example video encoder in accordance with some embodiments of the present disclosure
  • FIG. 3 illustrates a block diagram of an example video decoder in accordance with some embodiments of the present disclosure
  • Fig. 4 illustrates a derivation process for merge candidates list construction
  • Fig. 5 illustrates positions of spatial merge candidates
  • Fig. 6 illustrates candidate pairs considered for redundancy check of spatial merge candidates
  • Fig. 7A and Fig. 7B illustrate positions for the second PU of N×2N and 2N×N partitions, respectively;
  • Fig. 8 illustrates motion vector scaling for temporal merge candidate
  • Fig. 9 illustrates candidate positions for temporal merge candidate
  • Fig. 10 illustrates an example of combined bi-predictive merge candidate
  • Fig. 11 illustrates a derivation process for motion vector prediction candidates
  • Fig. 12 illustrates motion vector scaling for spatial motion vector candidate
  • Fig. 13A and Fig. 13B illustrate two simplified affine motion models, respectively;
  • Fig. 14 illustrates an example of affine MVF per sub-block
  • Fig. 15 illustrates a coding process of NLMC according to some embodiments of the present disclosure
  • Fig. 16 illustrates an example of template of a block according to some embodiments of the present disclosure
  • FIG. 17 illustrates a flowchart of a method for video processing in accordance with some embodiments of the present disclosure.
  • Fig. 18 illustrates a block diagram of a computing device in which various embodiments of the present disclosure can be implemented.
  • references in the present disclosure to “one embodiment,” “an embodiment,” “an example embodiment,” and the like indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an example embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.
  • The terms “first” and “second” etc. may be used herein to describe various elements, but these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the listed terms.
  • Fig. 1 is a block diagram that illustrates an example video coding system 100 that may utilize the techniques of this disclosure.
  • the video coding system 100 may include a source device 110 and a destination device 120.
  • the source device 110 can be also referred to as a video encoding device, and the destination device 120 can be also referred to as a video decoding device.
  • the source device 110 can be configured to generate encoded video data and the destination device 120 can be configured to decode the encoded video data generated by the source device 110.
  • the source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.
  • I/O input/output
  • the video source 112 may include a source such as a video capture device.
  • a source such as a video capture device.
  • the video capture device include, but are not limited to, an interface to receive video data from a video content provider, a computer graphics system for generating video data, and/or a combination thereof.
  • the video data may comprise one or more pictures.
  • the video encoder 114 encodes the video data from the video source 112 to generate a bitstream.
  • the bitstream may include a sequence of bits that form a coded representation of the video data.
  • the bitstream may include coded pictures and associated data.
  • the coded picture is a coded representation of a picture.
  • the associated data may include sequence parameter sets, picture parameter sets, and other syntax structures.
  • the I/O interface 116 may include a modulator/demodulator and/or a transmitter.
  • the encoded video data may be transmitted directly to destination device 120 via the I/O interface 116 through the network 130A.
  • the encoded video data may also be stored onto a storage medium/server 130B for access by destination device 120.
  • the destination device 120 may include an I/O interface 126, a video decoder 124, and a display device 122.
  • the I/O interface 126 may include a receiver and/or a modem.
  • the I/O interface 126 may acquire encoded video data from the source device 110 or the storage medium/server 130B.
  • the video decoder 124 may decode the encoded video data.
  • the display device 122 may display the decoded video data to a user.
  • the display device 122 may be integrated with the destination device 120, or may be external to the destination device 120 which is configured to interface with an external display device.
  • the video encoder 114 and the video decoder 124 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard, Versatile Video Coding (VVC) standard and other current and/or further standards.
  • HEVC High Efficiency Video Coding
  • VVC Versatile Video Coding
  • Fig. 2 is a block diagram illustrating an example of a video encoder 200, which may be an example of the video encoder 114 in the system 100 illustrated in Fig. 1, in accordance with some embodiments of the present disclosure.
  • the video encoder 200 may be configured to implement any or all of the techniques of this disclosure.
  • the video encoder 200 includes a plurality of functional components.
  • the techniques described in this disclosure may be shared among the various components of the video encoder 200.
  • a processor may be configured to perform any or all of the techniques described in this disclosure.
  • the video encoder 200 may include a partition unit 201, a prediction unit 202 which may include a mode select unit 203, a motion estimation unit 204, a motion compensation unit 205 and an intra-prediction unit 206, a residual generation unit 207, a transform unit 208, a quantization unit 209, an inverse quantization unit 210, an inverse transform unit 211, a reconstruction unit 212, a buffer 213, and an entropy encoding unit 214.
  • the video encoder 200 may include more, fewer, or different functional components.
  • the prediction unit 202 may include an intra block copy (IBC) unit.
  • the IBC unit may perform prediction in an IBC mode in which at least one reference picture is a picture where the current video block is located.
  • the partition unit 201 may partition a picture into one or more video blocks.
  • the video encoder 200 and the video decoder 300 may support various video block sizes.
  • the mode select unit 203 may select one of the coding modes, intra or inter, e.g., based on error results, and provide the resulting intra-coded or inter-coded block to a residual generation unit 207 to generate residual block data and to a reconstruction unit 212 to reconstruct the encoded block for use as a reference picture.
  • the mode select unit 203 may select a combined inter and intra prediction (CIIP) mode in which the prediction is based on an inter prediction signal and an intra prediction signal.
  • CIIP inter and intra prediction
  • the mode select unit 203 may also select a resolution for a motion vector (e.g., a sub-pixel or integer pixel precision) for the block in the case of inter-prediction.
  • To perform inter prediction on a current video block, the motion estimation unit 204 may generate motion information for the current video block by comparing one or more reference frames from the buffer 213 to the current video block. The motion compensation unit 205 may determine a predicted video block for the current video block based on the motion information and decoded samples of pictures from the buffer 213 other than the picture associated with the current video block.
  • the motion estimation unit 204 and the motion compensation unit 205 may perform different operations for a current video block, for example, depending on whether the current video block is in an I-slice, a P-slice, or a B-slice.
  • an “I-slice” may refer to a portion of a picture composed of macroblocks, all of which are based upon macroblocks within the same picture.
  • P-slices and B-slices may refer to portions of a picture composed of macroblocks that are not dependent on macroblocks in the same picture.
  • the motion estimation unit 204 may perform uni-directional prediction for the current video block, and the motion estimation unit 204 may search reference pictures of list 0 or list 1 for a reference video block for the current video block. The motion estimation unit 204 may then generate a reference index that indicates the reference picture in list 0 or list 1 that contains the reference video block and a motion vector that indicates a spatial displacement between the current video block and the reference video block. The motion estimation unit 204 may output the reference index, a prediction direction indicator, and the motion vector as the motion information of the current video block. The motion compensation unit 205 may generate the predicted video block of the current video block based on the reference video block indicated by the motion information of the current video block.
  • the motion estimation unit 204 may perform bidirectional prediction for the current video block.
  • the motion estimation unit 204 may search the reference pictures in list 0 for a reference video block for the current video block and may also search the reference pictures in list 1 for another reference video block for the current video block.
  • the motion estimation unit 204 may then generate reference indexes that indicate the reference pictures in list 0 and list 1 containing the reference video blocks and motion vectors that indicate spatial displacements between the reference video blocks and the current video block.
  • the motion estimation unit 204 may output the reference indexes and the motion vectors of the current video block as the motion information of the current video block.
  • the motion compensation unit 205 may generate the predicted video block of the current video block based on the reference video blocks indicated by the motion information of the current video block.
  • the motion estimation unit 204 may output a full set of motion information for decoding processing of a decoder. Alternatively, in some embodiments, the motion estimation unit 204 may signal the motion information of the current video block with reference to the motion information of another video block. For example, the motion estimation unit 204 may determine that the motion information of the current video block is sufficiently similar to the motion information of a neighboring video block.
  • the motion estimation unit 204 may indicate, in a syntax structure associated with the current video block, a value that indicates to the video decoder 300 that the current video block has the same motion information as the another video block.
  • the motion estimation unit 204 may identify, in a syntax structure associated with the current video block, another video block and a motion vector difference (MVD).
  • the motion vector difference indicates a difference between the motion vector of the current video block and the motion vector of the indicated video block.
  • the video decoder 300 may use the motion vector of the indicated video block and the motion vector difference to determine the motion vector of the current video block.
  • video encoder 200 may predictively signal the motion vector.
  • Two examples of predictive signaling techniques that may be implemented by video encoder 200 include advanced motion vector prediction (AMVP) and merge mode signaling.
  • AMVP advanced motion vector prediction
  • merge mode signaling merge mode signaling
  • the intra prediction unit 206 may perform intra prediction on the current video block.
  • the intra prediction unit 206 may generate prediction data for the current video block based on decoded samples of other video blocks in the same picture.
  • the prediction data for the current video block may include a predicted video block and various syntax elements.
  • the residual generation unit 207 may generate residual data for the current video block by subtracting (e.g., indicated by the minus sign) the predicted video block(s) of the current video block from the current video block.
  • the residual data of the current video block may include residual video blocks that correspond to different sample components of the samples in the current video block.
  • the residual generation unit 207 may not perform the subtracting operation.
  • the transform unit 208 may generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to a residual video block associated with the current video block.
  • the quantization unit 209 may quantize the transform coefficient video block associated with the current video block based on one or more quantization parameter (QP) values associated with the current video block.
  • QP quantization parameter
  • the inverse quantization unit 210 and the inverse transform unit 211 may apply inverse quantization and inverse transforms to the transform coefficient video block, respectively, to reconstruct a residual video block from the transform coefficient video block.
  • the reconstruction unit 212 may add the reconstructed residual video block to corresponding samples from one or more predicted video blocks generated by the prediction unit 202 to produce a reconstructed video block associated with the current video block for storage in the buffer 213.
  • loop filtering operation may be performed to reduce video blocking artifacts in the video block.
  • the entropy encoding unit 214 may receive data from other functional components of the video encoder 200. When the entropy encoding unit 214 receives the data, the entropy encoding unit 214 may perform one or more entropy encoding operations to generate entropy encoded data and output a bitstream that includes the entropy encoded data.
  • Fig. 3 is a block diagram illustrating an example of a video decoder 300, which may be an example of the video decoder 124 in the system 100 illustrated in Fig. 1, in accordance with some embodiments of the present disclosure.
  • the video decoder 300 may be configured to perform any or all of the techniques of this disclosure.
  • the video decoder 300 includes a plurality of functional components.
  • the techniques described in this disclosure may be shared among the various components of the video decoder 300.
  • a processor may be configured to perform any or all of the techniques described in this disclosure.
  • the video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, a reconstruction unit 306 and a buffer 307.
  • the video decoder 300 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 200.
  • the entropy decoding unit 301 may retrieve an encoded bitstream.
  • the encoded bitstream may include entropy coded video data (e.g., encoded blocks of video data).
  • the entropy decoding unit 301 may decode the entropy coded video data, and from the entropy decoded video data, the motion compensation unit 302 may determine motion information including motion vectors, motion vector precision, reference picture list indexes, and other motion information.
  • the motion compensation unit 302 may, for example, determine such information by performing the AMVP and merge mode.
  • When AMVP is used, several most probable candidates are derived based on data from adjacent PBs and the reference picture.
  • Motion information typically includes the horizontal and vertical motion vector displacement values, one or two reference picture indices, and, in the case of prediction regions in B slices, an identification of which reference picture list is associated with each index.
  • a “merge mode” may refer to deriving the motion information from spatially or temporally neighboring blocks.
  • the motion compensation unit 302 may produce motion compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers for interpolation filters to be used with sub-pixel precision may be included in the syntax elements.
  • the motion compensation unit 302 may use the interpolation filters as used by the video encoder 200 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference block.
  • the motion compensation unit 302 may determine the interpolation filters used by the video encoder 200 according to the received syntax information and use the interpolation filters to produce predictive blocks.
  • the motion compensation unit 302 may use at least part of the syntax information to determine sizes of blocks used to encode frame(s) and/or slice(s) of the encoded video sequence, partition information that describes how each macroblock of a picture of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-encoded block, and other information to decode the encoded video sequence.
  • a “slice” may refer to a data structure that can be decoded independently from other slices of the same picture, in terms of entropy coding, signal prediction, and residual signal reconstruction.
  • a slice can either be an entire picture or a region of a picture.
  • the intra prediction unit 303 may use intra prediction modes for example received in the bitstream to form a prediction block from spatially adjacent blocks.
  • the inverse quantization unit 304 inverse quantizes, i.e., de-quantizes, the quantized video block coefficients provided in the bitstream and decoded by entropy decoding unit 301.
  • the inverse transform unit 305 applies an inverse transform.
  • the reconstruction unit 306 may obtain the decoded blocks, e.g., by summing the residual blocks with the corresponding prediction blocks generated by the motion compensation unit 302 or intra-prediction unit 303. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts.
  • the decoded video blocks are then stored in the buffer 307, which provides reference blocks for subsequent motion compensation/intra prediction and also produces decoded video for presentation on a display device.
  • This disclosure is related to video coding technologies. Specifically, it is related to the inter coding module in an encoder. It may be applied to encoders compatible with existing video coding standards like H.264, HEVC, and VVC. It may also be used as a technology for future video coding standards or video codecs, and could be extended to other fields involving motion search algorithms, e.g. computer vision, pattern recognition, etc.
  • Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards.
  • the ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video and H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards.
  • AVC H.264/MPEG-4 Advanced Video Coding
  • H.265/HEVC H.265/HEVC
  • the video coding standards are based on the hybrid video coding structure wherein temporal prediction plus transform coding are utilized.
  • The Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015.
  • the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
  • Each inter-predicted PU has motion parameters for one or two reference picture lists.
  • Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
  • a merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates.
  • the merge mode can be applied to any inter-predicted PU, not only for skip mode.
  • the alternative to merge mode is the explicit transmission of motion parameters, where motion vector (to be more precise, motion vector differences (MVD) compared to a motion vector predictor), corresponding reference picture index for each reference picture list and reference picture list usage are signalled explicitly per each PU.
  • Such a mode is named Advanced motion vector prediction (AMVP) in this disclosure.
  • When one of the two reference picture lists is used, the PU is produced from one block of samples. This is referred to as ‘uni-prediction’. Uni-prediction is available both for P-slices and B-slices.
  • When both reference picture lists are used, the PU is produced from two blocks of samples. This is referred to as ‘bi-prediction’. Bi-prediction is available for B-slices only.
  • inter prediction is used to denote prediction derived from data elements (e.g., sample values or motion vectors) of reference pictures other than the current decoded picture.
  • data elements e.g., sample values or motion vectors
  • a picture can be predicted from multiple reference pictures.
  • the reference pictures that are used for inter prediction are organized in one or more reference picture lists.
  • the reference index identifies which of the reference pictures in the list should be used for creating the prediction signal.
  • a single reference picture list, List 0, is used for a P slice, and two reference picture lists, List 0 and List 1, are used for B slices. It should be noted that reference pictures included in List 0/1 could be from past and future pictures in terms of capturing/display order.
  • Step 1 Initial candidates derivation
  • Step 1.1 Spatial candidates derivation
  • Step 1.2 Redundancy check for spatial candidates
  • Step 1.3 Temporal candidates derivation.
  • Step 2 Additional candidates insertion
  • Step 2.1 Creation of bi-predictive candidates
  • Step 2.2 Insertion of zero motion candidates.
  • a maximum of four merge candidates are selected among candidates that are located in five different positions.
  • a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand), which is signalled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the CU size is equal to 8, all the PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N×2N prediction unit.
  • TU truncated unary binarization
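  • The two-step construction above can be summarized in a short Python sketch. This is an illustrative simplification, not the normative HEVC derivation: the Cand container, the four-candidate spatial cap, and the pairing order for combined candidates are assumptions made for demonstration only.

```python
# Illustrative (non-normative) sketch of HEVC-style merge list construction:
# spatial candidates with pruning, the temporal candidate, combined
# bi-predictive candidates, then zero-MV candidates.
from dataclasses import dataclass
from itertools import permutations
from typing import Optional, Tuple

MV = Tuple[int, int]

@dataclass(frozen=True)
class Cand:
    mv_l0: Optional[MV]      # list-0 motion vector, or None if unused
    mv_l1: Optional[MV]      # list-1 motion vector, or None if unused
    ref_l0: int = 0
    ref_l1: int = 0

def build_merge_list(spatial, temporal, max_num, is_b_slice, num_ref=1):
    merge = []
    # Step 1.1/1.2: up to four spatial candidates, with redundancy check.
    for c in spatial:
        if c is not None and c not in merge:
            merge.append(c)
        if len(merge) == 4:
            break
    # Step 1.3: temporal candidate (its reference index is fixed to zero).
    if temporal is not None and len(merge) < max_num:
        merge.append(temporal)
    # Step 2.1: combined bi-predictive candidates (B-slices only), pairing
    # the list-0 parameters of one candidate with the list-1 parameters of
    # another when the two tuples form different motion hypotheses.
    if is_b_slice:
        for a, b in permutations(list(merge), 2):
            if len(merge) >= max_num:
                break
            if a.mv_l0 and b.mv_l1 and (a.mv_l0, a.ref_l0) != (b.mv_l1, b.ref_l1):
                cand = Cand(a.mv_l0, b.mv_l1, a.ref_l0, b.ref_l1)
                if cand not in merge:
                    merge.append(cand)
    # Step 2.2: zero-MV candidates with increasing reference index; the text
    # above notes that no redundancy check is performed on these.
    ref = 0
    while len(merge) < max_num:
        merge.append(Cand((0, 0), (0, 0) if is_b_slice else None,
                          ref % num_ref, ref % num_ref))
        ref += 1
    return merge
```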
  • a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information.
  • Another source of duplicate motion information is the “second PU” associated with partitions different from 2N×2N.
  • Fig. 7A depicts the second PU for the case of N×2N;
  • Fig. 7B depicts the second PU for the case of 2N×N.
  • the candidate at position A1 is not considered for list construction for the second PU of an N×2N partition, because adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in a coding unit.
  • similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
  • a scaled motion vector is derived based on co-located PU belonging to the picture which has the smallest POC difference with current picture within the given reference picture list.
  • the reference picture list to be used for derivation of the colocated PU is explicitly signalled in the slice header.
  • the scaled motion vector for the temporal merge candidate is obtained as illustrated by the dotted line in Fig. 8.
  • tb is defined to be the POC difference between the reference picture of the current picture and the current picture
  • td is defined to be the POC difference between the reference picture of the colocated picture and the co-located picture.
  • the reference picture index of temporal merge candidate is set equal to zero.
  • the position for the temporal candidate is selected between candidates C0 and C1, as depicted in Fig. 9. If the PU at position C0 is not available, is intra coded, or is outside of the current coding tree unit (CTU, aka LCU, largest coding unit) row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
  • CTU current coding tree unit
  • LCU largest coding unit
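  • The tb/td scaling described above amounts to multiplying the co-located MV by the ratio of POC distances. A minimal sketch follows; the normative derivation uses clipped fixed-point arithmetic, so this floating-point helper only conveys the proportionality.

```python
def scale_temporal_mv(mv_col: tuple, tb: int, td: int) -> tuple:
    """Scale the co-located MV by the POC-distance ratio tb/td (sketch).

    tb: POC difference between the current picture's reference picture
        and the current picture; td: POC difference between the
        co-located picture's reference picture and the co-located
        picture (see Fig. 8). Clipping and fixed-point rounding from
        the normative derivation are intentionally omitted here.
    """
    if td == 0:
        return mv_col
    ratio = tb / td
    return (round(mv_col[0] * ratio), round(mv_col[1] * ratio))

# Example: co-located MV (16, -8) with tb = 1, td = 4 scales to (4, -2).
assert scale_temporal_mv((16, -8), 1, 4) == (4, -2)
```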
  • Besides spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate.
  • Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates.
  • Combined bi-predictive merge candidate is used for B-Slice only.
  • the combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate.
  • Zero motion candidates are inserted to fill the remaining entries in the merge candidates list and therefore hit the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. Finally, no redundancy check is performed on these candidates.
  • AMVP exploits the spatio-temporal correlation of the motion vector with neighbouring PUs and is used for explicit transmission of motion parameters.
  • a motion vector candidate list is constructed by first checking the availability of left, above, and temporally neighbouring PU positions, removing redundant candidates and adding a zero vector to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signalling, the index of the best motion vector candidate is encoded using a truncated unary. The maximum value to be encoded in this case is 2 (see Fig. 11). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
  • Fig. 11 summarizes derivation process for motion vector prediction candidate.
  • For motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates.
  • For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on motion vectors of each PU located in the five different positions depicted in Fig. 5.
  • For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
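  • A minimal sketch of this fixed-length AMVP list assembly, assuming tuple-valued MVs and omitting the reference-index restriction for oversized lists:

```python
def build_amvp_list(spatial_mvs, temporal_mvs, list_size=2):
    """Assemble a fixed-length AMVP candidate list (simplified sketch).

    spatial_mvs: MVs derived from the left and above neighbour groups.
    temporal_mvs: MVs from the two co-located positions (at most one used).
    Duplicate removal and zero padding follow the description above.
    """
    cands = []
    for mv in spatial_mvs:              # spatial candidates, with pruning
        if mv is not None and mv not in cands:
            cands.append(mv)
    for mv in temporal_mvs:             # at most one temporal candidate
        if mv is not None and mv not in cands:
            cands.append(mv)
            break
    while len(cands) < list_size:       # pad with zero motion vectors
        cands.append((0, 0))
    return cands[:list_size]

# Example: one spatial and one temporal candidate yield a list of length 2.
assert build_amvp_list([(3, 1), None], [(3, 2)]) == [(3, 1), (3, 2)]
```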
  • the no-spatial-scaling cases are checked first followed by the spatial scaling. Spatial scaling is considered when the POC is different between the reference picture of the neighbouring PU and that of the current PU regardless of reference picture list. If all PUs of left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
  • the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling, as depicted in Fig. 12.
  • the main difference is that the reference picture list and index of current PU is given as input; the actual scaling process is the same as that of temporal scaling.
  • AMVR adaptive motion vector difference resolution
  • affine prediction mode
  • GPM geometric partition mode
  • ATMVP advanced temporal motion vector prediction (aka SbTMVP, subblock-based TMVP)
  • BDOF Bidirectional optical flow
  • MVDs motion vector differences
  • AMVR locally adaptive motion vector resolution
  • MVD can be coded in units of quarter luma samples, integer luma samples or four luma samples (i.e., ¼-pel, 1-pel, 4-pel).
  • the MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signalled for each CU that has at least one non-zero MVD component.
  • a first flag is signalled to indicate whether quarter luma sample MV precision is used in the CU.
  • when the first flag (equal to 1) indicates that quarter luma sample MV precision is not used, another flag is signalled to indicate whether integer luma sample MV precision or four luma sample MV precision is used.
  • otherwise (the first flag is equal to 0), the quarter luma sample MV resolution is used for the CU.
  • the MVPs in the AMVP candidate list for the CU are rounded to the corresponding precision.
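  • The flag-to-precision mapping and the MVP rounding can be sketched as below; the quarter-sample storage unit and the rounding convention are illustrative assumptions, not the normative rules.

```python
def mvd_shift_from_flags(first_flag: int, second_flag: int = 0) -> int:
    """Map the conditionally signalled AMVR flags to an MVD precision,
    expressed as a left-shift in quarter-sample units (sketch):
    shift 0 -> 1/4-pel, shift 2 -> 1-pel, shift 4 -> 4-pel."""
    if first_flag == 0:
        return 0                      # quarter luma sample precision
    return 2 if second_flag == 0 else 4

def round_mvp(mvp: tuple, shift: int) -> tuple:
    """Round an AMVP predictor (stored in quarter-sample units) to the
    precision implied by AMVR, as described above. The round-half-up
    convention here is an illustrative choice."""
    def r(v: int) -> int:
        if shift == 0:
            return v
        offset = 1 << (shift - 1)
        return ((v + offset) >> shift) << shift
    return (r(mvp[0]), r(mvp[1]))

# Example: predictor (5, -3) rounded for integer-pel MVDs (shift 2).
assert round_mvp((5, -3), 2) == (4, -4)
```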
  • HEVC High Efficiency Video Coding
  • MCP motion compensation prediction
  • a simplified affine transform motion compensation prediction is applied, with a 4-parameter affine model and a 6-parameter affine model.
  • CPMVs control point motion vectors
  • the 4-parameter and 6-parameter affine models are illustrated in Fig. 13A and Fig. 13B, respectively.
  • the motion vector field (MVF) of a block is described by the following equations, with the 4-parameter affine model (the 4 parameters being the variables a, b, e and f) in equation (1) and the 6-parameter affine model (the 6 parameters being the variables a, b, c, d, e and f) in equation (2), respectively:

$$\begin{cases} mv^h(x,y) = \dfrac{mv_1^h - mv_0^h}{w}\,x - \dfrac{mv_1^v - mv_0^v}{w}\,y + mv_0^h \\[4pt] mv^v(x,y) = \dfrac{mv_1^v - mv_0^v}{w}\,x + \dfrac{mv_1^h - mv_0^h}{w}\,y + mv_0^v \end{cases} \quad (1)$$

$$\begin{cases} mv^h(x,y) = \dfrac{mv_1^h - mv_0^h}{w}\,x + \dfrac{mv_2^h - mv_0^h}{h}\,y + mv_0^h \\[4pt] mv^v(x,y) = \dfrac{mv_1^v - mv_0^v}{w}\,x + \dfrac{mv_2^v - mv_0^v}{h}\,y + mv_0^v \end{cases} \quad (2)$$

where $(mv_0^h, mv_0^v)$ is the motion vector of the top-left corner control point, $(mv_1^h, mv_1^v)$ is the motion vector of the top-right corner control point, and $(mv_2^h, mv_2^v)$ is the motion vector of the bottom-left corner control point; all three motion vectors are called control point motion vectors (CPMV). $(x, y)$ represents the coordinate of a representative point relative to the top-left sample within the current block, and $(mv^h(x,y), mv^v(x,y))$ is the motion vector derived for that point.
  • the CP motion vectors may be signaled (like in the affine AMVP mode) or derived on-the-fly (like in the affine merge mode), w and h are the width and height of the current block.
  • the division is implemented by right-shift with a rounding operation.
  • the representative point is defined to be the center position of a sub-block, e.g., when the coordinate of the left-top corner of a sub-block relative to the top-left sample within current block is (xs, ys), the coordinate of the representative point is defined to be (xs+2, ys+2).
  • the representative point is utilized to derive the motion vector for the whole sub-block.
  • sub-block based affine transform prediction is applied.
  • M×N sub-blocks: both M and N are set to 4 in current VVC.
  • the motion vector of the center sample of each sub-block is calculated according to Equation (1) and (2), and rounded to 1/16 fraction accuracy.
  • the motion compensation interpolation filters for 1/16-pel are applied to generate the prediction of each sub-block with derived motion vector.
  • the interpolation filters for 1/16-pel are introduced by the affine mode.
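  • A sketch of the per-sub-block MV derivation from equations (1) and (2), evaluated at each sub-block's representative point and rounded to 1/16 accuracy; floating-point arithmetic here replaces the right-shift division that a real implementation would use.

```python
def affine_subblock_mvs(cpmv0, cpmv1, cpmv2, w, h, sb=4):
    """Derive one MV per sb x sb sub-block from the CPMVs (sketch).

    Implements the 6-parameter model of equation (2) evaluated at each
    sub-block centre (xs + sb/2, ys + sb/2); pass cpmv2=None for the
    4-parameter model of equation (1), whose vertical gradients follow
    from the horizontal ones. CPMVs are (mvx, mvy) tuples.
    """
    mvs = {}
    dhx = (cpmv1[0] - cpmv0[0]) / w       # a: d(mv^h)/dx
    dvx = (cpmv1[1] - cpmv0[1]) / w       # b: d(mv^v)/dx
    if cpmv2 is None:                     # 4-parameter: rotation/zoom only
        dhy, dvy = -dvx, dhx
    else:                                 # 6-parameter model
        dhy = (cpmv2[0] - cpmv0[0]) / h
        dvy = (cpmv2[1] - cpmv0[1]) / h
    for ys in range(0, h, sb):
        for xs in range(0, w, sb):
            x, y = xs + sb // 2, ys + sb // 2   # representative point
            mvh = dhx * x + dhy * y + cpmv0[0]
            mvv = dvx * x + dvy * y + cpmv0[1]
            # round to 1/16-sample accuracy as described in the text above
            mvs[(xs, ys)] = (round(mvh * 16) / 16, round(mvv * 16) / 16)
    return mvs
```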
  • UMVE ultimate motion vector expression
  • MMVD merge mode with motion vector differences (the name under which UMVE was adopted)
  • UMVE re-uses the same merge candidates as those included in the regular merge candidate list in VVC.
  • a base candidate can be selected, and is further expanded by the proposed motion vector expression method.
  • UMVE provides a new motion vector difference (MVD) representation method, in which a starting point, a motion magnitude and a motion direction are used to represent an MVD.
  • MVD motion vector difference
  • This technique uses a merge candidate list as it is, but only candidates of the default merge type (MRG_TYPE_DEFAULT_N) are considered for UMVE’s expansion.
  • In the bi-prediction operation, for the prediction of one block region, two prediction blocks, formed using a motion vector (MV) of list 0 and an MV of list 1, respectively, are combined to form a single prediction signal.
  • MV motion vector
  • DMVR decoder-side motion vector refinement
  • MVD mirroring between list 0 and list 1 is performed to refine the MVs, i.e., to find the best MVD among several MVD candidates.
  • CIIP combined intra and inter prediction
  • When multi-hypothesis prediction is applied to improve intra mode, it combines one intra prediction and one merge indexed prediction.
  • In a merge CU, one flag is signaled for merge mode to select an intra mode from an intra candidate list when the flag is true.
  • the intra candidate list is derived from 4 intra prediction modes including DC, planar, horizontal, and vertical modes, and the size of the intra candidate list can be 3 or 4 depending on the block shape.
  • When the CU width is larger than double the CU height, horizontal mode is excluded from the intra mode list; when the CU height is larger than double the CU width, vertical mode is removed from the intra mode list.
  • One intra prediction mode selected by the intra mode index and one merge indexed prediction selected by the merge index are combined using weighted average.
  • DM is always applied without extra signaling.
  • the weights for combining predictions are described as follows. When DC or planar mode is selected, or the CB width or height is smaller than 4, equal weights are applied. For those CBs with CB width and height larger than or equal to 4, when horizontal/vertical mode is selected, one CB is first vertically/horizontally split into four equal-area regions.
  • the history-based MVP (HMVP) merge candidates are added to merge list after the spatial MVP and TMVP.
  • HMVP history-based MVP
  • the motion information of a previously coded block is stored in a table and used as MVP for the current CU.
  • the table with multiple HMVP candidates is maintained during the encoding/decoding process.
  • the table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-subblock inter-coded CU, the associated motion information is added to the last entry of the table as a new HMVP candidate.
  • the HMVP table size S is set to be 6, which indicates up to 6 History-based MVP (HMVP) candidates may be added to the table.
  • HMVP History-based MVP
  • FIFO first-in-first-out (a constrained FIFO rule is applied when updating the table)
  • HMVP candidates could be used in the merge candidate list construction process.
  • the latest several HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. A redundancy check is applied on the HMVP candidates against the spatial or temporal merge candidates.
  • N indicates the number of existing candidates in the merge list and M indicates the number of available HMVP candidates in the table.
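  • The constrained-FIFO table update can be sketched in a few lines; candidate equality is simplified here to exact tuple equality, whereas a codec would compare full motion information.

```python
def hmvp_update(table, cand, table_size=6):
    """Update the HMVP table with a constrained-FIFO rule (sketch).

    If an identical candidate already exists it is removed first, then
    the new candidate is appended as the most recent entry; otherwise
    the oldest entry is dropped once the table is full.
    """
    if cand in table:
        table.remove(cand)           # redundancy check: keep one copy
    elif len(table) == table_size:
        table.pop(0)                 # drop the oldest entry (FIFO)
    table.append(cand)               # the newest candidate goes last
    return table

# Example with a table of size 2: a repeated candidate is moved to the
# most-recent slot rather than duplicated.
table = []
for mv in [(1, 0), (2, 0), (1, 0)]:
    hmvp_update(table, mv, table_size=2)
assert table == [(2, 0), (1, 0)]
```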
  • Non-local motions are commonly observed in videos.
  • In the merge mode, the motion information of a block can only be merged with that of its adjacent blocks, so non-local candidates cannot be handled.
  • Although the AMVP mode can in theory capture all motions in a video, it incurs high encoding complexity due to an extensive motion search. A method that could capture non-local motions with low complexity is therefore desirable.
  • NLMC non-local motion candidates coding mode
  • the motion candidate could be one whose reference points to another picture, or to the current picture (the latter also known as a block vector candidate).
  • block may represent a video unit, e.g., a CU/PU/TU/a region/a group of video units.
  • Non-local motion candidates (NLMC) coding modes as a coding tool
  • a non-local motion information list is employed in the NLMC.
  • the list may be constructed according to the coded information of current block (e.g., block size, and/or dimension, and/or coded mode).
  • the list used in NLMC may contain one or multiple entries.
  • the maximum number of entries in the list may be pre-defined or signaled in the bitstream.
  • an entry of the list may contain one or more of the following elements: motion vectors, reference picture index, and/or reference list index.
  • the motion vectors may be stored in integer precision or fractional precision.
  • a reference prediction block may be derived from an entry in the list.
      i. In one example, a reference prediction block may be in a previous frame, or in the current frame, as indicated by the motion information in the list.
  • the list employed in the NLMC may be generated/updated with neighboring motion information.
  • different video units may have same/different lists.
  • a video unit may be a block/CU/CTU/CTU row/slice/tile.
  • In one example, a video unit may be a group of CUs with the same size.
  • In one example, the size of the list may depend on coded information in the bitstream.
  • the coded information may be the size of a current coding block.
  • the coded information may be the prediction mode, e.g. AMVP or merge mode, of a current coding block.
  • In one example, the list may include all the motion information of inter-coded blocks in an N×M neighboring region.
      i. Alternatively, the motion information may be added as motion candidates in an order until the list is full; furthermore, pruning may be applied before adding one motion candidate.
      ii. Alternatively, only certain motion information associated with a block within the N×M neighboring region is allowed to be added to the list:
        a. In one example, the motion information of merge/skip coded blocks may not be included in the list.
        b. In one example, the motion information associated with a neighbouring block coded in the same mode (e.g., inter/IBC) as the current block may be added.
        c. In one example, the motion information associated with a neighbouring block of the same size as the current block may be added.
  • a K×L block may only be able to fetch the motion information from the list associated with K×L blocks.
  • In one example, only one instance of repeated motion information of inter-coded blocks may be included in the list.
  • the list may decide whether to include a motion information of an inter-coded block based on the motion vector magnitude, reference list and/or reference picture index.
  • the list may only include one motion information of a block for a given reference picture list.
  • the list may only include one motion information of a block for a given reference picture index.
  • the motion information of a block may be excluded if it is close to one of the motion information already in the list.
  • two pieces of motion information are close when their reference picture list and/or reference picture index are the same.
  • two pieces of motion information are considered close when their motion vectors are close.
      i. In one example, two motion vectors are close when both the vertical and horizontal differences are smaller than K.
      ii. Alternatively, two motion vectors are close when either the vertical or the horizontal difference is smaller than K.
  • the list may decide whether to include the motion information of an inter-coded block based on the pixel values of the reference block it indicates.
      a. In one example, the motion information of an inter-coded block may be excluded if the reference block it indicates is identical to one of the reference blocks indicated by the motion information already in the NLMC list.
      b. In one example, the motion information of an inter-coded block may be excluded if the reference block it indicates is similar to one of the reference blocks indicated by the motion information already in the NLMC list. Two blocks may be considered similar when their difference is small; the difference may be measured by SAD, SATD, SSE or MSE, and may be considered small when that value is smaller than K.
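  • A sketch of the pruning rules described above, covering both MV closeness and reference-block similarity. The MotionInfo container, the fetch_ref_block helper, and the thresholds k_mv and k_sad are hypothetical; SAD stands in for the SAD/SATD/SSE/MSE options listed.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class MotionInfo:            # hypothetical container for one candidate
    mv: Tuple[int, int]
    ref_list: int            # 0 or 1
    ref_idx: int

def mv_close(a, b, k, require_both=True):
    """Closeness test from the bullets above: component-wise differences
    against a threshold K, in the 'both' or 'either' variant."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (dx < k and dy < k) if require_both else (dx < k or dy < k)

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size 2-D sample arrays."""
    return sum(abs(x - y) for ra, rb in zip(block_a, block_b)
               for x, y in zip(ra, rb))

def should_add(cand, nlmc_list, fetch_ref_block, k_mv=4, k_sad=64):
    """Reject a candidate whose MV is close to an existing entry with the
    same reference, or whose reference block is (near-)identical to one
    already represented. fetch_ref_block is a hypothetical helper that
    returns the reference samples a candidate points at."""
    for entry in nlmc_list:
        same_ref = (cand.ref_list, cand.ref_idx) == (entry.ref_list, entry.ref_idx)
        if same_ref and mv_close(cand.mv, entry.mv, k_mv):
            return False
        if sad(fetch_ref_block(cand), fetch_ref_block(entry)) < k_sad:
            return False
    return True
```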
  • the list for video units with different coded information may be separate.
  • the lists for coding blocks with different coding modes may be separate.
  • the AMVP mode, affine mode, IBC mode may have separate NLMC lists.
  • the AMVP mode, affine mode, IBC mode may share a same NLMC list.
  • the lists for coding blocks with different block sizes may be separate.
  • an N×N block may only be able to fetch the motion information from the list corresponding to the N×N size.
  • a new NLMC list may be generated by combining the list of the current region which the current coding block belongs to and the list of its neighboring regions.
  • the neighboring region may denote spatial neighboring regions, such as top neighboring region, left neighboring region.
  • the neighboring region may denote temporal neighboring regions, such as the collocated region.
  • the list from a closer neighboring region may be combined first.
  • the number of regions to be used for the list generation may be coded in the bitstream; alternatively, it may be inferred to be N.
  • the NLMC list may include all the motion information of inter-coded blocks in a region in a current frame and/or previous frames.
  • a region may be a video unit.
  • a video unit may be a block/CU/CTU/CTU row/slice/tile.
  • a video unit may be several blocks/CUs/CTUs/CTU rows/slices/tiles.
  • a video unit may be several CUs/blocks with the same size.
  • a region may be set from (Cx - M, Cy - N) to (Cx + L, Cy + K) for different frames, where (Cx, Cy) is the position of a current block, and M, N, L and K are integer numbers.
  • the positions of the region for different frames may be different.
  • the region size may be fixed for different frames, e.g. M×N.
  • the region size may be different for different frames.
  • the number of frames from which motion information is collected may be set to K.
  • pruning may be applied when a motion information is added to the NLMC list.
      a. In one example, a motion information may not be added to the NLMC list if the reference block it indicates is identical to a reference block indicated by an existing motion information in the list.
      b. In one example, a motion information may not be added to the NLMC list if its motion vector is identical to a motion vector of an existing motion information in the list; alternatively, if its motion vector is close to a motion vector of an existing motion information in the list.
        i. In one example, two motion vectors are close when both the vertical and horizontal differences are smaller than K.
        ii. Alternatively, two motion vectors are close when either the vertical or the horizontal difference is smaller than K.
      c. In one example, a motion information may not be added to the NLMC list if its reference index is identical to a reference index of an existing motion information in the list.
  • the prediction block of a current block may be generated depending on one or more entries in the NLMC list.
      a. In one example, the prediction block of a current block may be generated by weighted prediction combining one or more entries in the list.
        i. In one example, the index of the entries used to generate the prediction block may be signalled to the decoder; alternatively, the index of this entry may be inferred to be N.
      b. In one example, how to select the best N entries may be determined by minimizing a certain cost.
  • N may be equal to 1 or greater than 1.
  • the best N entries may denote the N entries with minimal costs.
  • the cost may denote the rate distortion cost between an entry and the current block.
  • the cost may denote the distortion between an entry and the current block, such as SAD, SATD, SSE or MSE.
  • the process of NLMC may be illustrated as in Fig. 15. In Fig. 15, the list has N entries, and B_i is the i-th entry in the list. The current block is denoted by C.
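  • An encoder-side sketch of selecting the best N entries by minimal distortion against the current block C and averaging them into a prediction, as in Fig. 15. Plain SAD and equal weights are illustrative choices; a decoder would rely on a signalled index or a template cost, since it does not know the current block's samples.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def predict_from_list(entries, current_block, n=1):
    """Pick the N entries B_i with minimal cost against the current block
    C and average them into a prediction block (sketch)."""
    best = sorted(entries, key=lambda e: sad(e, current_block))[:n]
    h, w = len(current_block), len(current_block[0])
    return [[round(sum(e[y][x] for e in best) / len(best))
             for x in range(w)] for y in range(h)]

# Example: with n=1, the entry closest to C (here e1) is the prediction.
c = [[10, 10], [10, 10]]
e1, e2 = [[9, 11], [10, 10]], [[0, 0], [0, 0]]
assert predict_from_list([e1, e2], c, n=1) == e1
```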
  • Entries in a list may be firstly sorted before being used to derive prediction/reconstruction of a current block.
  • It is proposed to sort the entries in the list based on the distortion between the template of each entry and the template of the current block.
  • the template may denote an M×N region excluding the region of the current block, where M > S1 and N > S2.
  • the distortion in the above example may denote the distortion between two templates, such as SAD, SATD, SSE or MSE.
  • the entry may not only include a block in the reconstructed region but also the template of the block.
  • the entries in the list may be sorted in an ascending/descending order based on the template distortion cost.
      i. In one example, after sorting the dictionary in a descending order based on the template distortion cost, only the first K entries could be applied when coding the current block with NLMC.
        a. In one example, the indexes of the K entries may not need to be signalled to the decoder.
        b. In one example, the value of K may be a default value; alternatively, the value of K may be signalled to the decoder.
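  • A sketch of the template-based reordering: because templates consist of reconstructed samples available on both sides, the decoder can repeat the sort without any signalled indexes. The Entry container is hypothetical, and SAD stands in for the listed SAD/SATD/SSE/MSE options.

```python
from dataclasses import dataclass

@dataclass
class Entry:                 # hypothetical: a list entry plus its template
    block_id: int
    template: list           # reconstructed samples around the entry

def sad(a, b):
    """Sum of absolute differences between two equal-size 2-D arrays."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def sort_entries_by_template(entries, cur_template, k=None):
    """Sort NLMC entries by the distortion between each entry's template
    and the current block's template, then keep only the first K entries
    (all of them if K is None)."""
    ranked = sorted(entries, key=lambda e: sad(e.template, cur_template))
    return ranked if k is None else ranked[:k]

# Example: entry b's template is closer to the current template, so it ranks first.
a, b = Entry(0, [[1, 2]]), Entry(1, [[9, 9]])
assert sort_entries_by_template([a, b], [[8, 8]], k=1) == [b]
```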
  • the NLMC may be treated as a new prediction mode in addition to existing ones (e.g., intra/inter/IBC prediction modes).
  • indication of the usage of the NLMC may be signalled/parsed as a separate prediction mode (e.g. MODE_NLMC).
      i. In one example, indication of an entry in the list is signalled/parsed by additional syntaxes.
  • b. In one example, fixed length coding/Exp-Golomb/truncated unary/truncated binary may be used to binarize the entry index.
  • the binarization of the signalled index in NLMC may depend on the number of possible entries used in NLMC.
      i. In one example, the binarization may be fixed length with M bits.
        a. In one example, M may be set equal to floor(log2(N)), where log2(N) is the logarithm of N to the base 2 and floor(x) is the largest integer not greater than x.
      ii. In one example, the binarization may be truncated binary/unary with cMax equal to N.
      iii. In one example, N may be the number of all entries in the dictionary; alternatively, N may be the number of available entries in the dictionary.
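  • The binarization options above can be sketched directly. Note that fixed-length coding with M = floor(log2(N)) bits only indexes all N entries when N is a power of two, which is one reason the truncated binary/unary variants are also listed; the helpers below are illustrative, not a normative entropy-coding stage.

```python
import math

def fixed_length_bits(index, n):
    """Fixed-length binarization with M = floor(log2(N)) bits, as in the
    example above. Indexes >= 2**M overflow when N is not a power of two."""
    m = max(1, math.floor(math.log2(n)))
    return format(index, f"0{m}b")

def truncated_unary(index, c_max):
    """Truncated unary: index ones followed by a terminating zero, with
    the zero omitted for the largest symbol (index == cMax)."""
    return "1" * index + ("" if index == c_max else "0")

def truncated_binary(index, n):
    """Truncated binary code for an alphabet of size N: the first
    2**(k+1) - N codewords use k bits, the rest use k + 1 bits."""
    k = math.floor(math.log2(n))
    u = (1 << (k + 1)) - n          # number of short (k-bit) codewords
    if index < u:
        return format(index, f"0{k}b")
    return format(index + u, f"0{k + 1}b")

# Examples for a list with N = 6 entries:
assert truncated_binary(1, 6) == "01"    # one of the two short codewords
assert truncated_binary(5, 6) == "111"
assert truncated_unary(2, 5) == "110"
```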
  • Non-local motion candidates coding modes may be used as a motion estimation method for encoders
  • the NLMC may be treated as a motion estimation method for encoders.
  • the entries in the NLMC may be used as additional initial search points before the integer motion search process starts.
      i. In one example, only the integer parts of a motion vector in an entry may be used.
      ii. Alternatively, both the integer parts and fractional parts of a motion vector in an entry may be used.
  • the entries in the NLMC may be used as additional candidates, as a complement to the traditional motion estimation methods.
  • one entry in the list may be used to generate the prediction block for a current block.
      i. In one example, the motion information contained in the entry may be signalled in the AMVP manner.
  • the NLMC list may be updated with neighboring motion information that may not correspond to the best mode selected by neighboring coding blocks.
  • the NLMC list size may be dependent on the coding block size.
  • the NLMC list may exclude the motion information of inter-coded blocks which have a large inter cost.
      i. In one example, if the inter cost of a block is greater than K, the corresponding motion information may be excluded from the NLMC list.
  • the above methods may be applied to AMVP, affine, IBC and/or other inter-prediction related coding tools.
  • the M, N, L, and/or K in the above examples may be integer numbers. In one example, the M, N, and/or K may be any integer values. In one example, both of M and N may be equal to 4. In one example, N may be a pre-defined constant value for all QPs. In one example, N may be signalled to the decoder. In one example, N may be based on:
  • Video contents (e.g. screen contents or natural contents).
  • Block dimension of current block and/or its neighboring blocks.
  • Block shape of current block and/or its neighboring blocks.
  • Quantization parameter of the current block.
  • Indication of the color format (such as 4:2:0, 4:4:4, RGB or YUV).
  • Coding tree structure (such as dual tree or single tree).
  • Color component (e.g. may be only applied on luma component and/or chroma component).
  • Whether to and/or how to apply the above methods may be based on:
  • Video contents (e.g. screen contents or natural contents). In one example, the above methods may be only applied on screen contents.
  • A message signaled in the DPS/SPS/VPS/PPS/APS/picture header/slice header/tile group header/Largest coding unit (LCU)/Coding unit (CU)/LCU row/group of LCUs/TU/PU block/Video coding unit.
  • Position of CU/PU/TU/block/Video coding unit.
  • Block dimension of current block and/or its neighboring blocks.
  • Block shape of current block and/or its neighboring blocks.
  • Indication of the color format (such as 4:2:0, 4:4:4, RGB or YUV).
  • Coding tree structure (such as dual tree or single tree).
  • Slice/tile group type and/or picture type.
  • Color component (e.g. may be only applied on luma component and/or chroma component).
  • Temporal layer ID.
  • Profiles/Levels/Tiers of a standard.
  • the above methods may be applied to the motion estimation methods used in encoding, pre-analysis and/or MCTF processes.
  • FIG. 17 illustrates a flowchart of a method 1700 for video processing in accordance with embodiments of the present disclosure.
  • the method 1700 is implemented during a conversion between a video unit or video block of a video and a bitstream of the video.
  • a list of non-local motion information is determined for the current video block.
  • the current video block is coded with a non-local motion candidate (NLMC) coding mode.
  • the list of non-local motion information is associated with at least one nonadj acent block of the current video block.
  • At block 1720, at least one prediction block for the current video block is determined based on the list of non-local motion information.
  • the conversion is performed based on the at least one prediction block.
  • the method 1700 enables coding of video blocks with the NLMC coding mode, so that the coding performance of the video can be improved.
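The following Python sketch walks through blocks 1710-1730 of method 1700 under strong simplifying assumptions (the list is built from previously coded motion vectors with duplicate pruning, and the first entry drives motion compensation); it is not the disclosed implementation, and all names are hypothetical.

```python
def nlmc_predict(block_rect, reconstructed, candidate_mvs):
    """Illustrative walk through method 1700. `reconstructed` is a NumPy-style
    2-D sample array; `candidate_mvs` holds motion vectors of previously coded,
    possibly non-adjacent, blocks as (mvx, mvy) integer tuples."""
    # Block 1710: determine the list of non-local motion information,
    # pruning exact duplicates while preserving order.
    nlmc_list = []
    for mv in candidate_mvs:
        if mv not in nlmc_list:
            nlmc_list.append(mv)
    if not nlmc_list:
        raise ValueError("no non-local motion information available")
    # Block 1720: determine a prediction block from a list entry
    # (here simply the first entry, for illustration).
    top, left, height, width = block_rect
    mvx, mvy = nlmc_list[0]
    pred = reconstructed[top + mvy: top + mvy + height,
                         left + mvx: left + mvx + width]
    # Block 1730: the conversion (encoding or decoding) then proceeds
    # from this prediction block.
    return pred
```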
  • the list of non-local motion information is constructed based on coded information of the current video block.
  • the coded information includes at least one of: a block size of the current video block, a dimension of the current video block, or a coded mode of the current video block.
  • the list of non-local motion information used for the NLMC coding mode comprises at least one entry.
  • a maximum number of entries in the list of non-local motion information may be predefined, or may be indicated in the bitstream.
  • an entry in the list of the non-local motion information comprises at least one of: at least one motion vector, a reference picture index, or a reference list index.
  • the at least one motion vector is in an integer precision or a fractional precision.
  • a reference prediction block is determined based on an entry in the list.
  • the reference prediction block is in a previously coded frame, or a current frame indicated by motion information in the list of non-local motion information.
  • the method 1700 further includes generating the list of non-local motion information based on neighboring motion information of the current video block, and/or updating the list of non-local motion information based on neighboring motion information of the current video block.
  • a first list of non-local motion information of a first video unit is the same as or different from a second list of non-local motion information of a second video unit.
  • the first or second video unit comprises one of: a block, a coding unit (CU), a coding tree unit (CTU), a CTU row, a slice, a tile, or a group of CUs with a same size.
  • a size of the list of non-local motion information is based on coded information in the bitstream.
  • the coded information comprises at least one of: a size of the current video block, or a prediction mode of the current video block.
  • the prediction mode of the current video block comprises at least one of: an advanced motion vector prediction (AMVP) mode, or a merge mode.
  • the list of non-local motion information comprises all motion information of inter-coded blocks in a NxM neighboring region.
  • N and M are positive integers.
  • the list of non-local motion information comprises at least one piece of motion information of inter-coded blocks in a NxM neighboring region.
  • N and M are positive integers.
  • the at least one piece of motion information is added as at least one motion candidate in the list in an order until the list is full.
  • a pruning process is applied to at least one motion candidate before adding the at least one motion candidate into the list. That is, same candidates or similar candidates may be pruned.
  • in response to first motion information associated with a block within a NxM neighboring region satisfying a condition, the first motion information is allowed to be added into the list.
  • N and M are positive integers.
  • the condition comprises at least one of the first motion information being associated with a neighboring block coded in the same mode as the current video block, the same mode being an inter mode or an intra block copy mode, or the first motion information being associated with a neighboring block in the same size as the current video block.
  • in response to the first motion information being of a merge coded block or a skip coded block, the first motion information is excluded from the list.
  • a KxL block obtains motion information from the list associated with KxL blocks, K and L being positive integers.
  • one piece (for example, only one) of repeated pieces of motion information is included in the list.
  • whether to include second motion information of an inter-coded block in the list is based on at least one of a motion vector magnitude of the second motion information, a reference list of the second motion information, or a reference picture index of the second motion information.
  • a single piece of motion information of a block for a given reference picture list is included in the list. In some embodiments, a single piece of motion information of a block for a given reference picture index is included in the list.
  • a first piece of motion information of a block is excluded from the list in response to a difference between the first piece of motion information and a second piece of motion information in the list being less than a threshold.
  • the difference between the first and second pieces of motion information is less than the threshold in response to at least one of a first reference picture list of the first piece of motion information being same with a second reference picture list of the second piece of motion information, or a first reference picture index of the first piece of motion information being same with a second reference picture index of the second piece of motion information.
  • the difference between the first and second pieces of motion information is less than the threshold in response to a vector difference between a first motion vector of the first piece of motion information and a second motion vector of the second piece of motion information being less than the threshold.
  • the vector difference of the first and second motion vectors is less than the threshold in response to at least one of a vertical positional difference between the first and second motion vectors being less than the threshold, or a horizontal positional difference between the first and second motion vectors being less than the threshold.
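A sketch of this motion-vector-difference pruning test follows; it implements the "at least one of" wording above literally, although an implementation might instead require both components to be close. The threshold default is purely illustrative.

```python
def mv_difference_below_threshold(mv_a, mv_b, threshold):
    """True when at least one of the horizontal or vertical component
    differences is below the threshold, per the wording above."""
    return (abs(mv_a[0] - mv_b[0]) < threshold or
            abs(mv_a[1] - mv_b[1]) < threshold)

def exclude_candidate(candidate_mv, list_mvs, threshold=1):
    """Drop a candidate that is too close to any motion vector already in the
    list (matching reference list/index is assumed checked beforehand)."""
    return any(mv_difference_below_threshold(candidate_mv, mv, threshold)
               for mv in list_mvs)
```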
  • whether to include third motion information of an inter-coded block in the list is based on pixel values of a reference block indicated by the third motion information.
  • the third motion information is excluded from the list in response to at least one of the reference block indicated by the third motion information being the same as a further reference block indicated by motion information in the list, or a difference between the reference block indicated by the third motion information and a further reference block indicated by motion information in the list being less than a threshold.
  • the difference may be a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE), and/or the like.
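The reference-block-based check might look like the following sketch, here using SAD as the difference measure; the function name and threshold handling are assumptions made for illustration.

```python
import numpy as np

def prune_by_reference_block(candidate_ref, list_refs, threshold=64):
    """Return True when the candidate's reference block duplicates, or is
    closer than `threshold` in SAD to, a reference block already represented
    in the list; the threshold default is purely illustrative."""
    cand = candidate_ref.astype(np.int32)
    for ref in list_refs:
        if np.array_equal(candidate_ref, ref):
            return True                                   # identical block
        if np.abs(cand - ref.astype(np.int32)).sum() < threshold:
            return True                                   # too similar (SAD)
    return False
```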
  • a first list of non-local motion information for a first coding block and a second list of non-local motion information for a second coding block are separate, the first coding block being with first coded information and the second coding block being with second coded information.
  • the first coding block is coded with a first coding mode and the second coding block is coded with a second coding mode.
  • the first coding mode comprises one of an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode
  • the second coding mode comprises another one of the AMVP mode, the affine mode or the IBC mode.
  • a first block size of the first coding block is different from a second block size of the second coding block.
  • a NxN block is accessible to motion information from the list of non-local motion information corresponding to a NxN size, N being a positive integer.
  • a same list of non-local motion information is constructed for at least one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode.
  • for each of a plurality of NxM regions, a respective list of non-local motion information for NLMC is stored, N and M being positive integers.
  • determining the list of non-local motion information for the current video block comprises: determining the list of non-local motion information by combining a first list of non-local motion information of a current region which the current video block belongs to and at least one list of at least one neighboring region of the current region.
  • the at least one neighboring region comprises at least one of: at least one spatial neighboring region, or at least one temporal neighboring region.
  • the at least one spatial neighboring region comprises at least one of: a top neighboring region, or a left neighboring region.
  • the at least one temporal neighboring region comprises a collocated region.
  • the combining of the at least one list is based on at least one distance between the at least one neighboring region and the current region.
  • the number of the at least one neighboring region is indicated in the bitstream.
  • the number of the at least one neighboring region is inferred to be a predefined value.
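One plausible way to combine the regions' lists, ordered by distance as suggested above, is sketched below; the list representation (plain motion-vector tuples) and the max_size cap are assumptions, not details from the disclosure.

```python
def combine_region_lists(current_list, neighbor_lists, max_size=16):
    """Merge the current region's list with lists of neighboring regions.
    `neighbor_lists` holds (distance, list_of_mvs) pairs; nearer regions are
    visited first, and exact duplicates are pruned until max_size is reached."""
    combined = list(current_list)
    for _, mvs in sorted(neighbor_lists, key=lambda item: item[0]):
        for mv in mvs:
            if len(combined) >= max_size:
                return combined
            if mv not in combined:
                combined.append(mv)
    return combined
```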
  • the list of non-local motion information comprises motion information of at least one inter-coded block in a region in at least one of: a current frame, or at least one previous frame.
  • the region comprises a video unit, the video unit comprising one of: at least one block, at least one coding unit (CU), at least one coding tree unit (CTU), at least one CTU row, at least one slice, at least one tile, a plurality of CUs with a same size, or a plurality of blocks with a same size.
  • the region comprises positions from (Cx - M, Cy - N) to (Cx + L, Cy + K) for at least one frame.
  • (Cx, Cy) denotes a position of the current video block.
  • M, N, L and K are integers.
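In code, the collection region described above might be computed as follows; clipping to the picture boundary is an added assumption rather than something stated here.

```python
def motion_collection_region(cx, cy, m, n, l, k, frame_w, frame_h):
    """Bounds of the region from (Cx - M, Cy - N) to (Cx + L, Cy + K) in which
    motion information is collected, clipped to the frame."""
    x0, y0 = max(0, cx - m), max(0, cy - n)
    x1, y1 = min(frame_w - 1, cx + l), min(frame_h - 1, cy + k)
    return x0, y0, x1, y1
```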
  • N comprises a predefined value for all quantization parameters (QPs), or N is indicated in the bitstream, or N is based on at least one of: video content, a block dimension of the current video block, a block dimension of a neighboring block, a block shape of the current video block, a block shape of a neighboring block, a QP of the current video block, an indication of a color format, a dual tree or single tree structure, a slice type, a tile group type, a picture type, a color component, a temporal layer identifier, a position of a coding unit (CU), prediction unit (PU), transform unit (TU), a block or a video coding unit, or a message indicated in a video region, the video region being one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), an LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit.
  • a position of a first region for a first frame is different from a position of a second region for a second frame.
  • a first size of a first region for a first frame is the same as or different from a second size of a second region for a second frame.
  • the number of frames to be collected for motion information in the list is set to a predefined value.
  • a pruning process is applied to candidate motion information, the candidate motion information being to be updated to the list of non-local motion information.
  • the candidate motion information is excluded from the list in response to a reference block indicated by the candidate motion information being the same as a further reference block indicated by existing motion information in the list, or the candidate motion information is excluded from the list in response to a difference between a motion vector indicated by the candidate motion information and a further motion vector indicated by an existing motion information in the list being less than a threshold.
  • the difference between the motion vector and the further motion vector is less than the threshold in response to at least one of a vertical positional difference between the motion vector and the further motion vector being less than the threshold, or a horizontal positional difference between the motion vector and the further motion vector being less than the threshold.
  • the candidate motion information is excluded from the list in response to a reference index indicated by the candidate motion information being the same as a further reference index indicated by existing motion information in the list.
  • determining the at least one prediction block for the current video block comprises: determining the at least one prediction block based on at least one entry in the list of non-local motion information.
  • the at least one prediction block is determined by a weighted prediction combining the at least one entry.
  • At least one index of the at least one entry is indicated in the bitstream.
  • at least one index of the at least one entry is inferred to be at least one predefined value.
  • the at least one entry is selected by minimizing a cost between an entry and the current video block.
  • the at least one entry comprises N entries with minimal costs, N being an integer equal to 1 or greater than 1.
  • the cost comprises at least one of: a rate distortion cost between an entry and the current video block, or a distortion between an entry and the current video block.
  • the cost comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
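An encoder-side sketch of selecting the N minimal-cost entries and blending them by weighted prediction follows; SAD and equal weights are illustrative choices among the costs and combinations listed above, and the function name is hypothetical.

```python
import numpy as np

def nlmc_weighted_prediction(current_block, candidate_blocks, n_best=2):
    """Rank candidate prediction blocks by SAD against the current block,
    keep the n_best cheapest, and blend them with equal weights."""
    cur = current_block.astype(np.int32)
    costs = [int(np.abs(cur - c.astype(np.int32)).sum()) for c in candidate_blocks]
    order = sorted(range(len(candidate_blocks)), key=costs.__getitem__)[:n_best]
    chosen = np.stack([candidate_blocks[i].astype(np.float64) for i in order])
    return np.round(chosen.mean(axis=0)).astype(current_block.dtype)
```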
  • the at least one entry in the list of non-local motion information comprises N entries, N being a positive integer.
  • determining the at least one prediction block comprises: for an ith entry in the list, i being larger than or equal to 0 and less than or equal to N-1, checking the ith entry and determining an ith prediction block for the current video block based on a criterion.
  • the at least one entry is sorted before being used to determine the at least one prediction block.
  • the sorting of the at least one entry may be based on: at least one template distortion cost, and/or at least one distortion between at least one template of the at least one entry and a current template of the current video block.
  • a size of the current video block is S1xS2, and the current template comprises a MxN region excluding a region of the current video block.
  • M is larger than S1 and N is larger than S2, M, N, S1 and S2 being positive integers.
  • a distortion of the at least one distortion comprises a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE) between two templates.
  • an entry in the list comprises a block in a reconstructed region and a template of the block.
  • the at least one entry is sorted in an ascending or descending order based on the at least one template distortion cost.
  • first K entries are applied for coding the current video block with NLMC coding mode, K being a positive integer.
  • indexes of the first K entries are excluded from the bitstream.
  • a value of K may be indicated in the bitstream, or may be a default value.
  • the NLMC coding mode is applied in addition to at least one further coding mode, the at least one further coding mode comprising at least one of an intra prediction mode, an inter prediction mode, or an intra block copy prediction mode.
  • an indication of usage of the NLMC coding mode is indicated or parsed as a separate prediction mode.
  • an indication of an entry in the list is indicated or parsed by a further syntax element, the further syntax element comprises an entry index.
  • the entry index is coded by one of a fixed-length coding, an Exponential Golomb coding, a truncated unary coding, or a truncated binary coding.
  • a binarization of the entry index is based on the number of entries used in the NLMC coding mode.
  • the binarization comprises one of a fixed length with M bits, or a truncated binary with a parameter equal to N, or a truncated unary with a parameter equal to N, N being the number of entries, and M being determined based on N.
  • the parameter may be cMax.
  • M is equal to floor(log2(N)).
  • log2(N) denotes a function to get a logarithm of N to a base 2.
  • floor(x) is a function to get the largest integer less than or equal to x.
  • N is the number of all entries in the list, or the number of available entries in the list.
  • the NLMC coding mode is used as a motion estimation approach for encoding the current video block into the bitstream.
  • the NLMC may be used as a motion estimation method for encoders.
  • at least one entry in the list of non-local motion information is used as at least one additional initial search point before an integer motion search process.
  • at least one integer part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
  • At least one integer part and at least one fractional part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
  • the at least one entry is used as at least one additional candidate for at least one further motion estimation approach.
  • an entry in the list is used to generate a prediction block for the current video block, and motion information in the entry is indicated by at least one element associated with an advanced motion vector prediction (AMVP). That is, the motion information of the entry may be signalled by the AMVP manner. For example, instead of signaling MV, a corresponding MVD of the entry may be indicated in the bitstream.
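A one-line sketch of this AMVP-style reuse of an entry: the entry supplies the motion vector predictor, and only the MVD travels in the bitstream (names are illustrative).

```python
def reconstruct_mv_from_entry(entry_mv, mvd):
    """AMVP-style signalling of an entry's motion information: the entry acts
    as the predictor and only the motion vector difference (MVD) is carried
    in the bitstream, so MV = predictor + MVD at the decoder."""
    return (entry_mv[0] + mvd[0], entry_mv[1] + mvd[1])
```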
  • the list of non-local motion information is updated with at least one piece of neighboring motion information, the at least one piece of neighboring motion information being not selected by at least one neighboring coding block.
  • a size of the list of non-local motion information is based on a size of the current video block.
  • a plurality of lists of non-local motion information are determined for a plurality of reference pictures for the NLMC coding mode.
  • the list of non-local motion information excludes motion information of an inter-coded block with an inter cost larger than a threshold.
  • the NLMC coding mode is applied to at least one of: an advanced motion vector prediction (AMVP) coding tool, an affine coding tool, an intra block copy (IBC) coding tool, or a further inter-prediction related coding tool.
  • whether to and/or how to apply the method associated with the NLMC coding mode is based on at least one of: video content, a message included in one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit, a position of CU, PU, TU, block, or the video coding unit, a block dimension of the current video block, a block dimension of a neighboring block of the current video block, a block shape of the current video block, a block shape of a neighboring block of the current video block, a quantization parameter of the current video block, an indication of a colour format, a coding tree structure, a slice type, a tile group type, a picture type, a colour component, a temporal layer identifier, or profiles, levels, or tiers of a standard.
  • the method is applied to the current video block in response to the current video block belonging to screen content, and/or the method is applied to at least one of a luma component or a chroma component of the current video block.
  • the colour format comprises one of: 4:2:0, 4:4:4, RGB or YUV, and/or wherein the coding tree structure comprises a dual tree structure or a single tree structure.
  • the method associated with the NLMC coding mode is applied to a motion estimation used in at least one of: an encoding process, a pre-analysis process, or a motion compensated temporal filtering (MCTF) process.
  • the current video block comprises at least one of: a colour component, a sub-picture, a slice, a tile, a coding tree unit (CTU), a CTU row, a group of CTUs, a coding unit (CU), a prediction unit (PU), a transform unit (TU), a coding tree block (CTB), a coding block (CB), a prediction block (PB), a transform block (TB), a block, a sub-block of a block, a sub-region within a block, or a region that contains more than one sample or pixel.
  • the conversion comprises encoding the current video block into the bitstream.
  • the conversion comprises decoding the current video block from the bitstream.
  • a non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing.
  • a list of non-local motion information is determined for a current video block of the video.
  • the current video block is coded with a non-local motion candidate (NLMC) coding mode.
  • the list of non-local motion information is associated with at least one non-adjacent block of the current video block.
  • At least one prediction block is determined for the current video block based on the list of non-local motion information.
  • the bitstream is generated based on the at least one prediction block.
  • a method for storing a bitstream of a video is provided.
  • a list of non-local motion information is determined for a current video block of the video.
  • the current video block is coded with a non-local motion candidate (NLMC) coding mode.
  • the list of non-local motion information is associated with at least one non-adjacent block of the current video block.
  • At least one prediction block is determined for the current video block based on the list of non-local motion information.
  • the bitstream is generated based on the at least one prediction block.
  • the bitstream is stored in a non-transitory computer-readable recording medium.
  • Clause 1 A method for video processing, comprising: determining, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information for the current video block, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and performing the conversion based on the at least one prediction block.
  • Clause 2 The method of clause 1, wherein the list of non-local motion information is constructed based on coded information of the current video block, the coded information comprising at least one of: a block size of the current video block, a dimension of the current video block, or a coded mode of the current video block.
  • Clause 3 The method of clause 1 or 2, wherein the list of non-local motion information used for the NLMC coding mode comprises at least one entry.
  • Clause 7 The method of any of clauses 3-6, wherein a reference prediction block is determined based on an entry in the list.
  • Clause 8 The method of clause 7, wherein the reference prediction block is in a previously coded frame, or a current frame indicated by motion information in the list of non-local motion information.
  • Clause 9 The method of any of clauses 1-8, further comprising at least one of: generating the list of non-local motion information based on neighboring motion information of the current video block, or updating the list of non-local motion information based on neighboring motion information of the current video block.
  • Clause 10 The method of clause 9, wherein a first list of non-local motion information of a first video unit is the same as or different from a second list of non-local motion information of a second video unit.
  • the first or second video unit comprises one of a block, a coding unit (CU), a coding tree unit (CTU), a CTU row, a slice, a tile, or a group of CUs with a same size.
  • Clause 14 The method of any of clauses 1-13, wherein the list of non-local motion information comprises all motion information of inter-coded blocks in a NxM neighboring region, wherein N and M are positive integers.
  • Clause 15 The method of any of clauses 1-13, wherein the list of non-local motion information comprises at least one piece of motion information of inter-coded blocks in a NxM neighboring region, wherein N and M are positive integers, wherein the at least one piece of motion information is added as at least one motion candidate in the list in an order until the list is full.
  • Clause 16 The method of clause 15, wherein a pruning process is applied to at least one motion candidate before adding the at least one motion candidate into the list.
  • Clause 17 The method of any of clauses 1-13, wherein in response to first motion information associated with a block within a NxM neighboring region satisfying a condition, the first motion information is allowed to be added into the list, wherein N and M are positive integers.
  • Clause 18 The method of clause 17, wherein the condition comprises at least one of the first motion information being associated with a neighboring block coded in the same mode as the current video block, the same mode being an inter mode or an intra block copy mode, or the first motion information being associated with a neighboring block in the same size as the current video block.
  • Clause 21 The method of any of clauses 1-20, wherein one piece of repeated pieces of motion information is included in the list.
  • Clause 22 The method of any of clauses 1-21, wherein whether to include second motion information of an inter-coded block in the list is based on at least one of a motion vector magnitude of the second motion information, a reference list of the second motion information, or a reference picture index of the second motion information.
  • Clause 23 The method of any of clauses 1-22, wherein a single piece of motion information of a block for a given reference picture list is included in the list.
  • Clause 24 The method of any of clauses 1-23, wherein a single piece of motion information of a block for a given reference picture index is included in the list.
  • Clause 25 The method of any of clauses 1-24, wherein a first piece of motion information of a block is excluded from the list in response to a difference between the first piece of motion information and a second piece of motion information in the list being less than a threshold.
  • Clause 26 The method of clause 25, wherein the difference between the first and second pieces of motion information is less than the threshold in response to at least one of a first reference picture list of the first piece of motion information being same with a second reference picture list of the second piece of motion information, or a first reference picture index of the first piece of motion information being same with a second reference picture index of the second piece of motion information.
  • Clause 27 The method of clause 25, wherein the difference between the first and second pieces of motion information is less than the threshold in response to a vector difference between a first motion vector of the first piece of motion information and a second motion vector of the second piece of motion information being less than the threshold.
  • Clause 28 The method of clause 27, wherein the vector difference of the first and second motion vectors is less than the threshold in response to at least one of a vertical positional difference between the first and second motion vectors being less than the threshold, or a horizontal positional difference between the first and second motion vectors being less than the threshold.
  • Clause 29 The method of any of clauses 1-28, wherein whether to include third motion information of an inter-coded block in the list is based on pixel values of a reference block indicated by the third motion information.
  • Clause 30 The method of clause 29, wherein the third motion information is excluded from the list in response to at least one of: the reference block indicated by the third motion information being the same as a further reference block indicated by motion information in the list, or a difference between the reference block indicated by the third motion information and a further reference block indicated by motion information in the list being less than a threshold.
  • Clause 31 The method of clause 30, wherein the difference comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
  • Clause 32 The method of any of clauses 1-31, wherein a first list of non-local motion information for a first coding block and a second list of non-local motion information for a second coding block are separate, the first coding block being with first coded information and the second coding block being with second coded information.
  • Clause 33 The method of clause 32, wherein the first coding block is coded with a first coding mode and the second coding block is coded with a second coding mode.
  • Clause 34 The method of clause 33, wherein the first coding mode comprises one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode, and the second coding mode comprises another one of the AMVP mode, the affine mode or the IBC mode.
  • Clause 35 The method of clause 32, wherein a first block size of the first coding block is different from a second block size of the second coding block.
  • Clause 36 The method of clause 35, wherein a NxN block is accessible to motion information from the list of non-local motion information corresponding to a NxN size, N being a positive integer.
  • Clause 38 The method of any of clauses 1-37, wherein for each of a plurality of NxM regions, a respective list of non-local motion information for NLMC is stored, N and M being positive integers.
  • Clause 39 The method of any of clauses 1-38, wherein determining the list of non-local motion information for the current video block comprises: determining the list of non-local motion information by combining a first list of non-local motion information of a current region which the current video block belongs to and at least one list of at least one neighboring region of the current region.
  • Clause 40 The method of clause 39, wherein the at least one neighboring region comprises at least one of: at least one spatial neighboring region, or at least one temporal neighboring region.
  • Clause 41 The method of clause 40, wherein the at least one spatial neighboring region comprises at least one of: a top neighboring region, or a left neighboring region.
  • Clause 42 The method of clause 40, wherein the at least one temporal neighboring region comprises a collocated region.
  • Clause 43 The method of any of clauses 39-42, wherein the combining of the at least one list is based on at least one distance between the at least one neighboring region and the current region.
  • Clause 44 The method of any of clauses 39-43, wherein the number of the at least one neighboring region is indicated in the bitstream.
  • Clause 45 The method of any of clauses 39-43, wherein the number of the at least one neighboring region is inferred to be a predefined value.
  • Clause 46 The method of any of clauses 1-45, wherein the list of non-local motion information comprises motion information of at least one inter-coded block in a region in at least one of: a current frame, or at least one previous frame.
  • Clause 47 The method of clause 46, wherein the region comprises a video unit, the video unit comprising one of: at least one block, at least one coding unit (CU), at least one coding tree unit (CTU), at least one CTU row, at least one slice, at least one tile, a plurality of CUs with a same size, or a plurality of blocks with a same size.
  • Clause 48 The method of clause 46, wherein the region comprises positions from (Cx - M, Cy - N) to (Cx + L, Cy + K) for at least one frame, wherein (Cx, Cy) denotes a position of the current video block, M, N, L and K are integers.
  • Clause 49 The method of clause 48, wherein N comprises a predefined value for all quantization parameters (QPs), or N is indicated in the bitstream, or N is based on at least one of: video content, a block dimension of the current video block, a block dimension of a neighboring block, a block shape of the current video block, a block shape of a neighboring block, a QP of the current video block, an indication of a color format, a dual tree or single tree structure, a slice type, a tile group type, a picture type, a color component, a temporal layer identifier, a position of a coding unit (CU), prediction unit (PU), transform unit (TU), a block or a video coding unit, or a message indicated in a video region, the video region being one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), an LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit.
  • Clause 50 The method of clause 46, wherein a position of a first region for a first frame is different from a position of a second region for a second frame.
  • Clause 51 The method of any of clauses 46-50, wherein a first size of a first region for a first frame is the same as or different from a second size of a second region for a second frame.
  • Clause 52 The method of any of clauses 46-51, wherein the number of frames to be collected for motion information in the list is set to a predefined value.
  • Clause 53 The method of any of clauses 46-52, wherein a pruning process is applied to candidate motion information, the candidate motion information being to be updated to the list of non-local motion information.
  • Clause 54 The method of clause 53, wherein the candidate motion information is excluded from the list in response to a reference block indicated by the candidate motion information being the same as a further reference block indicated by existing motion information in the list, or wherein the candidate motion information is excluded from the list in response to a difference between a motion vector indicated by the candidate motion information and a further motion vector indicated by an existing motion information in the list being less than a threshold.
  • Clause 57 The method of any of clauses 1-56, wherein determining the at least one prediction block for the current video block comprises: determining the at least one prediction block based on at least one entry in the list of non-local motion information.
  • Clause 58 The method of clause 57, wherein the at least one prediction block is determined by a weighted prediction combining the at least one entry.
  • Clause 60 The method of clause 58, wherein at least one index of the at least one entry is inferred to be at least one predefined value.
  • Clause 61 The method of any of clauses 57-60, wherein the at least one entry is selected by minimizing a cost between an entry and the current video block.
  • Clause 62 The method of clause 61, wherein the at least one entry comprises N entries with minimal costs, N being an integer equal to 1 or greater than 1.
  • Clause 63 The method of clause 61, wherein the cost comprises at least one of: a rate distortion cost between an entry and the current video block, or a distortion between an entry and the current video block.
  • Clause 64 The method of clause 63, wherein the cost comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
  • Clause 65 The method of any of clauses 57-64, wherein the at least one entry in the list of non-local motion information comprises N entries, N being a positive integer, and determining the at least one prediction block comprises: for an ith entry in the list, i being larger than or equal to 0 and less than or equal to N-1, checking the ith entry and determining an ith prediction block for the current video block based on a criterion.
  • Clause 66 The method of clause 65, wherein performing the conversion comprises: determining a residual block between the current video block and the ith prediction block; and applying at least one of the following to the residual block: a transform, a quantization, or an entropy coding.
  • Clause 67 The method of any of clauses 57-66, wherein the at least one entry is sorted before being used to determine the at least one prediction block.
  • Clause 68 The method of clause 67, wherein the sorting of the at least one entry is based on at least one of: at least one template distortion cost, or at least one distortion between at least one template of the at least one entry and a current template of the current video block.
  • a distortion of the at least one distortion comprises a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE) between two templates.
  • Clause 71 The method of clause 68, wherein an entry in the list comprises a block in a reconstructed region and a template of the block.
  • Clause 72 The method of clause 68, wherein the at least one entry is sorted in an ascending or descending order based on the at least one template distortion cost.
  • Clause 73 The method of clause 72, wherein after sorting the at least one entry in a descending order based on the at least one template distortion cost, first K entries are applied for coding the current video block with NLMC coding mode, K being a positive integer.
  • Clause 74 The method of clause 73, wherein indexes of the first K entries are excluded from the bitstream.
  • Clause 75 The method of clause 73, wherein a value of K is indicated in the bitstream, or is a default value.
  • Clause 76 The method of clause 75, wherein the NLMC coding mode is applied in addition to at least one further coding mode, the at least one further coding mode comprising at least one of an intra prediction mode, an inter prediction mode, or an intra block copy prediction mode.
  • Clause 77 The method of clause 76, wherein an indication of usage of the NLMC coding mode is indicated or parsed as a separate prediction mode.
  • Clause 78 The method of clause 77, wherein an indication of an entry in the list is indicated or parsed by a further syntax element, the further syntax element comprises an entry index.
  • Clause 79 The method of clause 78, wherein the entry index is coded by one of a fixed-length coding, an Exponential Golomb coding, a truncated unary coding, or a truncated binary coding.
  • Clause 80 The method of clause 78, wherein a binarization of the entry index is based on the number of entries used in the NLMC coding mode.
  • Clause 83 The method of clause 81, wherein N is the number of all entries in the list, or the number of available entries in the list.
  • Clause 84 The method of any of clauses 1-83, wherein the NLMC coding mode is used as a motion estimation approach for encoding the current video block into the bitstream.
  • Clause 85 The method of clause 84, wherein at least one entry in the list of non-local motion information is used as at least one additional initial search point before an integer motion search process.
  • Clause 86 The method of clause 85, wherein at least one integer part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
  • Clause 87 The method of clause 85, wherein at least one integer part and at least one fractional part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
  • Clause 88 The method of clause 84, wherein the at least one entry is used as at least one additional candidate for at least one further motion estimation approach.
  • Clause 90 The method of any of clauses 1-89, wherein the list of non-local motion information is updated with at least one piece of neighboring motion information, the at least one piece of neighboring motion information being not selected by at least one neighboring coding block.
  • Clause 91 The method of any of clauses 1-90, wherein a size of the list of non-local motion information is based on a size of the current video block.
  • Clause 92 The method of any of clauses 1-91, wherein a plurality of lists of non-local motion information are determined for a plurality of reference pictures for the NLMC coding mode.
  • Clause 94 The method of any of clauses 1-93, wherein the NLMC coding mode is applied to at least one of: an advanced motion vector prediction (AMVP) coding tool, an affine coding tool, an intra block copy (IBC) coding tool, or a further inter-prediction related coding tool.
  • Clause 96 The method of clause 95, wherein the method is applied to the current video block in response to the current video block belonging to screen content, and/or wherein the method is applied to at least one of a luma component or a chroma component of the current video block.
  • Clause 97 The method of clause 95, wherein the colour format comprises one of: 4:2:0, 4:4:4, RGB or YUV, and/or wherein the coding tree structure comprises a dual tree structure or a single tree structure.
  • the current video block comprises at least one of: a colour component, a sub-picture, a slice, a tile, a coding tree unit (CTU), a CTU row, a group of CTUs, a coding unit (CU), a prediction unit (PU), a transform unit (TU), a coding tree block (CTB), a coding block (CB), a prediction block (PB), a transform block (TB), a block, a sub-block of a block, a sub-region within a block, or a region that contains more than one sample or pixel.
  • Clause 100 The method of any of clauses 1-99, wherein the conversion comprises encoding the current video block into the bitstream.
  • Clause 102 An apparatus for video processing comprising a processor and a non- transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to perform a method in accordance with any of clauses 1-101.
  • Clause 103 A non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method in accordance with any of clauses 1- 101.
  • Clause 104 A non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method performed by an apparatus for video processing, wherein the method comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and generating the bitstream based on the at least one prediction block.
  • Clause 105 A method for storing a bitstream of a video, comprising: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; generating the bitstream based on the at least one prediction block; and storing the bitstream in a non-transitory computer-readable recording medium.
  • Fig. 18 illustrates a block diagram of a computing device 1800 in which various embodiments of the present disclosure can be implemented.
  • the computing device 1800 may be implemented as or included in the source device 110 (or the video encoder 114 or 200) or the destination device 120 (or the video decoder 124 or 300).
  • the computing device 1800 includes a general-purpose computing device 1800.
  • the computing device 1800 may at least comprise one or more processors or processing units 1810, a memory 1820, a storage unit 1830, one or more communication units 1840, one or more input devices 1850, and one or more output devices 1860.
  • the computing device 1800 may be implemented as any user terminal or server terminal having the computing capability.
  • the server terminal may be a server, a large-scale computing device or the like that is provided by a service provider.
  • the user terminal may for example be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile phone, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/video camera, positioning device, television receiver, radio broadcast receiver, E-book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices, or any combination thereof.
  • the computing device 1800 can support any type of interface to a user (such as “wearable” circuitry and the like).
  • the processing unit 1810 may be a physical or virtual processor and can implement various processes based on programs stored in the memory 1820. In a multi-processor system, multiple processing units execute computer executable instructions in parallel so as to improve the parallel processing capability of the computing device 1800.
  • the processing unit 1810 may also be referred to as a central processing unit (CPU), a microprocessor, a controller or a microcontroller.
  • the computing device 1800 typically includes various computer storage media. Such media can be any media accessible by the computing device 1800, including, but not limited to, volatile and non-volatile media, or detachable and non-detachable media.
  • the memory 1820 can be a volatile memory (for example, a register, cache, Random Access Memory (RAM)), a non-volatile memory (such as a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), or a flash memory), or any combination thereof.
  • the storage unit 1830 may be any detachable or non-detachable medium and may include a machine-readable medium such as a memory, flash memory drive, magnetic disk or other media, which can be used for storing information and/or data and can be accessed in the computing device 1800.
  • the computing device 1800 may further include additional detachable/non-detachable, volatile/non-volatile memory media, for example:
  • a magnetic disk drive for reading from and/or writing into a detachable and non-volatile magnetic disk
  • an optical disk drive for reading from and/or writing into a detachable non-volatile optical disk.
  • each drive may be connected to a bus (not shown) via one or more data medium interfaces.
  • the communication unit 1840 communicates with a further computing device via the communication medium.
  • the functions of the components in the computing device 1800 can be implemented by a single computing cluster or multiple computing machines that can communicate via communication connections. Therefore, the computing device 1800 can operate in a networked environment using a logical connection with one or more other servers, networked personal computers (PCs) or further general network nodes.
  • the input device 1850 may be one or more of a variety of input devices, such as a mouse, keyboard, tracking ball, voice-input device, and the like.
  • the output device 1860 may be one or more of a variety of output devices, such as a display, loudspeaker, printer, and the like.
  • the computing device 1800 can further communicate with one or more external devices (not shown) such as storage devices and display devices, with one or more devices enabling the user to interact with the computing device 1800, or any devices (such as a network card, a modem and the like) enabling the computing device 1800 to communicate with one or more other computing devices, if required. Such communication can be performed via input/output (I/O) interfaces.
  • some or all components of the computing device 1800 may also be arranged in cloud computing architecture.
  • the components may be provided remotely and work together to implement the functionalities described in the present disclosure.
  • cloud computing provides computing, software, data access and storage service, which will not require end users to be aware of the physical locations or configurations of the systems or hardware providing these services.
  • the cloud computing provides the services via a wide area network (such as Internet) using suitable protocols.
  • a cloud computing provider provides applications over the wide area network, which can be accessed through a web browser or any other computing components.
  • the software or components of the cloud computing architecture and corresponding data may be stored on a server at a remote position.
  • the computing resources in the cloud computing environment may be merged or distributed at locations in a remote data center.
  • Cloud computing infrastructures may provide the services through a shared data center, though they behave as a single access point for the users. Therefore, the cloud computing architectures may be used to provide the components and functionalities described herein from a service provider at a remote location. Alternatively, they may be provided from a conventional server or installed directly or otherwise on a client device.
  • the computing device 1800 may be used to implement video encoding/decoding in embodiments of the present disclosure.
  • the memory 1820 may include one or more video coding modules 1825 having one or more program instructions. These modules are accessible and executable by the processing unit 1810 to perform the functionalities of the various embodiments described herein.
  • the input device 1850 may receive video data as an input 1870 to be encoded.
  • the video data may be processed, for example, by the video coding module 1825, to generate an encoded bitstream.
  • the encoded bitstream may be provided via the output device 1860 as an output 1880.
  • the input device 1850 may receive an encoded bitstream as the input 1870.
  • the encoded bitstream may be processed, for example, by the video coding module 1825, to generate decoded video data.
  • the decoded video data may be provided via the output device 1860 as the output 1880.

Abstract

Embodiments of the present disclosure provide a solution for video processing. In a method for video processing, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information for the current video block is determined. The current video block is coded with a non-local motion candidate (NLMC) coding mode. The list of non-local motion information is associated with at least one non-adjacent block of the current video block. At least one prediction block for the current video block is determined based on the list of non-local motion information. The conversion is performed based on the at least one prediction block.

Description

METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/639,451, filed on April 26, 2024, which is expressly incorporated by reference herein in its entirety.
FIELDS
[0002] Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to a non-local motion candidate (NLMC) coding mode.
BACKGROUND
[0003] Nowadays, digital video capabilities are being applied in various aspects of people's lives. Multiple types of video compression technologies, such as motion picture expert group (MPEG)-2, MPEG-4, international telecommunication union telecommunication standardization sector (ITU-T) H.263, ITU-T H.264/MPEG-4 Part 10 advanced video coding (AVC), the ITU-T H.265 high efficiency video coding (HEVC) standard, and the versatile video coding (VVC) standard, have been proposed for video encoding/decoding. However, the coding efficiency of video coding techniques is generally expected to be further improved.
SUMMARY
[0004] Embodiments of the present disclosure provide a solution for video processing.
[0005] In a first aspect, a method for video processing is proposed. The method comprises: determining, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information for the current video block, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and performing the conversion based on the at least one prediction block. The method in accordance with the first aspect of the present disclosure enables coding of video blocks with the NLMC coding mode, so that the coding performance of the video can be improved.
[0006] In a second aspect, an apparatus for video processing is proposed. The apparatus comprises a processor and a non-transitory memory with instructions thereon. The instructions upon execution by the processor, cause the processor to perform a method in accordance with the first aspect of the present disclosure.
[0007] In a third aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first aspect of the present disclosure.
[0008] In a fourth aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. The method comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and generating the bitstream based on the at least one prediction block.
[0009] In a fifth aspect, a method for storing a bitstream of a video is proposed. The method comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; generating the bitstream based on the at least one prediction block; and storing the bitstream in a non-transitory computer-readable recording medium.
[0010] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Through the following detailed description with reference to the accompanying drawings, the above and other objectives, features, and advantages of example embodiments of the present disclosure will become more apparent. In the example embodiments of the present disclosure, the same reference numerals usually refer to the same components.
[0012] Fig. 1 illustrates a block diagram of an example video coding system in accordance with some embodiments of the present disclosure;
[0013] Fig. 2 illustrates a block diagram of an example video encoder in accordance with some embodiments of the present disclosure;
[0014] Fig. 3 illustrates a block diagram of an example video decoder in accordance with some embodiments of the present disclosure;
[0015] Fig. 4 illustrates a derivation process for merge candidates list construction;
[0016] Fig. 5 illustrates positions of spatial merge candidates;
[0017] Fig. 6 illustrates candidate pairs considered for redundancy check of spatial merge candidates;
[0018] Fig. 7A and Fig. 7B illustrate positions for the second PU of N*2N and 2N*N partitions, respectively;
[0019] Fig. 8 illustrates motion vector scaling for temporal merge candidate;
[0020] Fig. 9 illustrates candidate positions for temporal merge candidate;
[0021] Fig. 10 illustrates an example of combined bi-predictive merge candidate;
[0022] Fig. 11 illustrates a derivation process for motion vector prediction candidates;
[0023] Fig. 12 illustrates motion vector scaling for spatial motion vector candidate;
[0024] Fig. 13A and Fig. 13B illustrate two simplified affine motion models, respectively;
[0025] Fig. 14 illustrates an example of affine MVF per sub-block;
[0026] Fig. 15 illustrates a coding process of NLMC according to some embodiments of the present disclosure;
[0027] Fig. 16 illustrates an example of a template of a block according to some embodiments of the present disclosure;
[0028] Fig. 17 illustrates a flowchart of a method for video processing in accordance with some embodiments of the present disclosure; and
[0029] Fig. 18 illustrates a block diagram of a computing device in which various embodiments of the present disclosure can be implemented.
[0030] Throughout the drawings, the same or similar reference numerals usually refer to the same or similar elements.
DETAILED DESCRIPTION
[0031] Principle of the present disclosure will now be described with reference to some embodiments. It is to be understood that these embodiments are described only for the purpose of illustration and help those skilled in the art to understand and implement the present disclosure, without suggesting any limitation as to the scope of the disclosure. The disclosure described herein can be implemented in various manners other than the ones described below.
[0032] In the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skills in the art to which this disclosure belongs.
[0033] References in the present disclosure to “one embodiment,” “an embodiment,” “an example embodiment,” and the like indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an example embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
[0034] It shall be understood that although the terms “first” and “second” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the listed terms.
[0035] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “has”, “having”, “includes” and/or “including”, when used herein, specify the presence of stated features, elements, and/or components etc., but do not preclude the presence or addition of one or more other features, elements, components and/ or combinations thereof.
Example Environment
[0036] Fig. 1 is a block diagram that illustrates an example video coding system 100 that may utilize the techniques of this disclosure. As shown, the video coding system 100 may include a source device 110 and a destination device 120. The source device 110 can be also referred to as a video encoding device, and the destination device 120 can be also referred to as a video decoding device. In operation, the source device 110 can be configured to generate encoded video data and the destination device 120 can be configured to decode the encoded video data generated by the source device 110. The source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.
[0037] The video source 112 may include a source such as a video capture device. Examples of the video capture device include, but are not limited to, an interface to receive video data from a video content provider, a computer graphics system for generating video data, and/or a combination thereof.
[0038] The video data may comprise one or more pictures. The video encoder 114 encodes the video data from the video source 112 to generate a bitstream. The bitstream may include a sequence of bits that form a coded representation of the video data. The bitstream may include coded pictures and associated data. The coded picture is a coded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator and/or a transmitter. The encoded video data may be transmitted directly to destination device 120 via the I/O interface 116 through the network 130A. The encoded video data may also be stored onto a storage medium/server 130B for access by destination device 120.
[0039] The destination device 120 may include an I/O interface 126, a video decoder 124, and a display device 122. The I/O interface 126 may include a receiver and/or a modem. The I/O interface 126 may acquire encoded video data from the source device 110 or the storage medium/server 130B. The video decoder 124 may decode the encoded video data. The display device 122 may display the decoded video data to a user. The display device 122 may be integrated with the destination device 120, or may be external to the destination device 120 which is configured to interface with an external display device.
[0040] The video encoder 114 and the video decoder 124 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard, Versatile Video Coding (VVC) standard and other current and/or further standards.
[0041] Fig. 2 is a block diagram illustrating an example of a video encoder 200, which may be an example of the video encoder 114 in the system 100 illustrated in Fig. 1, in accordance with some embodiments of the present disclosure.
[0042] The video encoder 200 may be configured to implement any or all of the techniques of this disclosure. In the example of Fig. 2, the video encoder 200 includes a plurality of functional components. The techniques described in this disclosure may be shared among the various components of the video encoder 200. In some examples, a processor may be configured to perform any or all of the techniques described in this disclosure.
[0043] In some embodiments, the video encoder 200 may include a partition unit 201, a prediction unit 202 which may include a mode select unit 203, a motion estimation unit 204, a motion compensation unit 205 and an intra-prediction unit 206, a residual generation unit 207, a transform unit 208, a quantization unit 209, an inverse quantization unit 210, an inverse transform unit 211, a reconstruction unit 212, a buffer 213, and an entropy encoding unit 214.
[0044] In other examples, the video encoder 200 may include more, fewer, or different functional components. In an example, the prediction unit 202 may include an intra block copy (IBC) unit. The IBC unit may perform prediction in an IBC mode in which at least one reference picture is a picture where the current video block is located.
[0045] Furthermore, some components, such as the motion estimation unit 204 and the motion compensation unit 205, may be integrated, but are represented separately in the example of Fig. 2 for purposes of explanation.
[0046] The partition unit 201 may partition a picture into one or more video blocks. The video encoder 200 and the video decoder 300 may support various video block sizes.
[0047] The mode select unit 203 may select one of the coding modes, intra or inter, e.g., based on error results, and provide the resulting intra-coded or inter-coded block to a residual generation unit 207 to generate residual block data and to a reconstruction unit 212 to reconstruct the encoded block for use as a reference picture. In some examples, the mode select unit 203 may select a combined inter and intra prediction (CIIP) mode in which the prediction is based on an inter prediction signal and an intra prediction signal. The mode select unit 203 may also select a resolution for a motion vector (e.g., a sub-pixel or integer pixel precision) for the block in the case of inter-prediction.
[0048] To perform inter prediction on a current video block, the motion estimation unit 204 may generate motion information for the current video block by comparing one or more reference frames from buffer 213 to the current video block. The motion compensation unit 205 may determine a predicted video block for the current video block based on the motion information and decoded samples of pictures from the buffer 213 other than the picture associated with the current video block.
[0049] The motion estimation unit 204 and the motion compensation unit 205 may perform different operations for a current video block, for example, depending on whether the current video block is in an I-slice, a P-slice, or a B-slice. As used herein, an “I-slice” may refer to a portion of a picture composed of macroblocks, all of which are based upon macroblocks within the same picture. Further, as used herein, in some aspects, “P-slices” and “B-slices” may refer to portions of a picture composed of macroblocks that are not dependent on macroblocks in the same picture.
[0050] In some examples, the motion estimation unit 204 may perform uni-directional prediction for the current video block, and the motion estimation unit 204 may search reference pictures of list 0 or list 1 for a reference video block for the current video block. The motion estimation unit 204 may then generate a reference index that indicates the reference picture in list 0 or list 1 that contains the reference video block and a motion vector that indicates a spatial displacement between the current video block and the reference video block. The motion estimation unit 204 may output the reference index, a prediction direction indicator, and the motion vector as the motion information of the current video block. The motion compensation unit 205 may generate the predicted video block of the current video block based on the reference video block indicated by the motion information of the current video block.
[0051] Alternatively, in other examples, the motion estimation unit 204 may perform bidirectional prediction for the current video block. The motion estimation unit 204 may search the reference pictures in list 0 for a reference video block for the current video block and may also search the reference pictures in list 1 for another reference video block for the current video block. The motion estimation unit 204 may then generate reference indexes that indicate the reference pictures in list 0 and list 1 containing the reference video blocks and motion vectors that indicate spatial displacements between the reference video blocks and the current video block. The motion estimation unit 204 may output the reference indexes and the motion vectors of the current video block as the motion information of the current video block. The motion compensation unit 205 may generate the predicted video block of the current video block based on the reference video blocks indicated by the motion information of the current video block.
[0052] In some examples, the motion estimation unit 204 may output a full set of motion information for decoding processing of a decoder. Alternatively, in some embodiments, the motion estimation unit 204 may signal the motion information of the current video block with reference to the motion information of another video block. For example, the motion estimation unit 204 may determine that the motion information of the current video block is sufficiently similar to the motion information of a neighboring video block.
[0053] In one example, the motion estimation unit 204 may indicate, in a syntax structure associated with the current video block, a value that indicates to the video decoder 300 that the current video block has the same motion information as the another video block.
[0054] In another example, the motion estimation unit 204 may identify, in a syntax structure associated with the current video block, another video block and a motion vector difference (MVD). The motion vector difference indicates a difference between the motion vector of the current video block and the motion vector of the indicated video block. The video decoder 300 may use the motion vector of the indicated video block and the motion vector difference to determine the motion vector of the current video block.
[0055] As discussed above, video encoder 200 may predictively signal the motion vector. Two examples of predictive signaling techniques that may be implemented by video encoder 200 include advanced motion vector prediction (AMVP) and merge mode signaling.
[0056] The intra prediction unit 206 may perform intra prediction on the current video block. When the intra prediction unit 206 performs intra prediction on the current video block, the intra prediction unit 206 may generate prediction data for the current video block based on decoded samples of other video blocks in the same picture. The prediction data for the current video block may include a predicted video block and various syntax elements.
[0057] The residual generation unit 207 may generate residual data for the current video block by subtracting (e.g., indicated by the minus sign) the predicted video block(s) of the current video block from the current video block. The residual data of the current video block may include residual video blocks that correspond to different sample components of the samples in the current video block.
[0058] In other examples, there may be no residual data for the current video block, for example in a skip mode, and the residual generation unit 207 may not perform the subtracting operation.
[0059] The transform unit 208 may generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to a residual video block associated with the current video block.
[0060] After the transform unit 208 generates a transform coefficient video block associated with the current video block, the quantization unit 209 may quantize the transform coefficient video block associated with the current video block based on one or more quantization parameter (QP) values associated with the current video block.
[0061] The inverse quantization unit 210 and the inverse transform unit 211 may apply inverse quantization and inverse transforms to the transform coefficient video block, respectively, to reconstruct a residual video block from the transform coefficient video block. The reconstruction unit 212 may add the reconstructed residual video block to corresponding samples from one or more predicted video blocks generated by the prediction unit 202 to produce a reconstructed video block associated with the current video block for storage in the buffer 213.
[0062] After the reconstruction unit 212 reconstructs the video block, loop filtering operation may be performed to reduce video blocking artifacts in the video block.
[0063] The entropy encoding unit 214 may receive data from other functional components of the video encoder 200. When the entropy encoding unit 214 receives the data, the entropy encoding unit 214 may perform one or more entropy encoding operations to generate entropy encoded data and output a bitstream that includes the entropy encoded data.
[0064] Fig. 3 is a block diagram illustrating an example of a video decoder 300, which may be an example of the video decoder 124 in the system 100 illustrated in Fig. 1, in accordance with some embodiments of the present disclosure.
[0065] The video decoder 300 may be configured to perform any or all of the techniques of this disclosure. In the example of Fig. 3, the video decoder 300 includes a plurality of functional components. The techniques described in this disclosure may be shared among the various components of the video decoder 300. In some examples, a processor may be configured to perform any or all of the techniques described in this disclosure.
[0066] In the example of Fig. 3, the video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, a reconstruction unit 306 and a buffer 307. The video decoder 300 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 200.
[0067] The entropy decoding unit 301 may retrieve an encoded bitstream. The encoded bitstream may include entropy coded video data (e.g., encoded blocks of video data). The entropy decoding unit 301 may decode the entropy coded video data, and from the entropy decoded video data, the motion compensation unit 302 may determine motion information including motion vectors, motion vector precision, reference picture list indexes, and other motion information. The motion compensation unit 302 may, for example, determine such information by performing the AMVP and merge mode. When AMVP is used, several most probable candidates are derived based on data from adjacent PBs and the reference picture. Motion information typically includes the horizontal and vertical motion vector displacement values, one or two reference picture indices, and, in the case of prediction regions in B slices, an identification of which reference picture list is associated with each index. As used herein, in some aspects, a “merge mode” may refer to deriving the motion information from spatially or temporally neighboring blocks.
[0068] The motion compensation unit 302 may produce motion compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers for interpolation filters to be used with sub-pixel precision may be included in the syntax elements.
[0069] The motion compensation unit 302 may use the interpolation filters as used by the video encoder 200 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference block. The motion compensation unit 302 may determine the interpolation filters used by the video encoder 200 according to the received syntax information and use the interpolation filters to produce predictive blocks.
[0070] The motion compensation unit 302 may use at least part of the syntax information to determine sizes of blocks used to encode frame(s) and/or slice(s) of the encoded video sequence, partition information that describes how each macroblock of a picture of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-encoded block, and other information to decode the encoded video sequence. As used herein, in some aspects, a “slice” may refer to a data structure that can be decoded independently from other slices of the same picture, in terms of entropy coding, signal prediction, and residual signal reconstruction. A slice can either be an entire picture or a region of a picture.
[0071] The intra prediction unit 303 may use intra prediction modes for example received in the bitstream to form a prediction block from spatially adjacent blocks. The inverse quantization unit 304 inverse quantizes, i.e., de-quantizes, the quantized video block coefficients provided in the bitstream and decoded by entropy decoding unit 301. The inverse transform unit 305 applies an inverse transform.
[0072] The reconstruction unit 306 may obtain the decoded blocks, e.g., by summing the residual blocks with the corresponding prediction blocks generated by the motion compensation unit 302 or intra-prediction unit 303. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. The decoded video blocks are then stored in the buffer 307, which provides reference blocks for subsequent motion compensation/intra prediction and also produces decoded video for presentation on a display device.
[0073] Some example embodiments of the present disclosure will be described in detail hereinafter. It should be understood that section headings are used in the present document to facilitate ease of understanding and do not limit the embodiments disclosed in a section to only that section. Furthermore, while certain embodiments are described with reference to Versatile Video Coding or other specific video codecs, the disclosed techniques are applicable to other video coding technologies also. Furthermore, while some embodiments describe video coding steps in detail, it will be understood that corresponding decoding steps that undo the coding will be implemented by a decoder. Furthermore, the term video processing encompasses video coding or compression, video decoding or decompression, and video transcoding in which video pixels are represented from one compressed format into another compressed format or at a different compressed bitrate.
1. Brief Summary
[0074] This disclosure is related to video coding technologies. Specifically, it is related to inter coding module in an encoder. It may be applied to the encoders compatible with existing video coding standards like H.264, HEVC, and VVC. It may be also used as a technology for future video coding standards or video codecs, and could be extended to other fields involving motion search algorithms, e.g. computer vision, pattern recognition, etc.
2. Introduction
[0075] Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, the video coding standards have been based on the hybrid video coding structure wherein temporal prediction plus transform coding is utilized. To explore the future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard targeting a 50% bitrate reduction compared to HEVC.
2.1 Inter prediction in HEVC/H.265
[0076] Each inter-predicted PU has motion parameters for one or two reference picture lists. Motion parameters include a motion vector and a reference picture index. Usage of one of the two reference picture lists may also be signalled using inter _pred ide. Motion vectors may be explicitly coded as deltas relative to predictors.
[0077] When a CU is coded with skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current PU are obtained from neighbouring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only for skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where motion vector (to be more precise, motion vector differences (MVD) compared to a motion vector predictor), corresponding reference picture index for each reference picture list and reference picture list usage are signalled explicitly per each PU. Such a mode is named Advanced motion vector prediction (AMVP) in this disclosure.
[0078] When signalling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as ‘uni-prediction’. Uni-prediction is available both for P-slices and B-slices.
[0079] When signalling indicates that both reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as ‘bi-prediction’. Bi-prediction is available for B-slices only.
[0080] The following text provides the details on the inter prediction modes specified in HEVC. The description will start with the merge mode.
2.1.1 Reference picture list
[0081] In HEVC, the term inter prediction is used to denote prediction derived from data elements (e.g., sample values or motion vectors) of reference pictures other than the current decoded picture. Like in H.264/AVC, a picture can be predicted from multiple reference pictures. The reference pictures that are used for inter prediction are organized in one or more reference picture lists. The reference index identifies which of the reference pictures in the list should be used for creating the prediction signal.
[0082] A single reference picture list, List 0, is used for a P slice and two reference picture lists, List 0 and List 1, are used for B slices. It should be noted that reference pictures included in List 0/1 could be from past and future pictures in terms of capturing/display order.
2.1.2 Merge Mode
2.1.2.1 Derivation of candidates for merge mode
[0083] When a PU is predicted using merge mode, an index pointing to an entry in the merge candidates list is parsed from the bitstream and used to retrieve the motion information. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
• Step 1: Initial candidates derivation
  o Step 1.1: Spatial candidates derivation
  o Step 1.2: Redundancy check for spatial candidates
  o Step 1.3: Temporal candidates derivation
• Step 2: Additional candidates insertion
  o Step 2.1: Creation of bi-predictive candidates
  o Step 2.2: Insertion of zero motion candidates
[0084] These steps are also schematically depicted in Fig. 4. For spatial merge candidate derivation, a maximum of four merge candidates are selected among candidates that are located in five different positions. For temporal merge candidate derivation, a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand), which is signalled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all the PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N*2N prediction unit.
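For illustration, the truncated unary binarization of the merge index can be sketched as follows (a minimal sketch; the function name is hypothetical):

```python
def truncated_unary_bins(index: int, c_max: int) -> str:
    """Truncated unary binarization: `index` ones followed by a
    terminating zero, except that the terminator is omitted when
    index == c_max (it would be redundant)."""
    assert 0 <= index <= c_max
    return "1" * index + ("0" if index < c_max else "")

# With MaxNumMergeCand = 5 (c_max = 4), indices 0..4 map to:
# "0", "10", "110", "1110", "1111"
```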
[0085] In the following, the operations associated with the aforementioned steps are detailed.
2.1.2.2 Spatial candidates derivation
[0086] In the derivation of spatial merge candidates, a maximum of four merge candidates are selected among candidates located in the positions depicted in Fig. 5. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU of positions A1, B1, B0, A0 is not available (e.g. because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list, so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in Fig. 6 are considered, and a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the “second PU” associated with partitions different from 2Nx2N. As an example, Fig. 7A depicts the second PU for the case of N*2N, and Fig. 7B depicts the second PU for the case of 2N*N. When the current PU is partitioned as N*2N, the candidate at position A1 is not considered for list construction, since adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N*N.
2.1.2.3 Temporal candidates derivation
[0087] In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture which has the smallest POC difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the co-located PU is explicitly signalled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dotted line in Fig. 8, which is scaled from the motion vector of the co-located PU using the POC distances tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture, and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification. For a B-slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to make the bi-predictive merge candidate.
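As a conceptual sketch of the scaling described above (the HEVC specification defines an equivalent fixed-point realization with clipping; names here are illustrative):

```python
def scale_temporal_mv(mv, tb, td):
    """Scale a co-located MV by the ratio of POC distances.
    tb: POC(current picture) - POC(reference picture of current picture)
    td: POC(co-located picture) - POC(reference picture of co-located picture)"""
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))

# Example: a co-located MV of (8, -4) with tb = 1, td = 2 scales to (4, -2).
```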
[0088] In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as depicted in Fig. 9. If the PU at position C0 is not available, is intra coded, or is outside of the current coding tree unit (CTU, aka LCU, largest coding unit) row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
2.1.2.4 Additional candidates insertion
[0089] Besides spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate. Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates. The combined bi-predictive merge candidate is used for B-slices only. The combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate. As an example, Fig. 10 depicts the case when two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate added to the final list (on the right). There are numerous rules regarding the combinations which are considered to generate these additional merge candidates.
[0090] Zero motion candidates are inserted to fill the remaining entries in the merge candidates list and therefore hit the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. Finally, no redundancy check is performed on these candidates.
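The two filling mechanisms above can be sketched together as follows (a simplified sketch with an assumed candidate layout; HEVC's actual combination-order table and its different-hypothesis test are abbreviated):

```python
def fill_merge_list(cands, max_num, num_refs, is_b_slice):
    """Each candidate is a dict {'L0': (mv, refIdx) or None,
    'L1': (mv, refIdx) or None} -- an illustrative layout only."""
    # Combined bi-predictive candidates (B-slices only): pair the list 0
    # motion of one original candidate with the list 1 motion of another.
    if is_b_slice:
        originals = list(cands)
        for a in originals:
            for b in originals:
                if len(cands) >= max_num:
                    return cands
                if a is not b and a['L0'] and b['L1'] and a['L0'] != b['L1']:
                    cands.append({'L0': a['L0'], 'L1': b['L1']})
    # Zero-motion candidates: zero MV, reference index increasing from 0,
    # no redundancy check, until MaxNumMergeCand is reached.
    ref_idx = 0
    while len(cands) < max_num:
        ref = min(ref_idx, max(num_refs - 1, 0))
        cands.append({'L0': ((0, 0), ref),
                      'L1': ((0, 0), ref) if is_b_slice else None})
        ref_idx += 1
    return cands
```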
2.1.3 AMVP
[0091] AMVP exploits the spatiotemporal correlation of motion vectors with neighbouring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of left, above, and temporally neighbouring PU positions, removing redundant candidates, and adding zero vectors to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similar to merge index signalling, the index of the best motion vector candidate is encoded using truncated unary binarization. The maximum value to be encoded in this case is 2 (see Fig. 11). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
2.1.3.1 Derivation of AMVP candidates
[0092] Fig. 11 summarizes derivation process for motion vector prediction candidate. In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidate and temporal motion vector candidate. For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on motion vectors of each PU located in five different positions as depicted in Fig. 5.
[0093] For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatiotemporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatiotemporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
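The trimming rules of the preceding paragraph admit a compact sketch (candidates as (mv, refIdx) pairs in derivation order; the final AMVP list length of 2 is assumed from the description above):

```python
def finalize_amvp_list(candidates, target_len=2):
    out = []
    for c in candidates:            # remove duplicated candidates
        if c not in out:
            out.append(c)
    if len(out) > target_len:       # drop candidates with refIdx > 1
        out = [c for c in out if c[1] <= 1]
    while len(out) < target_len:    # pad with zero motion vector candidates
        out.append(((0, 0), 0))
    return out[:target_len]
```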
2.1.3.2 Spatial motion vector candidates
[0094] In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located in the positions depicted in Fig. 5, those positions being the same as those of motion merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as a motion vector candidate: two cases that do not require spatial scaling, and two cases where spatial scaling is used. The four different cases are summarized as follows.
• No spatial scaling
- (1) Same reference picture list, and same reference picture index (same POC),
- (2) Different reference picture list, but same reference picture (same POC).
• Spatial scaling
- (3) Same reference picture list, but different reference picture (different POC),
- (4) Different reference picture list, and different reference picture (different POC).
[0095] The no-spatial-scaling cases are checked first followed by the spatial scaling. Spatial scaling is considered when the POC is different between the reference picture of the neighbouring PU and that of the current PU regardless of reference picture list. If all PUs of left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
[0096] In a spatial scaling process, the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling, as depicted in Fig. 12. The main difference is that the reference picture list and index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.
2.1.3.3 Temporal motion vector candidates
[0097] Apart from the reference picture index derivation, all processes for the derivation of temporal merge candidates are the same as for the derivation of spatial motion vector candidates (see Fig. 9). The reference picture index is signalled to the decoder.
2.2 Inter prediction methods in VVC
[0098] There are several new coding tools for inter prediction improvement, such as adaptive motion vector difference resolution (AMVR) for signaling MVD, affine prediction mode, geometric partition mode (GPM), Advanced TMVP (ATMVP, aka SbTMVP) and Bidirectional optical flow (BDOF).
2.2.1 Adaptive motion vector difference resolution
[0099] In HEVC, motion vector differences (MVDs) (between the motion vector and the predicted motion vector of a PU) are signalled in units of quarter luma samples when use_integer_mv_flag is equal to 0 in the slice header. In VVC, a locally adaptive motion vector resolution (AMVR) is introduced. In VVC, the MVD can be coded in units of quarter luma samples, integer luma samples or four luma samples (i.e., ¼-pel, 1-pel, 4-pel). The MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signalled for each CU that has at least one non-zero MVD component.
[0100] For a CU that has at least one non-zero MVD component, a first flag is signalled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter luma sample MV precision is not used, another flag is signalled to indicate whether integer luma sample MV precision or four luma sample MV precision is used.
[0101] When the first MVD resolution flag of a CU is zero, or not coded for a CU (meaning all MVDs in the CU are zero), the quarter luma sample MV resolution is used for the CU. When a CU uses integer-luma sample MV precision or four-luma-sample MV precision, the MVPs in the AMVP candidate list for the CU are rounded to the corresponding precision.
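One plausible realization of this rounding, for a single MV component kept in quarter-luma-sample units (a sketch; the exact tie-breaking in the VVC text may differ):

```python
def round_mv_to_amvr_precision(mv_q: int, imv_shift: int) -> int:
    """imv_shift: 0 for quarter-pel, 2 for integer-pel, 4 for four-pel
    (the shift from quarter-sample units to the coarser grid)."""
    if imv_shift == 0:
        return mv_q
    offset = 1 << (imv_shift - 1)   # round half away from zero
    if mv_q >= 0:
        return ((mv_q + offset) >> imv_shift) << imv_shift
    return -(((-mv_q + offset) >> imv_shift) << imv_shift)

# Example: mv_q = 5 (1.25 pel) rounds to 4 (1 pel) at integer precision.
```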
2.2.2 Affine motion compensation prediction
[0102] In HEVC, only a translation motion model is applied for motion compensation prediction (MCP). In the real world, however, there are many kinds of motion, e.g. zoom in/out, rotation, perspective motions and other irregular motions. In VVC, a simplified affine transform motion compensation prediction is applied with a 4-parameter affine model and a 6-parameter affine model. As shown in Fig. 13A and Fig. 13B, the affine motion field of the block is described by two control point motion vectors (CPMVs) for the 4-parameter affine model (see Fig. 13A) and 3 CPMVs for the 6-parameter affine model (see Fig. 13B).
[0103] The motion vector field (MVF) of a block is described by the following equations, with the 4-parameter affine model (wherein the four parameters are defined as the variables a, b, e and f) in equation (1) and the 6-parameter affine model (wherein the six parameters are defined as the variables a, b, c, d, e and f) in equation (2), respectively:

$$\begin{cases} mv^h(x,y) = ax - by + e = \frac{(mv_1^h - mv_0^h)}{w}x - \frac{(mv_1^v - mv_0^v)}{w}y + mv_0^h \\ mv^v(x,y) = bx + ay + f = \frac{(mv_1^v - mv_0^v)}{w}x + \frac{(mv_1^h - mv_0^h)}{w}y + mv_0^v \end{cases} \quad (1)$$

$$\begin{cases} mv^h(x,y) = ax + cy + e = \frac{(mv_1^h - mv_0^h)}{w}x + \frac{(mv_2^h - mv_0^h)}{h}y + mv_0^h \\ mv^v(x,y) = bx + dy + f = \frac{(mv_1^v - mv_0^v)}{w}x + \frac{(mv_2^v - mv_0^v)}{h}y + mv_0^v \end{cases} \quad (2)$$

where $(mv_0^h, mv_0^v)$ is the motion vector of the top-left corner control point, $(mv_1^h, mv_1^v)$ is the motion vector of the top-right corner control point, and $(mv_2^h, mv_2^v)$ is the motion vector of the bottom-left corner control point; all three motion vectors are called control point motion vectors (CPMVs). $(x, y)$ represents the coordinate of a representative point relative to the top-left sample within the current block, and $(mv^h(x,y), mv^v(x,y))$ is the motion vector derived for a sample located at $(x, y)$. The CP motion vectors may be signaled (as in the affine AMVP mode) or derived on-the-fly (as in the affine merge mode). w and h are the width and height of the current block. In practice, the division is implemented by a right-shift with a rounding operation. In VTM, the representative point is defined to be the center position of a sub-block, e.g., when the coordinate of the left-top corner of a sub-block relative to the top-left sample within the current block is (xs, ys), the coordinate of the representative point is defined to be (xs+2, ys+2). For each sub-block (i.e., 4x4 in VTM), the representative point is utilized to derive the motion vector for the whole sub-block.
[0104] To further simplify the motion compensation prediction, sub-block based affine transform prediction is applied. To derive the motion vector of each MxN (both M and N are set to 4 in current VVC) sub-block, the motion vector of the center sample of each sub-block, as shown in Fig. 14, is calculated according to Equations (1) and (2), and rounded to 1/16 fractional accuracy. Then the motion compensation interpolation filters for 1/16-pel are applied to generate the prediction of each sub-block with the derived motion vector. The interpolation filters for 1/16-pel are introduced by the affine mode.
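Putting Equations (1) and (2) together with the sub-block rule of paragraph [0104], the per-sub-block MV derivation can be sketched as follows (floating point for clarity; a real codec uses right-shifts with rounding and 1/16-pel units):

```python
def affine_subblock_mvs(cpmv, w, h, sub=4, six_param=False):
    """cpmv: [(mv0x, mv0y), (mv1x, mv1y)] for the 4-parameter model,
    plus (mv2x, mv2y) for the 6-parameter model."""
    (mv0x, mv0y), (mv1x, mv1y) = cpmv[0], cpmv[1]
    a, b = (mv1x - mv0x) / w, (mv1y - mv0y) / w
    if six_param:
        c, d = (cpmv[2][0] - mv0x) / h, (cpmv[2][1] - mv0y) / h
    else:
        c, d = -b, a  # rotation/zoom constraint of the 4-parameter model
    mvs = {}
    for ys in range(0, h, sub):
        for xs in range(0, w, sub):
            x, y = xs + sub // 2, ys + sub // 2  # representative (center) point
            mvs[(xs, ys)] = (a * x + c * y + mv0x, b * x + d * y + mv0y)
    return mvs
```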
2.2.3 Merge with motion vector differences (MMVD)
[0105] In VVC, ultimate motion vector expression (UMVE, also known as MMVD) is presented. UMVE is used for either skip or merge modes with a proposed motion vector expression method.
[0106] UMVE re-uses the same merge candidates as those included in the regular merge candidate list in VVC. Among the merge candidates, a base candidate can be selected, and is further expanded by the proposed motion vector expression method.
[0107] UMVE provides a new motion vector difference (MVD) representation method, in which a starting point, a motion magnitude and a motion direction are used to represent an MVD.
[0108] This technique uses the merge candidate list as it is, but only candidates of the default merge type (MRG_TYPE_DEFAULT_N) are considered for UMVE’s expansion.
2.2.4 Decoder-side Motion Vector Refinement (DMVR)
[0109] In bi-prediction operation, for the prediction of one block region, two prediction blocks, formed using a motion vector (MV) of list0 and a MV of list1, respectively, are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-prediction are further refined.
[0110] For DMVR in VVC, MVD mirroring between list 0 and list 1 is assumed, and bilateral matching is performed to refine the MVs, i.e., to find the best MVD among several MVD candidates.
2.2.5 Combined intra and inter prediction (CIIP)
[0111] Multi-hypothesis prediction is proposed in VVC, wherein combined intra and inter prediction is one way to generate multiple hypotheses.
[0112] When the multi-hypothesis prediction is applied to improve intra mode, multi-hypothesis prediction combines one intra prediction and one merge indexed prediction. In a merge CU, one flag is signaled for merge mode to select an intra mode from an intra candidate list when the flag is true. For the luma component, the intra candidate list is derived from 4 intra prediction modes including DC, planar, horizontal, and vertical modes, and the size of the intra candidate list can be 3 or 4 depending on the block shape. When the CU width is larger than double the CU height, the horizontal mode is excluded from the intra mode list, and when the CU height is larger than double the CU width, the vertical mode is removed from the intra mode list. One intra prediction mode selected by the intra mode index and one merge indexed prediction selected by the merge index are combined using a weighted average. For the chroma component, DM is always applied without extra signaling. The weights for combining predictions are described as follows. When DC or planar mode is selected, or the CB width or height is smaller than 4, equal weights are applied. For those CBs with CB width and height larger than or equal to 4, when the horizontal/vertical mode is selected, one CB is first vertically/horizontally split into four equal-area regions. Each weight set, denoted as (w_intrai, w_interi), where i is from 1 to 4 and (w_intra1, w_inter1) = (6, 2), (w_intra2, w_inter2) = (5, 3), (w_intra3, w_inter3) = (3, 5), and (w_intra4, w_inter4) = (2, 6), will be applied to a corresponding region. (w_intra1, w_inter1) is for the region closest to the reference samples and (w_intra4, w_inter4) is for the region farthest away from the reference samples. Then, the combined prediction can be calculated by summing up the two weighted predictions and right-shifting 3 bits. Moreover, the intra prediction mode for the intra hypothesis of predictors can be saved for reference by the following neighboring CUs.
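The region-dependent weighting admits a one-line per-sample combination (a sketch; region 0 below corresponds to the region closest to the reference samples, and any rounding offset before the shift is omitted, as in the text above):

```python
CIIP_WEIGHTS = [(6, 2), (5, 3), (3, 5), (2, 6)]  # (w_intra, w_inter) per region

def ciip_combine(p_intra: int, p_inter: int, region: int) -> int:
    """Weighted sum of the intra and inter hypotheses, right-shifted 3 bits."""
    w_intra, w_inter = CIIP_WEIGHTS[region]
    return (w_intra * p_intra + w_inter * p_inter) >> 3
```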
2.2.6 History-based merge candidates derivation
[0113] The history-based MVP (HMVP) merge candidates are added to the merge list after the spatial MVP and TMVP candidates. In this method, the motion information of a previously coded block is stored in a table and used as an MVP for the current CU. The table with multiple HMVP candidates is maintained during the encoding/decoding process. The table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-subblock inter-coded CU, the associated motion information is added to the last entry of the table as a new HMVP candidate.
[0114] The HMVP table size S is set to be 6, which indicates up to 6 History-based MVP (HMVP) candidates may be added to the table. When inserting a new motion candidate to the table, a constrained first-in-first-out (FIFO) rule is utilized wherein redundancy check is firstly applied to find whether there is an identical HMVP in the table. If found, the identical HMVP is removed from the table and all the HMVP candidates afterwards are moved forward.
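A minimal sketch of this constrained FIFO update (the table would additionally be cleared at the start of each new CTU row):

```python
from collections import deque

HMVP_TABLE_SIZE = 6

def hmvp_update(table: deque, cand) -> None:
    """Append `cand` (e.g., an (mv, refIdx) tuple) as the newest entry.
    An identical entry, if present, is removed first; otherwise the
    oldest entry is dropped when the table is full."""
    if cand in table:
        table.remove(cand)        # redundancy check
    elif len(table) == HMVP_TABLE_SIZE:
        table.popleft()           # drop the oldest entry
    table.append(cand)            # newest entry at the tail
```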
[0115] HMVP candidates could be used in the merge candidate list construction process. The latest several HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. A redundancy check is applied on the HMVP candidates against the spatial or temporal merge candidates.
[0116] To reduce the number of redundancy check operations, the following simplifications are introduced:
1) The number of HMVP candidates used for merge list generation is set as (N <= 4) ? M : (8 - N), wherein N indicates the number of existing candidates in the merge list and M indicates the number of available HMVP candidates in the table.
2) Once the total number of available merge candidates reaches the maximum number of allowed merge candidates minus 1, the merge candidate list construction process from HMVP is terminated.
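Both simplifications can be sketched together (illustrative; the pruning against only the spatial/temporal candidates is abbreviated to a membership test):

```python
def insert_hmvp_candidates(merge_list, hmvp_table, max_merge_cands):
    n, m = len(merge_list), len(hmvp_table)
    num_to_check = m if n <= 4 else 8 - n            # simplification 1)
    for cand in list(reversed(hmvp_table))[:max(num_to_check, 0)]:
        if len(merge_list) >= max_merge_cands - 1:   # simplification 2)
            break
        if cand not in merge_list:                   # pruning
            merge_list.append(cand)
    return merge_list
```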
3. Problems
1. Non-local motions are usually observed in videos. For the merge modes in existing methods, the motion information of a block can only be merged with that of its adjacent blocks. Non-local candidates cannot be handled yet.
2. Although the AMVP mode can in theory capture all motions in a video, it incurs high encoding complexity due to an extensive motion search. A method that could capture non-local motions with low complexity is desirable.
4. Detailed solution
[0117] The detailed embodiments below should be considered as examples to explain general concepts. These embodiments should not be interpreted in a narrow way. Furthermore, these embodiments can be combined in any manner.
[0118] In this disclosure, a non-local motion candidate (NLMC) coding mode is described, wherein possible motion candidates from non-adjacent blocks may be utilized and are determined by the coded information of the current block. It could be used as a coding tool for future image/video coding standards, and could also be used as a motion estimation method for an encoder. Note that the motion candidate could be a motion candidate with its reference pointing to another picture, or within the current picture (e.g., also known as a block vector candidate). The term ‘block’ may represent a video unit, e.g., a CU/PU/TU/a region/a group of video units.
Non-local motion candidates (NLMC) coding modes as a coding tool
1. A non-local motion information list is employed in the NLMC.
   a. The list may be constructed according to the coded information of the current block (e.g., block size, and/or dimension, and/or coded mode).
   b. The list used in NLMC may contain one or multiple entries.
   c. The maximum number of entries in the list may be pre-defined or signaled in the bitstream.
   d. In one example, an entry of the list may contain one or more of the following elements: motion vectors, reference picture index and/or reference list index (a data-structure sketch is given after this item).
      i. In one example, the motion vectors may be stored in integer precision or fractional precision.
   e. In one example, a reference prediction block may be derived from an entry in the list.
      i. In one example, a reference prediction block may be in a previous frame, or in the current frame, as indicated by the motion information in the list.
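For concreteness, one possible shape of a list entry is sketched below; the field names and the maximum size are assumptions, since the disclosure leaves them to be pre-defined or signalled:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class NLMCEntry:
    """One entry of the non-local motion information list (item 1.d).
    The MV may be kept in integer or fractional (e.g., quarter-pel)
    precision; a block vector candidate points within the current picture."""
    mv: Tuple[int, int]
    ref_pic_idx: Optional[int] = None   # None for a current-picture (block vector) candidate
    ref_list_idx: Optional[int] = None

MAX_NLMC_ENTRIES = 8  # assumed; may be pre-defined or signalled in the bitstream
```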
2. The list employed in the NLMC may be generated/updated with neighboring motion information. a. In one example, different video units may have same/different lists. i. In one example, a video unit may be a block/CU/CTU/CTU row/slice/tile. ii. In one example, a video unit may be a group of CUs with same size. iii. In one example, the size of list may depend on coded information in the bitstream. a. In one example, the coded information may be the size of a current coding block. b. In one example, the coded information may be the prediction mode, e.g. AMVP or merge mode, of a current coding block. b. In one example, the list may include all the motion information of inter-coded blocks in a NxM neighboring region. i. Alternatively, the motion information may be added as motion candidates in an order until the list is full. a. Alternatively, furthermore, pruning may be applied before adding one motion candidate. ii. Alternatively, only certain of the motion information associated with a block within the NxM neighboring region is allowed to be added to the list. a. In one example, the motion information of merge/ skip coded blocks may not be included to the list. b. In one example, the motion information associated with a neighbouring block coded in the same mode (e.g., inter/IBC) as the current block may be added. c. In one example, the motion information associated with a neighbouring block in the same size as the current block may be added. a) In one example, a KxL block may only be able to fetch the motion information from the list associated with KxL blocks. iii. In one example, only one of the repeated motion information of inter-coded blocks may be included to the list. iv. In one example, the list may decide whether to include a motion information of an inter-coded block based on the motion vector magnitude, reference list and/or reference picture index. a. In one example, the list may only include one motion information of a block for a given reference picture list. b. In one example, the list may only include one motion information of a block for a given reference picture index. c. In one example, the motion information of a block may be excluded if it is close to one of the motion information already in the list. a) In one example, two motion information are close when their reference picture list and/or reference picture index are same. b) In one example, two motion information are considered as close when only their motion vector is close. i. In one example, two motion vectors are close when both the vertical and horizontal positional difference is smaller than K. ii. Alternatively, in one example, two motion vectors are close when either one of vertical or horizontal positional difference is smaller than K. v. In one example, the list may decide whether to include a motion information of an inter-coded block based on the pixel values of its indicated reference block. a. In one example, a motion information of an inter-coded block may be excluded if the reference block it indicated is identical to one of the reference blocks indicated by the motion information already in the NLMC list. b. In one example, a motion information of an inter-coded block may be excluded if the reference block it indicated is similar to one of the reference blocks indicated by the motion information already in the NLMC list. a) In one example, two block may be considered as similar when their difference is small. i. In one example, the difference may be measured by SAD, SATD, SSE or MSE. ii. 
In one example, the difference may be considered as small when the SAD, SATD, SSE or MSE value is smaller than K. vi. In one example, the list for video units with different coded information may be separate. a. In one example, the lists for coding blocks with different coding modes may be separate. a) In one example, the AMVP mode, affine mode, IBC mode may have separate NLMC lists. b) Alternatively, in one example, the AMVP mode, affine mode, IBC mode may share a same NLMC list. b. In one example, the lists for coding blocks with different block sizes may be separate. a) In one example, a NxN block may only be able to fetch the motion information from the list corresponding to the NxN size. vii. In one example, for each NxM region, there may be a NLMC list stored. For the current coding block, a new NLMC list may be generated by combining the list of the current region which the current coding block belongs to and the list of its neighboring regions. a. In one example, the neighboring region may denote spatial neighboring regions, such as top neighboring region, left neighboring region. b. In one example, the neighboring region may denote temporal neighboring regions, such as the collocated region. c. In one example, when combining those lists, the list from a closer neighboring region may be combined first. d. In one example, the number of regions to be used for the list generation may be coded in the bitstream. a) Alternatively, in one example, the number of regions to be used for the list generation may be inferred to N. c. In one example, the NLMC list may include all the motion information of intercoded blocks in a region in a current frame and/or previous frames. i. In one example, a region may be a video unit. a. In one example, a video unit may be a block/CU/CTU/CTU row/slice/tile. a) Alternatively, a video unit may be several blocks/CUs/CTUs/CTU rows/slices/tiles. b) Alternatively, in one example, a video unit may be several CUs/blocks with same size. ii. In one example, a regions may be set from (Cx - M, Cy - N) to (Cx + L, Cy + K) for different frames, where (Cx, Cy) is the position of a current block, and M, N, L and K are integer numbers. a. Alternatively, in one example, the positions of the region for different frames may be different. iii. In one example, the region size may be fixed for different frames, e.g. MxN. a. Alternatively, in one example, the region size may be different for different frames. iv. In one example, the number of frames to be collected for motion information may be set to K. v. In one example, pruning may be applied when a motion information is updated to the NLMC list. a. In one example, a motion information may not be added to the NLMC list if the reference block it indicated is identical to a reference block indicated by an existing motion information in the list. b. In one example, a motion information may not be added to the NLMC list if its motion vector is identical to a motion vector indicated by an existing motion information in the list. a) Alternatively, in one example, a motion information may not be added to the NLMC list if its motion vector is close to a motion vector indicated by an existing motion information in the list. i. In one example, two motion vectors are close when both the vertical and horizontal positional difference is smaller than K. ii. Alternatively, in one example, two motion vectors are close when either one of vertical or horizontal positional difference is smaller than K. c. 
3. When the NLMC is applied, the prediction block of a current block may be generated depending on one or more entries in the NLMC list.
   a. In one example, the prediction block of a current block may be generated by a weighted prediction combining one or more entries in the list.
      i. In one example, the index of the entries used to generate the prediction block may be signalled to the decoder.
         a. Alternatively, in one example, the index of this entry may be inferred to be N.
   b. In one example, how to select the best N entries may be determined by minimizing a certain cost.
      i. In one example, N may be equal to 1 or greater than 1.
      ii. In one example, the best N entries may denote the N entries with minimal costs.
      iii. In one example, the cost may denote the rate distortion cost between an entry and the current block.
      iv. Alternatively, in one example, the cost may denote the distortion between an entry and the current block, such as SAD, SATD, SSE or MSE.
   c. In one example, the process of NLMC may be illustrated as in Fig. 15. In Fig. 15, the list has N entries, and Bi is the ith entry in the list. The current block is denoted by C. The encoder may first check each entry Bi (0 <= i <= N-1) and determine a best prediction block for C under a certain criterion, which is denoted by BK in Fig. 15. After that, the corresponding residual block between C and BK may be transformed, quantized and/or entropy coded.
Entries in the list may first be sorted before being used to derive the prediction/reconstruction of a current block.
   d. It is proposed to sort the entries in the list based on the distortion between the template of each entry and the template of the current block.
   e. In one example, as shown in Fig. 16, if the current block is S1xS2, the template may denote a MxN region excluding the region of the current block, where M > S1 and N > S2.
   f. In one example, the distortion in the above example may denote the distortion between two templates, such as SAD, SATD, SSE or MSE.
   g. In one example, an entry may not only include a block in the reconstructed region but also the template of the block.
   h. In one example, the entries in the list may be sorted in an ascending/descending order based on the template distortion cost.
   i. In one example, after sorting the dictionary in a descending order based on the template distortion cost, only the first K entries could be applied when coding the current block with NLMC (a combined sketch of the sorting and the entry selection follows this item).
      i. In one example, the indexes of the K entries may not need to be signalled to the decoder.
      ii. In one example, the value of K may be a default value.
         a. Alternatively, in one example, the value of K may be signalled to the decoder.
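As a non-normative illustration, the sketch below combines the template-based sorting of this item with the entry selection of Fig. 15, using SAD as the distortion. The function names are hypothetical, entries are assumed to carry both a reconstructed block and its template, and the sort shown is by increasing template distortion, which is one natural reading of applying only the "first K" entries; the final selection against the current block is encoder-side only, with the chosen index signalled.

    import numpy as np

    def sad(a: np.ndarray, b: np.ndarray) -> int:
        # Sum of absolute differences between two equal-sized arrays.
        return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

    def select_nlmc_prediction(entries, cur_block, cur_template, k_best):
        # entries: list of (block, template) pairs from the reconstructed
        # region; returns the chosen prediction block and the index to
        # signal within the sorted prefix of K entries.
        # 1) Sort entries by template distortion; both encoder and decoder
        #    can repeat this sort, so the sorted order costs no bits.
        order = sorted(range(len(entries)),
                       key=lambda i: sad(entries[i][1], cur_template))
        # 2) Keep only the first K sorted entries.
        kept = order[:k_best]
        # 3) Encoder-only: pick the kept entry minimizing the distortion
        #    against the current block (the criterion of Fig. 15).
        best = min(kept, key=lambda i: sad(entries[i][0], cur_block))
        return entries[best][0], kept.index(best)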
4. The NLMC may be treated as a new prediction mode in addition to existing ones (e.g., intra/inter/IBC prediction modes).
   a. In one example, indication of the usage of the NLMC may be signalled/parsed as a separate prediction mode (e.g. MODE_NLMC).
      i. In one example, indication of an entry in the list is signalled/parsed by additional syntax elements.
   b. In one example, fixed length coding/Exp-Golomb/truncated unary/truncated binary may be used to binarize the entry index.
   c. The binarization of the signalled index in NLMC may depend on the number of possible entries used in NLMC (a sketch follows this item).
      i. In one example, the binarization may be fixed length with M bits.
         a. In one example, M may be set equal to floor(log2(N)).
            a) In one example, log2(N) is a function to get the logarithm of N to the base 2.
            b) In one example, floor(N) is a function to get the largest integer not greater than N.
      ii. In one example, the binarization may be truncated binary/unary with cMax equal to N.
      iii. In the above examples, N may be the number of all entries in the dictionary.
         a. Alternatively, N may be the number of available entries in the dictionary.
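By way of example, the sketch below models the fixed-length and truncated binary binarizations named above as bit strings; it is a simplified illustration rather than an actual entropy coder, and it assumes N >= 2.

    import math

    def fixed_length_bits(index: int, n_entries: int) -> str:
        # Fixed-length code with M = floor(log2(N)) bits, as in the text;
        # such a code can represent indexes 0 .. 2**M - 1.
        m = math.floor(math.log2(n_entries))
        return format(index, "0{}b".format(m))

    def truncated_binary(index: int, c_max: int) -> str:
        # Truncated binary code with cMax = N: with k = floor(log2(cMax)),
        # the first u = 2**(k+1) - cMax codewords use k bits and the
        # remaining codewords use k + 1 bits.
        k = math.floor(math.log2(c_max))
        u = (1 << (k + 1)) - c_max
        if index < u:
            return format(index, "0{}b".format(k))
        return format(index + u, "0{}b".format(k + 1))

For instance, with N = 8 entries the fixed-length code uses floor(log2(8)) = 3 bits per index, while with N = 5 the truncated binary code assigns 2 bits to the first three indexes and 3 bits to the remaining two.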
Non-local motion candidate (NLMC) coding modes may be used as a motion estimation method for encoders
5. The NLMC may be treated as a motion estimation method for encoders.
   a. In one example, the entries in the NLMC may be used as additional initial search points before the integer motion search process starts (see the sketch following this item).
      i. In one example, only the integer parts of a motion vector in an entry may be used.
         a. Alternatively, both the integer parts and fractional parts of a motion vector in an entry may be used.
   b. In one example, the entries in the NLMC may be used as additional candidates, as a complement to the traditional motion estimation methods.
   c. In one example, one entry in the list may be used to generate the prediction block for a current block.
      i. In one example, the motion information contained in the entry may be signalled in the AMVP manner.
   d. In one example, the NLMC list may be updated with neighboring motion information, including motion information that was not selected as the best mode by the neighboring coding blocks.
   e. In one example, the NLMC list size may be dependent on the coding block size.
   f. In one example, there may be multiple NLMC lists corresponding to different reference pictures.
   g. In one example, the NLMC list may exclude the motion information of inter-coded blocks which have a large inter cost.
      i. In one example, if the inter cost of a block is greater than K, the corresponding motion information may be excluded from the NLMC list.
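A minimal sketch of item 5.a follows, seeding integer motion search with NLMC entries; the quarter-pel motion vector precision, the cost callable and the local refinement are assumptions for illustration, and MotionInfo is as in the earlier sketch.

    def initial_search_points(nlmc_entries, conventional_predictors):
        # Collect integer-pel seeds: conventional predictors (e.g. AMVP
        # candidates) plus the integer parts of the NLMC entries' MVs.
        points = set(conventional_predictors)
        for mi in nlmc_entries:
            # Only the integer part of each MV is used here (assuming
            # 1/4-pel storage); a variant could keep the fractional part.
            points.add((mi.mv_x >> 2, mi.mv_y >> 2))
        return points

    def integer_motion_search(cost, points, search_range):
        # Start from the cheapest seed, then refine in a local window.
        best = min(points, key=cost)
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                cand = (best[0] + dx, best[1] + dy)
                if cost(cand) < cost(best):
                    best = cand
        return best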
6. The above methods may be applied to AMVP, affine, IBC and/or other inter-prediction related coding tools.
General aspects
7. The M, N, L, and/or K in the above examples may be integer numbers.
   a. In one example, the M, N, and/or K may be any integer values.
   b. In one example, both M and N may be equal to 4.
   c. In one example, N may be a pre-defined constant value for all QPs.
   d. In one example, N may be signalled to the decoder.
   e. In one example, N may be based on:
      a. Video contents (e.g. screen contents or natural contents).
      b. A message signaled in the DPS/SPS/VPS/PPS/APS/picture header/slice header/tile group header/largest coding unit (LCU)/coding unit (CU)/LCU row/group of LCUs/TU/PU block/video coding unit.
      c. Position of CU/PU/TU/block/video coding unit.
      d. Block dimension of the current block and/or its neighboring blocks.
      e. Block shape of the current block and/or its neighboring blocks.
      f. Quantization parameter of the current block.
      g. Indication of the color format (such as 4:2:0, 4:4:4, RGB or YUV).
      h. Coding tree structure (such as dual tree or single tree).
      i. Slice/tile group type and/or picture type.
      j. Color component (e.g. may be only applied on the luma component and/or the chroma component).
      k. Temporal layer ID.
8. Whether and/or how to apply the above methods may be based on:
   a. Video contents (e.g. screen contents or natural contents).
      i. In one example, the above methods may be only applied on screen contents.
   b. A message signaled in the DPS/SPS/VPS/PPS/APS/picture header/slice header/tile group header/largest coding unit (LCU)/coding unit (CU)/LCU row/group of LCUs/TU/PU block/video coding unit.
   c. Position of CU/PU/TU/block/video coding unit.
   d. Block dimension of the current block and/or its neighboring blocks.
   e. Block shape of the current block and/or its neighboring blocks.
   f. Quantization parameter of the current block.
   g. Indication of the color format (such as 4:2:0, 4:4:4, RGB or YUV).
   h. Coding tree structure (such as dual tree or single tree).
   i. Slice/tile group type and/or picture type.
   j. Color component (e.g. may be only applied on the luma component and/or the chroma component).
   k. Temporal layer ID.
   l. Profiles/Levels/Tiers of a standard.
9. The above methods may be applied to the motion estimation methods used in encoding, pre-analysis and/or MCTF processes.
[0119] Fig. 17 illustrates a flowchart of a method 1700 for video processing in accordance with embodiments of the present disclosure. The method 1700 is implemented during a conversion between a video unit or video block of a video and a bitstream of the video.
[0120] At block 1710, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information is determined for the current video block. The current video block is coded with a non-local motion candidate (NLMC) coding mode. The list of non-local motion information is associated with at least one non-adjacent block of the current video block.
[0121] At block 1720, at least one prediction block for the current video block is determined based on the list of non-local motion information.
[0122] At block 1730, the conversion is performed based on the at least one prediction block.
[0123] The method 1700 enables coding of video blocks with the NLMC coding mode, so that the coding performance of the video can be improved.
[0124] In some embodiments, the list of non-local motion information is constructed based on coded information of the current video block. The coded information includes at least one of: a block size of the current video block, a dimension of the current video block, or a coded mode of the current video block.
[0125] In some embodiments, the list of non-local motion information used for the NLMC coding mode comprises at least one entry. A maximum number of entries in the list of non-local motion information may be predefined, or may be indicated in the bitstream.
[0126] In some embodiments, an entry in the list of the non-local motion information comprises at least one of: at least one motion vector, a reference picture index, or a reference list index.
[0127] In some embodiments, the at least one motion vector is in an integer precision or a fractional precision.
[0128] In some embodiments, a reference prediction block is determined based on an entry in the list.
[0129] In some embodiments, the reference prediction block is in a previously coded frame, or a current frame indicated by motion information in the list of non-local motion information.
[0130] In some embodiments, the method 1700 further includes generating the list of non-local motion information based on neighboring motion information of the current video block, and/or updating the list of non-local motion information based on neighboring motion information of the current video block.
[0131] In some embodiments, a first list of non-local motion information of a first video unit is the same as or different from a second list of non-local motion information of a second video unit.
[0132] In some embodiments, the first or second video unit comprises one of: a block, a coding unit (CU), a coding tree unit (CTU), a CTU row, a slice, a tile, or a group of CUs with a same size.
[0133] In some embodiments, a size of the list of non-local motion information is based on coded information in the bitstream. The coded information comprises at least one of: a size of the current video block, or a prediction mode of the current video block.
[0134] In some embodiments, the prediction mode of the current video block comprises at least one of: an advanced motion vector prediction (AMVP) mode, or a merge mode.
[0135] In some embodiments, the list of non-local motion information comprises all motion information of inter-coded blocks in a NxM neighboring region. N and M are positive integers.
[0136] In some embodiments, the list of non-local motion information comprises at least one piece of motion information of inter-coded blocks in a NxM neighboring region. N and M are positive integers. The at least one piece of motion information is added as at least one motion candidate in the list in an order until the list is full.
[0137] In some embodiments, a pruning process is applied to at least one motion candidate before adding the at least one motion candidate into the list. That is, same candidates or similar candidates may be pruned.
[0138] In some embodiments, in response to first motion information associated with a block within a NxM neighboring region satisfying a condition, the first motion information is allowed to be added into the list. N and M are positive integers.
[0139] In some embodiments, the condition comprises at least one of the first motion information being associated with a neighboring block coded in the same mode as the current video block, the same mode being an inter mode or an intra block copy mode, or the first motion information being associated with a neighboring block in the same size as the current video block.
[0140] In some embodiments, in response to the first motion information being of a merge coded block or a skip coded block, the first motion information is excluded from the list.
[0141] In some embodiments, a KxL block obtains motion information from the list associated with KxL blocks, K and L being positive integers.
[0142] In some embodiments, one piece (for example, only one) of repeated pieces of motion information is included in the list.
[0143] In some embodiments, whether to include second motion information of an inter-coded block in the list is based on at least one of a motion vector magnitude of the second motion information, a reference list of the second motion information, or a reference picture index of the second motion information.
[0144] In some embodiments, a single piece of motion information of a block for a given reference picture list is included in the list. In some embodiments, a single piece of motion information of a block for a given reference picture index is included in the list.
[0145] In some embodiments, a first piece of motion information of a block is excluded from the list in response to a difference between the first piece of motion information and a second piece of motion information in the list being less than a threshold.
[0146] In some embodiments, the difference between the first and second pieces of motion information is less than the threshold in response to at least one of a first reference picture list of the first piece of motion information being the same as a second reference picture list of the second piece of motion information, or a first reference picture index of the first piece of motion information being the same as a second reference picture index of the second piece of motion information.
[0147] In some embodiments, the difference between the first and second pieces of motion information is less than the threshold in response to a vector difference between a first motion vector of the first piece of motion information and a second motion vector of the second piece of motion information being less than the threshold.
[0148] In some embodiments, the vector difference of the first and second motion vectors is less than the threshold in response to at least one of a vertical positional difference between the first and second motion vectors being less than the threshold, or a horizontal positional difference between the first and second motion vectors being less than the threshold.
[0149] In some embodiments, whether to include third motion information of an inter-coded block in the list is based on pixel values of a reference block indicated by the third motion information.
[0150] In some embodiments, the third motion information is excluded from the list in response to at least one of the reference block indicated by the third motion information being the same as a further reference block indicated by motion information in the list, or a difference between the reference block indicated by the third motion information and a further reference block indicated by motion information in the list being less than a threshold. The difference may be a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE), and/or the like.
[0151] In some embodiments, a first list of non-local motion information for a first coding block and a second list of non-local motion information for a second coding block are separate, the first coding block being with first coded information and the second coding block being with second coded information.
[0152] In some embodiments, the first coding block is coded with a first coding mode and the second coding block is coded with a second coding mode.
[0153] In some embodiments, the first coding mode comprises one of an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode, and the second coding mode comprises another one of the AMVP mode, the affine mode or the IBC mode.
[0154] In some embodiments, a first block size of the first coding block is different from a second block size of the second coding block.
[0155] In some embodiments, a NxN block is accessible to motion information from the list of non-local motion information corresponding to a NxN size, N being a positive integer.
[0156] In some embodiments, a same list of non-local motion information is constructed for at least one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode.
[0157] In some embodiments, for each of a plurality of NxM regions, a respective list of non-local motion information for NLMC is stored, N and M being positive integers.
[0158] In some embodiments, determining the list of non-local motion information for the current video block comprises: determining the list of non-local motion information by combining a first list of non-local motion information of a current region which the current video block belongs to and at least one list of at least one neighboring region of the current region.
[0159] In some embodiments, the at least one neighboring region comprises at least one of: at least one spatial neighboring region, or at least one temporal neighboring region.
[0160] In some embodiments, the at least one spatial neighboring region comprises at least one of: a top neighboring region, or a left neighboring region.
[0161] In some embodiments, the at least one temporal neighboring region comprises a collocated region.
[0162] In some embodiments, the combining of the at least one list is based on at least one distance between the at least one neighboring region and the current region.
[0163] In some embodiments, the number of the at least one neighboring region is indicated in the bitstream. Alternatively, in some embodiments, the number of the at least one neighboring region is inferred to be a predefined value.
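By way of illustration only, a minimal sketch of the region-list combination described above is given below, assuming per-region lists are kept in a dictionary keyed by region coordinates and that neighboring regions are supplied ordered from closest to farthest, so that the list from a closer region is combined first; all names are hypothetical.

    def combined_nlmc_list(region_lists, cur_region, neighbors, max_size):
        # region_lists: dict mapping region coordinates to stored NLMC
        # lists; neighbors: neighboring region coordinates, closest first.
        combined = []
        for region in [cur_region] + list(neighbors):
            for mi in region_lists.get(region, []):
                if len(combined) >= max_size:
                    return combined
                if mi not in combined:  # simple duplicate pruning
                    combined.append(mi)
        return combined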
[0164] In some embodiments, the list of non-local motion information comprises motion information of at least one inter-coded block in a region in at least one of: a current frame, or at least one previous frame.
[0165] In some embodiments, the region comprises a video unit, the video unit comprising one of: at least one block, at least one coding unit (CU), at least one coding tree unit (CTU), at least one CTU row, at least one slice, at least one tile, a plurality of CUs with a same size, or a plurality of blocks with a same size.
[0166] In some embodiments, the region comprises positions from (Cx - M, Cy - N) to (Cx + L, Cy + K) for at least one frame. (Cx, Cy) denotes a position of the current video block. M, N, L and K are integers.
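As a worked example of this region definition, the following sketch computes the sample range covered by the region for a current video block at (Cx, Cy); the helper name is illustrative.

    def region_bounds(cx, cy, m, n, l, k):
        # The region spans from (Cx - M, Cy - N) to (Cx + L, Cy + K).
        return (cx - m, cy - n), (cx + l, cy + k)

    # For example, with (Cx, Cy) = (64, 32) and M = N = L = K = 4, the
    # region covers positions (60, 28) through (68, 36).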
[0167] In some embodiments, M and N are equal to 4, or N comprises a predefined value for all quantization parameters (QPs), or N is indicated in the bitstream, or N is based on at least one of: video content, a block dimension of the current video block, a block dimension of a neighboring block, a block shape of the current video block, a block shape of a neighboring block, a QP of the current video block, an indication of a color format, a dual tree or single tree structure, a slice type, a tile group type, a picture type, a color component, a temporal layer identifier, a position of a coding unit (CU), prediction unit (PU), transform unit (TU), a block or a video coding unit, or a message indicated in a video region, the video region being one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit.
[0168] In some embodiments, a position of a first region for a first frame is different from a position of a second region for a second frame.
[0169] In some embodiments, a first size of a first region for a first frame is the same as or different from a second size of a second region for a second frame.
[0170] In some embodiments, the number of frames to be collected for motion information in the list is set to a predefined value.
[0171] In some embodiments, a pruning process is applied to candidate motion information, the candidate motion information being to be updated to the list of non-local motion information.
[0172] In some embodiments, the candidate motion information is excluded from the list in response to a reference block indicated by the candidate motion information being the same as a further reference block indicated by existing motion information in the list, or the candidate motion information is excluded from the list in response to a difference between a motion vector indicated by the candidate motion information and a further motion vector indicated by an existing motion information in the list being less than a threshold.
[0173] In some embodiments, the difference between the motion vector and the further motion vector is less than the threshold in response to at least one of a vertical positional difference between the motion vector and the further motion vector being less than the threshold, or a horizontal positional difference between the motion vector and the further motion vector being less than the threshold.
[0174] In some embodiments, the candidate motion information is excluded from the list in response to a reference index indicated by the candidate motion information being the same as a further reference index indicated by existing motion information in the list.
[0175] In some embodiments, determining the at least one prediction block for the current video block comprises: determining the at least one prediction block based on at least one entry in the list of non-local motion information.
[0176] In some embodiments, the at least one prediction block is determined by a weighted prediction combining the at least one entry.
[0177] In some embodiments, at least one index of the at least one entry is indicated in the bitstream. Alternatively, in some embodiments, at least one index of the at least one entry is inferred to be at least one predefined value.
[0178] In some embodiments, the at least one entry is selected by minimizing a cost between an entry and the current video block.
[0179] In some embodiments, the at least one entry comprises N entries with minimal costs, N being an integer equal to 1 or greater than 1.
[0180] In some embodiments, the cost comprises at least one of: a rate distortion cost between an entry and the current video block, or a distortion between an entry and the current video block.
[0181] In some embodiments, the cost comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
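The four distortion measures named here can be written compactly as below. The Hadamard-based SATD shown is one common choice, valid for block sides that are powers of two, rather than a normative definition.

    import numpy as np

    def _hadamard(n):
        # Build an n x n Hadamard matrix (n must be a power of two).
        h = np.array([[1]])
        while h.shape[0] < n:
            h = np.block([[h, h], [h, -h]])
        return h

    def sad(a, b):
        # Sum of absolute differences.
        return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

    def sse(a, b):
        # Sum of squared errors.
        d = a.astype(np.int64) - b.astype(np.int64)
        return int((d * d).sum())

    def mse(a, b):
        # Mean squared error.
        return sse(a, b) / a.size

    def satd(a, b):
        # Sum of absolute transformed differences: apply a 2-D Hadamard
        # transform to the residual, then sum absolute coefficients.
        d = a.astype(np.int64) - b.astype(np.int64)
        return int(np.abs(_hadamard(d.shape[0]) @ d @ _hadamard(d.shape[1])).sum())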
[0182] In some embodiments, the at least one entry in the list of non-local motion information comprises N entries, N being a positive integer, and determining the at least one prediction block comprises: for an ith entry in the list, i being larger than or equal to 0 and less than or equal to N-1, checking the ith entry and determining an ith prediction block for the current video block based on a criterion.
[0183] In some embodiments, as shown in Fig. 15, performing the conversion comprises: determining a residual block between the current video block and the ith prediction block (such as Bi in Fig. 15, i=0, 1, 2, ..., N-1); and applying at least one of the following to the residual block: a transform, a quantization, or an entropy coding.
[0184] In some embodiments, the at least one entry is sorted before being used to determine the at least one prediction block. The sorting of the at least one entry may be based on: at least one template distortion cost, and/or at least one distortion between at least one template of the at least one entry and a current template of the current video block.
[0185] In some embodiments, a size of the current video block is S1xS2, and the current template comprises a MxN region excluding a region of the current video block. M is larger than S1 and N is larger than S2, M, N, S1 and S2 being positive integers.
[0186] In some embodiments, a distortion of the at least one distortion comprises a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE) between two templates.
[0187] In some embodiments, an entry in the list comprises a block in a reconstructed region and a template of the block.
[0188] In some embodiments, the at least one entry is sorted in an ascending or descending order based on the at least one template distortion cost.
[0189] In some embodiments, after sorting the at least one entry in a descending order based on the at least one template distortion cost, first K entries are applied for coding the current video block with NLMC coding mode, K being a positive integer.
[0190] In some embodiments, indexes of the first K entries are excluded from the bitstream. A value of K may be indicated in the bitstream, or may be a default value.
[0191] In some embodiments, the NLMC coding mode is applied in addition to at least one further coding mode, the at least one further coding mode comprising at least one of an intra prediction mode, an inter prediction mode, or an intra block copy prediction mode.
[0192] In some embodiments, an indication of usage of the NLMC coding mode is indicated or parsed as a separate prediction mode.
[0193] In some embodiments, an indication of an entry in the list is indicated or parsed by a further syntax element, the further syntax element comprises an entry index.
[0194] In some embodiments, the entry index is coded by one of a fixed-length coding, an Exponential Golomb coding, a truncated unary coding, or a truncated binary coding.
[0195] In some embodiments, a binarization of the entry index is based on the number of entries used in the NLMC coding mode.
[0196] In some embodiments, the binarization comprises one of a fixed length with M bits, or a truncated binary with a parameter equal to N, or a truncated unary with a parameter equal to N, N being the number of entries, and M being determined based on N. For example, the parameter may be cMax.
[0197] In some embodiments, M is equal to floor(log2(N)). log2(N) denotes a function to get a logarithm of N to a base 2, and floor(x) is a function to get the largest integer not greater than x.
[0198] In some embodiments, N is the number of all entries in the list, or the number of available entries in the list.
[0199] In some embodiments, the NLMC coding mode is used as a motion estimation approach for encoding the current video block into the bitstream. For example, the NLMC may be used as a motion estimation method for encoders.
[0200] In some embodiments, at least one entry in the list of non-local motion information is used as at least one additional initial search point before an integer motion search process.
[0201] In some embodiments, at least one integer part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
[0202] In some embodiments, at least one integer part and at least one fractional part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
[0203] In some embodiments, the at least one entry is used as at least one additional candidate for at least one further motion estimation approach.
[0204] In some embodiments, an entry in the list is used to generate a prediction block for the current video block, and motion information in the entry is indicated by at least one element associated with an advanced motion vector prediction (AMVP). That is, the motion information of the entry may be signalled in the AMVP manner. For example, instead of signaling the MV, a corresponding MVD of the entry may be indicated in the bitstream.
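By way of illustration, signalling "in the AMVP manner" could look like the following sketch, where a motion vector difference (MVD) against a predictor is written instead of the motion vector itself; the writer/reader interface and its method names are hypothetical.

    def write_entry_amvp_style(writer, entry_mv, predictor_mv):
        # Send the MVD rather than the motion vector itself.
        writer.write_svlc(entry_mv[0] - predictor_mv[0])  # hypothetical
        writer.write_svlc(entry_mv[1] - predictor_mv[1])  # signed-VLC write

    def parse_entry_amvp_style(reader, predictor_mv):
        # Reconstruct the motion vector as predictor plus parsed MVD.
        return (predictor_mv[0] + reader.read_svlc(),
                predictor_mv[1] + reader.read_svlc())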
[0205] In some embodiments, the list of non-local motion information is updated with at least one piece of neighboring motion information, the at least one piece of neighboring motion information being not selected by at least one neighboring coding block.
[0206] In some embodiments, a size of the list of non-local motion information is based on a size of the current video block.
[0207] In some embodiments, a plurality of lists of non-local motion information are determined for a plurality of reference pictures for the NLMC coding mode.
[0208] In some embodiments, the list of non-local motion information excludes motion information of an inter-coded block with an inter cost larger than a threshold.
[0209] In some embodiments, the NLMC coding mode is applied to at least one of: an advanced motion vector prediction (AMVP) coding tool, an affine coding tool, an intra block copy (IBC) coding tool, or a further inter-prediction related coding tool.
[0210] In some embodiments, whether to and/or how to apply the method associated with the NLMC coding mode is based on at least one of: video content, a message included in one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit, a position of CU, PU, TU, block, or the video coding unit, a block dimension of the current video block, a block dimension of a neighboring block of the current video block, a block shape of the current video block, a block shape of a neighboring block of the current video block, a quantization parameter of the current video block, an indication of a colour format, a coding tree structure, a slice type, a tile group type, a picture type, a colour component, a temporal layer identifier (ID), or profiles, levels, or tiers of a standard.
[0211] In some embodiments, the method is applied to the current video block in response to the current video block belonging to screen content, and/or the method is applied to at least one of a luma component or a chroma component of the current video block.
[0212] In some embodiments, the colour format comprises one of: 4:2:0, 4:4:4, RGB or YUV, and/or the coding tree structure comprises a dual tree structure or a single tree structure.
[0213] In some embodiments, the method associated with the NLMC coding mode is applied to a motion estimation used in at least one of: an encoding process, a pre-analysis process, or a motion compensated temporal filtering (MCTF) process.
[0214] In some embodiments, the current video block comprises at least one of: a colour component, a sub-picture, a slice, a tile, a coding tree unit (CTU), a CTU row, a group of CTUs, a coding unit (CU), a prediction unit (PU), a transform unit (TU), a coding tree block (CTB), a coding block (CB), a prediction block (PB), a transform block (TB), a block, a sub-block of a block, a sub-region within a block, or a region that contains more than one sample or pixel.
[0215] In some embodiments, the conversion comprises encoding the current video block into the bitstream.
[0216] In some embodiments, the conversion comprises decoding the current video block from the bitstream.
[0217] According to further embodiments of the present disclosure, a non-transitory computer-readable recording medium is provided. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. In the method, a list of non-local motion information is determined for a current video block of the video. The current video block is coded with a non-local motion candidate (NLMC) coding mode. The list of non-local motion information is associated with at least one non-adjacent block of the current video block. At least one prediction block is determined for the current video block based on the list of non-local motion information. The bitstream is generated based on the at least one prediction block.
[0218] According to still further embodiments of the present disclosure, a method for storing a bitstream of a video is provided. In the method, a list of non-local motion information is determined for a current video block of the video. The current video block is coded with a non-local motion candidate (NLMC) coding mode. The list of non-local motion information is associated with at least one non-adjacent block of the current video block. At least one prediction block is determined for the current video block based on the list of non-local motion information. The bitstream is generated based on the at least one prediction block. The bitstream is stored in a non-transitory computer-readable recording medium.
[0219] Implementations of the present disclosure can be described in view of the following clauses, the features of which can be combined in any reasonable manner.
[0220] Clause 1. A method for video processing, comprising: determining, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information for the current video block, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and performing the conversion based on the at least one prediction block.
[0221] Clause 2. The method of clause 1, wherein the list of non-local motion information is constructed based on coded information of the current video block, the coded information comprising at least one of: a block size of the current video block, a dimension of the current video block, or a coded mode of the current video block.
[0222] Clause 3. The method of clause 1 or 2, wherein the list of non-local motion information used for the NLMC coding mode comprises at least one entry.
[0223] Clause 4. The method of clause 3, wherein a maximum number of entries in the list of non-local motion information is predefined, or is indicated in the bitstream.
[0224] Clause 5. The method of clause 3 or 4, wherein an entry in the list of the non-local motion information comprises at least one of: at least one motion vector, a reference picture index, or a reference list index.
[0225] Clause 6. The method of clause 5, wherein the at least one motion vector is in an integer precision or a fractional precision.
[0226] Clause 7. The method of any of clauses 3-6, wherein a reference prediction block is determined based on an entry in the list.
[0227] Clause 8. The method of clause 7, wherein the reference prediction block is in a previously coded frame, or a current frame indicated by motion information in the list of non-local motion information.
[0228] Clause 9. The method of any of clauses 1-8, further comprising at least one of: generating the list of non-local motion information based on neighboring motion information of the current video block, or updating the list of non-local motion information based on neighboring motion information of the current video block.
[0229] Clause 10. The method of clause 9, wherein a first list of non-local motion information of a first video unit is the same as or different from a second list of non-local motion information of a second video unit.
[0230] Clause 11. The method of clause 10, wherein the first or second video unit comprises one of a block, a coding unit (CU), a coding tree unit (CTU), a CTU row, a slice, a tile, or a group of CUs with a same size.
[0231] Clause 12. The method of any of clauses 1-11, wherein a size of the list of non-local motion information is based on coded information in the bitstream, wherein the coded information comprises at least one of a size of the current video block, or a prediction mode of the current video block.
[0232] Clause 13. The method of clause 12, wherein the prediction mode of the current video block comprises at least one of an advanced motion vector prediction (AMVP) mode, or a merge mode.
[0233] Clause 14. The method of any of clauses 1-13, wherein the list of non-local motion information comprises all motion information of inter-coded blocks in a NxM neighboring region, wherein N and M are positive integers.
[0234] Clause 15. The method of any of clauses 1-13, wherein the list of non-local motion information comprises at least one piece of motion information of inter-coded blocks in a NxM neighboring region, wherein N and M are positive integers, wherein the at least one piece of motion information is added as at least one motion candidate in the list in an order until the list is full.
[0235] Clause 16. The method of clause 15, wherein a pruning process is applied to at least one motion candidate before adding the at least one motion candidate into the list.
[0236] Clause 17. The method of any of clauses 1-13, wherein in response to first motion information associated with a block within a NxM neighboring region satisfying a condition, the first motion information is allowed to be added into the list, wherein N and M are positive integers.
[0237] Clause 18. The method of clause 17, wherein the condition comprises at least one of the first motion information being associated with a neighboring block coded in the same mode as the current video block, the same mode being an inter mode or an intra block copy mode, or the first motion information being associated with a neighboring block in the same size as the current video block.
[0238] Clause 19. The method of clause 17, wherein in response to the first motion information being of a merge coded block or a skip coded block, the first motion information is excluded from the list.
[0239] Clause 20. The method of clause 17, wherein a KxL block obtains motion information from the list associated with KxL blocks, K and L being positive integers.
[0240] Clause 21. The method of any of clauses 1-20, wherein one piece of repeated pieces of motion information is included in the list.
[0241] Clause 22. The method of any of clauses 1-21, wherein whether to include second motion information of an inter-coded block in the list is based on at least one of a motion vector magnitude of the second motion information, a reference list of the second motion information, or a reference picture index of the second motion information.
[0242] Clause 23. The method of any of clauses 1-22, wherein a single piece of motion information of a block for a given reference picture list is included in the list.
[0243] Clause 24. The method of any of clauses 1-23, wherein a single piece of motion information of a block for a given reference picture index is included in the list.
[0244] Clause 25. The method of any of clauses 1-24, wherein a first piece of motion information of a block is excluded from the list in response to a difference between the first piece of motion information and a second piece of motion information in the list being less than a threshold.
[0245] Clause 26. The method of clause 25, wherein the difference between the first and second pieces of motion information is less than the threshold in response to at least one of a first reference picture list of the first piece of motion information being the same as a second reference picture list of the second piece of motion information, or a first reference picture index of the first piece of motion information being the same as a second reference picture index of the second piece of motion information.
[0246] Clause 27. The method of clause 25, wherein the difference between the first and second pieces of motion information is less than the threshold in response to a vector difference between a first motion vector of the first piece of motion information and a second motion vector of the second piece of motion information being less than the threshold.
[0247] Clause 28. The method of clause 27, wherein the vector difference of the first and second motion vectors is less than the threshold in response to at least one of a vertical positional difference between the first and second motion vectors being less than the threshold, or a horizontal positional difference between the first and second motion vectors being less than the threshold.
[0248] Clause 29. The method of any of clauses 1-28, wherein whether to include third motion information of an inter-coded block in the list is based on pixel values of a reference block indicated by the third motion information.
[0249] Clause 30. The method of clause 29, wherein the third motion information is excluded from the list in response to at least one of: the reference block indicated by the third motion information being the same as a further reference block indicated by motion information in the list, or a difference between the reference block indicated by the third motion information and a further reference block indicated by motion information in the list being less than a threshold.
[0250] Clause 31. The method of clause 30, wherein the difference comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
[0251] Clause 32. The method of any of clauses 1-31, wherein a first list of non-local motion information for a first coding block and a second list of non-local motion information for a second coding block are separate, the first coding block being with first coded information and the second coding block being with second coded information.
[0252] Clause 33. The method of clause 32, wherein the first coding block is coded with a first coding mode and the second coding block is coded with a second coding mode.
[0253] Clause 34. The method of clause 33, wherein the first coding mode comprises one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode, and the second coding mode comprises another one of the AMVP mode, the affine mode or the IBC mode.
[0254] Clause 35. The method of clause 32, wherein a first block size of the first coding block is different from a second block size of the second coding block.
[0255] Clause 36. The method of clause 35, wherein a NxN block is accessible to motion information from the list of non-local motion information corresponding to a NxN size, N being a positive integer.
[0256] Clause 37. The method of any of clauses 1-31, wherein a same list of non-local motion information is constructed for at least one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode.
[0257] Clause 38. The method of any of clauses 1-37, wherein for each of a plurality of NxM regions, a respective list of non-local motion information for NLMC is stored, N and M being positive integers.
[0258] Clause 39. The method of clause 38, wherein determining the list of non-local motion information for the current video block comprises: determining the list of non-local motion information by combining a first list of non-local motion information of a current region which the current video block belongs to and at least one list of at least one neighboring region of the current region.
[0259] Clause 40. The method of clause 39, wherein the at least one neighboring region comprises at least one of: at least one spatial neighboring region, or at least one temporal neighboring region.
[0260] Clause 41. The method of clause 40, wherein the at least one spatial neighboring region comprises at least one of: a top neighboring region, or a left neighboring region.
[0261] Clause 42. The method of clause 40, wherein the at least one temporal neighboring region comprises a collocated region.
[0262] Clause 43. The method of any of clauses 39-42, wherein the combining of the at least one list is based on at least one distance between the at least one neighboring region and the current region.
[0263] Clause 44. The method of any of clauses 39-43, wherein the number of the at least one neighboring region is indicated in the bitstream.
[0264] Clause 45. The method of any of clauses 39-43, wherein the number of the at least one neighboring region is inferred to be a predefined value.
[0265] Clause 46. The method of any of clauses 1-45, wherein the list of non-local motion information comprises motion information of at least one inter-coded block in a region in at least one of: a current frame, or at least one previous frame.
[0266] Clause 47. The method of clause 46, wherein the region comprises a video unit, the video unit comprising one of: at least one block, at least one coding unit (CU), at least one coding tree unit (CTU), at least one CTU row, at least one slice, at least one tile, a plurality of CUs with a same size, or a plurality of blocks with a same size.
[0267] Clause 48. The method of clause 46, wherein the region comprises positions from (Cx - M, Cy - N) to (Cx + L, Cy + K) for at least one frame, wherein (Cx, Cy) denotes a position of the current video block, M, N, L and K are integers.
[0268] Clause 49. The method of clause 48, wherein: M and N are equal to 4, or N comprises a predefined value for all quantization parameters (QPs), or N is indicated in the bitstream, or N is based on at least one of: video content, a block dimension of the current video block, a block dimension of a neighboring block, a block shape of the current video block, a block shape of a neighboring block, a QP of the current video block, an indication of a color format, a dual tree or single tree structure, a slice type, a tile group type, a picture type, a color component, a temporal layer identifier, a position of a coding unit (CU), prediction unit (PU), transform unit (TU), a block or a video coding unit, or a message indicated in a video region, the video region being one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit.
[0269] Clause 50. The method of clause 46, wherein a position of a first region for a first frame is different from a position of a second region for a second frame.
[0270] Clause 51. The method of any of clauses 46-50, wherein a first size of a first region for a first frame is the same as or different from a second size of a second region for a second frame.
[0271] Clause 52. The method of any of clauses 46-51, wherein the number of frames to be collected for motion information in the list is set to a predefined value.
[0272] Clause 53. The method of any of clauses 46-52, wherein a pruning process is applied to candidate motion information, the candidate motion information being to be updated to the list of non-local motion information.
[0273] Clause 54. The method of clause 53, wherein the candidate motion information is excluded from the list in response to a reference block indicated by the candidate motion information being the same as a further reference block indicated by existing motion information in the list, or wherein the candidate motion information is excluded from the list in response to a difference between a motion vector indicated by the candidate motion information and a further motion vector indicated by an existing motion information in the list being less than a threshold.
[0274] Clause 55. The method of clause 54, wherein the difference between the motion vector and the further motion vector is less than the threshold in response to at least one of a vertical positional difference between the motion vector and the further motion vector being less than the threshold, or a horizontal positional difference between the motion vector and the further motion vector being less than the threshold.
[0275] Clause 56. The method of clause 52, wherein the candidate motion information is excluded from the list in response to a reference index indicated by the candidate motion information being the same as a further reference index indicated by existing motion information in the list.
[0276] Clause 57. The method of any of clauses 1-56, wherein determining the at least one prediction block for the current video block comprises: determining the at least one prediction block based on at least one entry in the list of non-local motion information.
[0277] Clause 58. The method of clause 57, wherein the at least one prediction block is determined by a weighted prediction combining the at least one entry.
[0278] Clause 59. The method of clause 58, wherein at least one index of the at least one entry is indicated in the bitstream.
[0279] Clause 60. The method of clause 58, wherein at least one index of the at least one entry is inferred to be at least one predefined value.
[0280] Clause 61. The method of any of clauses 57-60, wherein the at least one entry is selected by minimizing a cost between an entry and the current video block.
[0281] Clause 62. The method of clause 61, wherein the at least one entry comprises N entries with minimal costs, N being an integer equal to 1 or greater than 1.
[0282] Clause 63. The method of clause 61, wherein the cost comprises at least one of: a rate distortion cost between an entry and the current video block, or a distortion between an entry and the current video block.
[0283] Clause 64. The method of clause 63, wherein the cost comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
[0284] Clause 65. The method of any of clauses 57-64, wherein the at least one entry in the list of non-local motion information comprises N entries, N being a positive integer, and determining the at least one prediction block comprises: for an ith entry in the list, i being larger than or equal to 0 and less than or equal to N-1, checking the ith entry and determining an ith prediction block for the current video block based on a criterion.
[0285] Clause 66. The method of clause 65, wherein performing the conversion comprises: determining a residual block between the current video block and the ith prediction block; and applying at least one of the following to the residual block: a transform, a quantization, or an entropy coding.
[0286] Clause 67. The method of any of clauses 57-66, wherein the at least one entry is sorted before being used to determine the at least one prediction block.
[0287] Clause 68. The method of clause 67, wherein the sorting of the at least one entry is based on at least one of: at least one template distortion cost, or at least one distortion between at least one template of the at least one entry and a current template of the current video block.
[0288] Clause 69. The method of clause 68, wherein a size of the current video block is S1xS2, and the current template comprises a MxN region excluding a region of the current video block, wherein M is larger than S1 and N is larger than S2, M, N, S1 and S2 being positive integers.
[0289] Clause 70. The method of clause 68, wherein a distortion of the at least one distortion comprises a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE) between two templates.
[0290] Clause 71. The method of clause 68, wherein an entry in the list comprises a block in a reconstructed region and a template of the block.
[0291] Clause 72. The method of clause 68, wherein the at least one entry is sorted in an ascending or descending order based on the at least one template distortion cost.
[0292] Clause 73. The method of clause 72, wherein after sorting the at least one entry in a descending order based on the at least one template distortion cost, first K entries are applied for coding the current video block with NLMC coding mode, K being a positive integer.
[0293] Clause 74. The method of clause 73, wherein indexes of the first K entries are excluded from the bitstream.
[0294] Clause 75. The method of clause 73, wherein a value of K is indicated in the bitstream, or is a default value.
[0295] Clause 76. The method of clause 75, wherein the NLMC coding mode is applied in addition to at least one further coding mode, the at least one further coding mode comprising at least one of an intra prediction mode, an inter prediction mode, or an intra block copy prediction mode.
[0296] Clause 77. The method of clause 76, wherein an indication of usage of the NLMC coding mode is indicated or parsed as a separate prediction mode.
[0297] Clause 78. The method of clause 77, wherein an indication of an entry in the list is indicated or parsed by a further syntax element, the further syntax element comprises an entry index.
[0298] Clause 79. The method of clause 78, wherein the entry index is coded by one of a fixed-length coding, an Exponential Golomb coding, a truncated unary coding, or a truncated binary coding.
[0299] Clause 80. The method of clause 78, wherein a binarization of the entry index is based on the number of entries used in the NLMC coding mode.
[0300] Clause 81. The method of clause 80, wherein the binarization comprises one of: a fixed length with M bits, or a truncated binary with a parameter equal to N, or a truncated unary with a parameter equal to N, N being the number of entries, and M being determined based on N.
[0301] Clause 82. The method of clause 81, wherein M is equal to floor(log2(N)), wherein log2(N) denotes a function to get a logarithm of N to a base 2, and floor(x) is a function to get the largest integer not greater than x.
[0302] Clause 83. The method of clause 81, wherein N is the number of all entries in the list, or the number of available entries in the list.
[0303] Clause 84. The method of any of clauses 1-83, wherein the NLMC coding mode is used as a motion estimation approach for encoding the current video block into the bitstream.
[0304] Clause 85. The method of clause 84, wherein at least one entry in the list of non-local motion information is used as at least one additional initial search point before an integer motion search process.
[0305] Clause 86. The method of clause 85, wherein at least one integer part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
[0306] Clause 87. The method of clause 85, wherein at least one integer part and at least one fractional part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
[0307] Clause 88. The method of clause 84, wherein the at least one entry is used as at least one additional candidate for at least one further motion estimation approach.
[0308] Clause 89. The method of any of clauses 84-88, wherein an entry in the list is used to generate a prediction block for the current video block, and motion information in the entry is indicated by at least one element associated with an advanced motion vector prediction (AMVP).
[0309] Clause 90. The method of any of clauses 1-89, wherein the list of non-local motion information is updated with at least one piece of neighboring motion information, the at least one piece of neighboring motion information being not selected by at least one neighboring coding block.
[0310] Clause 91. The method of any of clauses 1-90, wherein a size of the list of non-local motion information is based on a size of the current video block.
[0311] Clause 92. The method of any of clauses 1-91, wherein a plurality of lists of non- local motion information are determined for a plurality of reference pictures for the NLMC coding mode.
[0312] Clause 93. The method of any of clauses 1-92, wherein the list of non-local motion information excludes motion information of an inter-coded block with an inter cost larger than a threshold.
[0313] Clause 94. The method of any of clauses 1-93, wherein the NLMC coding mode is applied to at least one of: an advanced motion vector prediction (AMVP) coding tool, an affine coding tool, an intra block copy (IBC) coding tool, or a further inter-prediction related coding tool.
[0314] Clause 95. The method of any of clauses 1-94, wherein whether to and/or how to apply the method associated with the NLMC coding mode is based on at least one of: video content, a message included in one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit, a position of the CU, PU, TU, block, or video coding unit, a block dimension of the current video block, a block dimension of a neighboring block of the current video block, a block shape of the current video block, a block shape of a neighboring block of the current video block, a quantization parameter of the current video block, an indication of a colour format, a coding tree structure, a slice type, a tile group type, a picture type, a colour component, a temporal layer identifier (ID), or profiles, levels, or tiers of a standard.
[0315] Clause 96. The method of clause 95, wherein the method is applied to the current video block in response to the current video block belonging to screen content, and/or wherein the method is applied to at least one of a luma component or a chroma component of the current video block.
[0316] Clause 97. The method of clause 95, wherein the colour format comprises one of: 4:2:0, 4:4:4, RGB or YUV, and/or wherein the coding tree structure comprises a dual tree structure or a single tree structure.
[0317] Clause 98. The method of any of clauses 1-97, wherein the method associated with the NLMC coding mode is applied to a motion estimation used in at least one of: an encoding process, a pre-analysis process, or a motion compensated temporal filtering (MCTF) process.
[0318] Clause 99. The method of any of clauses 1-98, wherein the current video block comprises at least one of: a colour component, a sub-picture, a slice, a tile, a coding tree unit (CTU), a CTU row, a group of CTUs, a coding unit (CU), a prediction unit (PU), a transform unit (TU), a coding tree block (CTB), a coding block (CB), a prediction block (PB), a transform block (TB), a block, a sub-block of a block, a sub-region within a block, or a region that contains more than one sample or pixel.
[0319] Clause 100. The method of any of clauses 1-99, wherein the conversion comprises encoding the current video block into the bitstream.
[0320] Clause 101. The method of any of clauses 1-99, wherein the conversion comprises decoding the current video block from the bitstream.
[0321] Clause 102. An apparatus for video processing comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to perform a method in accordance with any of clauses 1-101.
[0322] Clause 103. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method in accordance with any of clauses 1-101.
[0323] Clause 104. A non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method performed by an apparatus for video processing, wherein the method comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and generating the bitstream based on the at least one prediction block.
[0324] Clause 105. A method for storing a bitstream of a video, comprising: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; generating the bitstream based on the at least one prediction block; and storing the bitstream in a non-transitory computer-readable recording medium.
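To make the generating method of clauses 104 and 105 concrete, the following encoder-side sketch assumes the list of non-local motion information is already built, forms a single prediction block from one list entry, and selects that entry by a sum of absolute differences (SAD) cost; the reference-block fetch is a stub, and every name is illustrative rather than part of the disclosure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Mv { int hor; int ver; };
struct Entry { Mv mv; int refIdx; };
using Block = std::vector<int>;   // luma samples, row-major

// Placeholder: a real codec would perform motion-compensated interpolation
// from the reference picture here; this stub returns a zero block.
Block FetchReferenceBlock(const Entry& /*e*/, int /*x*/, int /*y*/,
                          int w, int h) {
    return Block(static_cast<std::size_t>(w) * h, 0);
}

long Sad(const Block& a, const Block& b) {
    long sad = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sad += std::abs(a[i] - b[i]);
    return sad;
}

// Choose the list entry whose reference block best matches the current
// block; the winning index would be signalled as the entry index of
// clause 78, and the fetched block serves as the prediction block.
int SelectNlmcEntry(const std::vector<Entry>& list, const Block& cur,
                    int x, int y, int w, int h, Block& predOut) {
    int bestIdx = -1;
    long bestCost = 0;
    for (int i = 0; i < static_cast<int>(list.size()); ++i) {
        Block pred = FetchReferenceBlock(list[i], x, y, w, h);
        long cost = Sad(pred, cur);
        if (bestIdx < 0 || cost < bestCost) {
            bestCost = cost;
            bestIdx = i;
            predOut = pred;
        }
    }
    return bestIdx;
}
```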
Example Device
[0325] Fig. 18 illustrates a block diagram of a computing device 1800 in which various embodiments of the present disclosure can be implemented. The computing device 1800 may be implemented as or included in the source device 110 (or the video encoder 114 or 200) or the destination device 120 (or the video decoder 124 or 300).
[0326] It would be appreciated that the computing device 1800 shown in Fig. 18 is merely for purpose of illustration, without suggesting any limitation to the functions and scopes of the embodiments of the present disclosure in any manner.
[0327] As shown in Fig. 18, the computing device 1800 is a general-purpose computing device. The computing device 1800 may at least comprise one or more processors or processing units 1810, a memory 1820, a storage unit 1830, one or more communication units 1840, one or more input devices 1850, and one or more output devices 1860.
[0328] In some embodiments, the computing device 1800 may be implemented as any user terminal or server terminal having the computing capability. The server terminal may be a server, a large-scale computing device or the like that is provided by a service provider. The user terminal may for example be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile phone, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/video camera, positioning device, television receiver, radio broadcast receiver, E-book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices, or any combination thereof. It would be contemplated that the computing device 1800 can support any type of interface to a user (such as “wearable” circuitry and the like).
[0329] The processing unit 1810 may be a physical or virtual processor and can implement various processes based on programs stored in the memory 1820. In a multi-processor system, multiple processing units execute computer executable instructions in parallel so as to improve the parallel processing capability of the computing device 1800. The processing unit 1810 may also be referred to as a central processing unit (CPU), a microprocessor, a controller or a microcontroller.
[0330] The computing device 1800 typically includes various computer storage media. Such media can be any media accessible by the computing device 1800, including, but not limited to, volatile and non-volatile media, or detachable and non-detachable media. The memory 1820 can be a volatile memory (for example, a register, cache, Random Access Memory (RAM)), a non-volatile memory (such as a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), or a flash memory), or any combination thereof. The storage unit 1830 may be any detachable or non-detachable medium and may include a machine-readable medium such as a memory, flash memory drive, magnetic disk, or any other medium, which can be used for storing information and/or data and can be accessed in the computing device 1800.
[0331] The computing device 1800 may further include additional detachable/non- detachable, volatile/non-volatile memory medium. Although not shown in Fig. 18, it is possible to provide a magnetic disk drive for reading from and/or writing into a detachable and non-volatile magnetic disk and an optical disk drive for reading from and/or writing into a detachable non-volatile optical disk. In such cases, each drive may be connected to a bus (not shown) via one or more data medium interfaces.
[0332] The communication unit 1840 communicates with a further computing device via the communication medium. In addition, the functions of the components in the computing device 1800 can be implemented by a single computing cluster or multiple computing machines that can communicate via communication connections. Therefore, the computing device 1800 can operate in a networked environment using a logical connection with one or more other servers, networked personal computers (PCs) or further general network nodes.
[0333] The input device 1850 may be one or more of a variety of input devices, such as a mouse, keyboard, tracking ball, voice-input device, and the like. The output device 1860 may be one or more of a variety of output devices, such as a display, loudspeaker, printer, and the like. By means of the communication unit 1840, the computing device 1800 can further communicate with one or more external devices (not shown) such as storage devices and display devices, with one or more devices enabling the user to interact with the computing device 1800, or any devices (such as a network card, a modem and the like) enabling the computing device 1800 to communicate with one or more other computing devices, if required. Such communication can be performed via input/output (I/O) interfaces (not shown).
[0334] In some embodiments, instead of being integrated in a single device, some or all components of the computing device 1800 may also be arranged in a cloud computing architecture. In the cloud computing architecture, the components may be provided remotely and work together to implement the functionalities described in the present disclosure. In some embodiments, cloud computing provides computing, software, data access and storage services, which do not require end users to be aware of the physical locations or configurations of the systems or hardware providing these services. In various embodiments, cloud computing provides the services via a wide area network (such as the Internet) using suitable protocols. For example, a cloud computing provider provides applications over the wide area network, which can be accessed through a web browser or any other computing component. The software or components of the cloud computing architecture and corresponding data may be stored on a server at a remote location. The computing resources in the cloud computing environment may be merged or distributed at locations in a remote data center. Cloud computing infrastructures may provide the services through a shared data center, though they behave as a single access point for the users. Therefore, the cloud computing architectures may be used to provide the components and functionalities described herein from a service provider at a remote location. Alternatively, they may be provided from a conventional server or installed directly or otherwise on a client device.
[0335] The computing device 1800 may be used to implement video encoding/decoding in embodiments of the present disclosure. The memory 1820 may include one or more video coding modules 1825 having one or more program instructions. These modules are accessible and executable by the processing unit 1810 to perform the functionalities of the various embodiments described herein.
[0336] In the example embodiments of performing video encoding, the input device 1850 may receive video data as an input 1870 to be encoded. The video data may be processed, for example, by the video coding module 1825, to generate an encoded bitstream. The encoded bitstream may be provided via the output device 1860 as an output 1880.
[0337] In the example embodiments of performing video decoding, the input device 1850 may receive an encoded bitstream as the input 1870. The encoded bitstream may be processed, for example, by the video coding module 1825, to generate decoded video data. The decoded video data may be provided via the output device 1860 as the output 1880.
[0338] While this disclosure has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application as defined by the appended claims. Such variations are intended to be covered by the scope of this present application. As such, the foregoing description of embodiments of the present application is not intended to be limiting.

Claims

WHAT IS CLAIMED IS:
1. A method for video processing, comprising: determining, for a conversion between a current video block of a video and a bitstream of the video, a list of non-local motion information for the current video block, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and performing the conversion based on the at least one prediction block.
2. The method of claim 1, wherein the list of non-local motion information is constructed based on coded information of the current video block, the coded information comprising at least one of: a block size of the current video block, a dimension of the current video block, or a coded mode of the current video block.
3. The method of claim 1 or 2, wherein the list of non-local motion information used for the NLMC coding mode comprises at least one entry.
4. The method of claim 3, wherein a maximum number of entries in the list of non-local motion information is predefined, or is indicated in the bitstream.
5. The method of claim 3 or 4, wherein an entry in the list of the non-local motion information comprises at least one of: at least one motion vector, a reference picture index, or a reference list index.
6. The method of claim 5, wherein the at least one motion vector is in an integer precision or a fractional precision.
7. The method of any of claims 3-6, wherein a reference prediction block is determined based on an entry in the list.
8. The method of claim 7, wherein the reference prediction block is in a previously coded frame, or a current frame indicated by motion information in the list of non-local motion information.
9. The method of any of claims 1-8, further comprising at least one of: generating the list of non-local motion information based on neighboring motion information of the current video block, or updating the list of non-local motion information based on neighboring motion information of the current video block.
10. The method of claim 9, wherein a first list of non-local motion information of a first video unit is the same as or different from a second list of non-local motion information of a second video unit.
11. The method of claim 10, wherein the first or second video unit comprises one of: a block, a coding unit (CU), a coding tree unit (CTU), a CTU row, a slice, a tile, or a group of CUs with a same size.
12. The method of any of claims 1-11, wherein a size of the list of non-local motion information is based on coded information in the bitstream, wherein the coded information comprises at least one of: a size of the current video block, or a prediction mode of the current video block.
13. The method of claim 12, wherein the prediction mode of the current video block comprises at least one of: an advanced motion vector prediction (AMVP) mode, or a merge mode.
14. The method of any of claims 1-13, wherein the list of non-local motion information comprises all motion information of inter-coded blocks in a NxM neighboring region, wherein N and M are positive integers.
15. The method of any of claims 1-13, wherein the list of non-local motion information comprises at least one piece of motion information of inter-coded blocks in a NxM neighboring region, wherein N and M are positive integers, wherein the at least one piece of motion information is added as at least one motion candidate in the list in an order until the list is full.
16. The method of claim 15, wherein a pruning process is applied to at least one motion candidate before adding the at least one motion candidate into the list.
17. The method of any of claims 1-13, wherein in response to first motion information associated with a block within a NxM neighboring region satisfying a condition, the first motion information is allowed to be added into the list, wherein N and M are positive integers.
18. The method of claim 17, wherein the condition comprises at least one of: the first motion information being associated with a neighboring block coded in the same mode as the current video block, the same mode being an inter mode or an intra block copy mode, or the first motion information being associated with a neighboring block in the same size as the current video block.
19. The method of claim 17, wherein in response to the first motion information being of a merge coded block or a skip coded block, the first motion information is excluded from the list.
20. The method of claim 17, wherein a KxL block obtains motion information from the list associated with KxL blocks, K and L being positive integers.
21. The method of any of claims 1-20, wherein only one piece of repeated pieces of motion information is included in the list.
22. The method of any of claims 1-21, wherein whether to include second motion information of an inter-coded block in the list is based on at least one of: a motion vector magnitude of the second motion information, a reference list of the second motion information, or a reference picture index of the second motion information.
23. The method of any of claims 1-22, wherein a single piece of motion information of a block for a given reference picture list is included in the list.
24. The method of any of claims 1-23, wherein a single piece of motion information of a block for a given reference picture index is included in the list.
25. The method of any of claims 1-24, wherein a first piece of motion information of a block is excluded from the list in response to a difference between the first piece of motion information and a second piece of motion information in the list being less than a threshold.
26. The method of claim 25, wherein the difference between the first and second pieces of motion information is less than the threshold in response to at least one of: a first reference picture list of the first piece of motion information being the same as a second reference picture list of the second piece of motion information, or a first reference picture index of the first piece of motion information being the same as a second reference picture index of the second piece of motion information.
27. The method of claim 25, wherein the difference between the first and second pieces of motion information is less than the threshold in response to a vector difference between a first motion vector of the first piece of motion information and a second motion vector of the second piece of motion information being less than the threshold.
28. The method of claim 27, wherein the vector difference of the first and second motion vectors is less than the threshold in response to at least one of: a vertical positional difference between the first and second motion vectors being less than the threshold, or a horizontal positional difference between the first and second motion vectors being less than the threshold.
29. The method of any of claims 1-28, wherein whether to include third motion information of an inter-coded block in the list is based on pixel values of a reference block indicated by the third motion information.
30. The method of claim 29, wherein the third motion information is excluded from the list in response to at least one of: the reference block indicated by the third motion information being the same as a further reference block indicated by motion information in the list, or a difference between the reference block indicated by the third motion information and a further reference block indicated by motion information in the list being less than a threshold.
31. The method of claim 30, wherein the difference comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
32. The method of any of claims 1-31, wherein a first list of non-local motion information for a first coding block and a second list of non-local motion information for a second coding block are separate, the first coding block being with first coded information and the second coding block being with second coded information.
33. The method of claim 32, wherein the first coding block is coded with a first coding mode and the second coding block is coded with a second coding mode.
34. The method of claim 33, wherein the first coding mode comprises one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode, and the second coding mode comprises another one of the AMVP mode, the affine mode or the IBC mode.
35. The method of claim 32, wherein a first block size of the first coding block is different from a second block size of the second coding block.
36. The method of claim 35, wherein a NxN block has access to motion information from the list of non-local motion information corresponding to a NxN size, N being a positive integer.
37. The method of any of claims 1-31, wherein a same list of non-local motion information is constructed for at least one of: an advanced motion vector prediction (AMVP) mode, an affine mode, or an intra block copy (IBC) mode.
38. The method of any of claims 1-37, wherein for each of a plurality of NxM regions, a respective list of non-local motion information for NLMC is stored, N and M being positive integers.
39. The method of claim 38, wherein determining the list of non-local motion information for the current video block comprises: determining the list of non-local motion information by combining a first list of non-local motion information of a current region which the current video block belongs to and at least one list of at least one neighboring region of the current region.
40. The method of claim 39, wherein the at least one neighboring region comprises at least one of: at least one spatial neighboring region, or at least one temporal neighboring region.
41. The method of claim 40, wherein the at least one spatial neighboring region comprises at least one of: a top neighboring region, or a left neighboring region.
42. The method of claim 40, wherein the at least one temporal neighboring region comprises a collocated region.
43. The method of any of claims 39-42, wherein the combining of the at least one list is based on at least one distance between the at least one neighboring region and the current region.
44. The method of any of claims 39-43, wherein the number of the at least one neighboring region is indicated in the bitstream.
45. The method of any of claims 39-43, wherein the number of the at least one neighboring region is inferred to be a predefined value.
46. The method of any of claims 1-45, wherein the list of non-local motion information comprises motion information of at least one inter-coded block in a region in at least one of: a current frame, or at least one previous frame.
47. The method of claim 46, wherein the region comprises a video unit, the video unit comprising one of: at least one block, at least one coding unit (CU), at least one coding tree unit (CTU), at least one CTU row, at least one slice, at least one tile, a plurality of CUs with a same size, or a plurality of blocks with a same size.
48. The method of claim 46, wherein the region comprises positions from (Cx - M, Cy - N) to (Cx + L, Cy + K) for at least one frame, wherein (Cx, Cy) denotes a position of the current video block, M, N, L and K are integers.
49. The method of claim 48, wherein:
M and N are equal to 4, or
N comprises a predefined value for all quantization parameters (QPs), or
N is indicated in the bitstream, or
N is based on at least one of: video content, a block dimension of the current video block, a block dimension of a neighboring block, a block shape of the current video block, a block shape of a neighboring block, a QP of the current video block, an indication of a color format, a dual tree or single tree structure, a slice type, a tile group type, a picture type, a color component, a temporal layer identifier, a position of a coding unit (CU), prediction unit (PU), transform unit (TU), a block or a video coding unit, or a message indicated in a video region, the video region being one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit.
50. The method of claim 46, wherein a position of a first region for a first frame is different from a position of a second region for a second frame.
51. The method of any of claims 46-50, wherein a first size of a first region for a first frame is the same as or different from a second size of a second region for a second frame.
52. The method of any of claims 46-51, wherein the number of frames to be collected for motion information in the list is set to a predefined value.
53. The method of any of claims 46-52, wherein a pruning process is applied to candidate motion information, the candidate motion information being to be updated to the list of non-local motion information.
54. The method of claim 53, wherein the candidate motion information is excluded from the list in response to a reference block indicated by the candidate motion information being the same as a further reference block indicated by existing motion information in the list, or wherein the candidate motion information is excluded from the list in response to a difference between a motion vector indicated by the candidate motion information and a further motion vector indicated by existing motion information in the list being less than a threshold.
55. The method of claim 54, wherein the difference between the motion vector and the further motion vector is less than the threshold in response to at least one of: a vertical positional difference between the motion vector and the further motion vector being less than the threshold, or a horizontal positional difference between the motion vector and the further motion vector being less than the threshold.
56. The method of claim 53, wherein the candidate motion information is excluded from the list in response to a reference index indicated by the candidate motion information being the same as a further reference index indicated by existing motion information in the list.
57. The method of any of claims 1-56, wherein determining the at least one prediction block for the current video block comprises: determining the at least one prediction block based on at least one entry in the list of non-local motion information.
58. The method of claim 57, wherein the at least one prediction block is determined by a weighted prediction combining the at least one entry.
59. The method of claim 58, wherein at least one index of the at least one entry is indicated in the bitstream.
60. The method of claim 58, wherein at least one index of the at least one entry is inferred to be at least one predefined value.
61. The method of any of claims 57-60, wherein the at least one entry is selected by minimizing a cost between an entry and the current video block.
62. The method of claim 61, wherein the at least one entry comprises N entries with minimal costs, N being an integer equal to 1 or greater than 1.
63. The method of claim 61, wherein the cost comprises at least one of: a rate distortion cost between an entry and the current video block, or a distortion between an entry and the current video block.
64. The method of claim 63, wherein the cost comprises at least one of: a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE).
65. The method of any of claims 57-64, wherein the at least one entry in the list of non-local motion information comprises N entries, N being a positive integer, and determining the at least one prediction block comprises: for an ith entry in the list, i being larger than or equal to 0 and less than or equal to N-1, checking the ith entry and determining an ith prediction block for the current video block based on a criterion.
66. The method of claim 65, wherein performing the conversion comprises: determining a residual block between the current video block and the ith prediction block; and applying at least one of the following to the residual block: a transform, a quantization, or an entropy coding.
67. The method of any of claims 57-66, wherein the at least one entry is sorted before being used to determine the at least one prediction block.
68. The method of claim 67, wherein the sorting of the at least one entry is based on at least one of: at least one template distortion cost, or at least one distortion between at least one template of the at least one entry and a current template of the current video block.
69. The method of claim 68, wherein a size of the current video block is S1xS2, and the current template comprises a MxN region excluding a region of the current video block, wherein M is larger than S1 and N is larger than S2, M, N, S1 and S2 being positive integers.
70. The method of claim 68, wherein a distortion of the at least one distortion comprises a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), or a mean squared error (MSE) between two templates.
71. The method of claim 68, wherein an entry in the list comprises a block in a reconstructed region and a template of the block.
72. The method of claim 68, wherein the at least one entry is sorted in an ascending or descending order based on the at least one template distortion cost.
73. The method of claim 72, wherein after sorting the at least one entry in a descending order based on the at least one template distortion cost, first K entries are applied for coding the current video block with the NLMC coding mode, K being a positive integer.
74. The method of claim 73, wherein indices of the first K entries are excluded from the bitstream.
75. The method of claim 73, wherein a value of K is indicated in the bitstream, or is a default value.
76. The method of claim 75, wherein the NLMC coding mode is applied in addition to at least one further coding mode, the at least one further coding mode comprising at least one of: an intra prediction mode, an inter prediction mode, or an intra block copy prediction mode.
77. The method of claim 76, wherein an indication of usage of the NLMC coding mode is indicated or parsed as a separate prediction mode.
78. The method of claim 77, wherein an indication of an entry in the list is indicated or parsed by a further syntax element, the further syntax element comprising an entry index.
79. The method of claim 78, wherein the entry index is coded by one of: a fixed-length coding, an Exponential Golomb coding, a truncated unary coding, or a truncated binary coding.
80. The method of claim 78, wherein a binarization of the entry index is based on the number of entries used in the NLMC coding mode.
81. The method of claim 80, wherein the binarization comprises one of: a fixed length with M bits, or a truncated binary with a parameter equal to N, or a truncated unary with a parameter equal to N, N being the number of entries, and M being determined based on N.
82. The method of claim 81, wherein M is equal to floor(log2(N)), wherein log2(N) denotes a function to get a logarithm of N to a base 2, and floor(x) is a function to get the largest integer not greater than x.
83. The method of claim 81, wherein N is the number of all entries in the list, or the number of available entries in the list.
84. The method of any of claims 1-83, wherein the NLMC coding mode is used as a motion estimation approach for encoding the current video block into the bitstream.
85. The method of claim 84, wherein at least one entry in the list of non-local motion information is used as at least one additional initial search point before an integer motion search process.
86. The method of claim 85, wherein at least one integer part of at least one motion vector in the at least one entry is used as the at least one additional initial search point.
87. The method of claim 85, wherein at least one integer part and at least one fractional part of at least one motion vector in the at least one entry are used as the at least one additional initial search point.
88. The method of claim 84, wherein the at least one entry is used as at least one additional candidate for at least one further motion estimation approach.
89. The method of any of claims 84-88, wherein an entry in the list is used to generate a prediction block for the current video block, and motion information in the entry is indicated by at least one element associated with an advanced motion vector prediction (AMVP).
90. The method of any of claims 1-89, wherein the list of non-local motion information is updated with at least one piece of neighboring motion information, the at least one piece of neighboring motion information not being selected by at least one neighboring coding block.
91. The method of any of claims 1-90, wherein a size of the list of non-local motion information is based on a size of the current video block.
92. The method of any of claims 1-91, wherein a plurality of lists of non-local motion information are determined for a plurality of reference pictures for the NLMC coding mode.
93. The method of any of claims 1-92, wherein the list of non-local motion information excludes motion information of an inter-coded block with an inter cost larger than a threshold.
94. The method of any of claims 1-93, wherein the NLMC coding mode is applied to at least one of: an advanced motion vector prediction (AMVP) coding tool, an affine coding tool, an intra block copy (IBC) coding tool, or a further inter-prediction related coding tool.
95. The method of any of claims 1-94, wherein whether to and/or how to apply the method associated with the NLMC coding mode is based on at least one of: video content, a message included in one of: a dependency parameter set (DPS), a sequence parameter set (SPS), a video parameter set (VPS), a picture parameter set (PPS), an adaptation parameter set (APS), a picture header, a slice header, a tile group header, a largest coding unit (LCU), a coding unit (CU), a LCU row, a group of LCUs, a transform unit (TU), a prediction unit (PU) block, or a video coding unit, a position of the CU, PU, TU, block, or video coding unit, a block dimension of the current video block, a block dimension of a neighboring block of the current video block, a block shape of the current video block, a block shape of a neighboring block of the current video block, a quantization parameter of the current video block, an indication of a colour format, a coding tree structure, a slice type, a tile group type, a picture type, a colour component, a temporal layer identifier (ID), or profiles, levels, or tiers of a standard.
96. The method of claim 95, wherein the method is applied to the current video block in response to the current video block belonging to screen content, and/or wherein the method is applied to at least one of a luma component or a chroma component of the current video block.
97. The method of claim 95, wherein the colour format comprises one of: 4:2:0, 4:4:4, RGB or YUV, and/or wherein the coding tree structure comprises a dual tree structure or a single tree structure.
98. The method of any of claims 1-97, wherein the method associated with the NLMC coding mode is applied to a motion estimation used in at least one of: an encoding process, a pre-analysis process, or a motion compensated temporal filtering (MCTF) process.
99. The method of any of claims 1-98, wherein the current video block comprises at least one of: a colour component, a sub-picture, a slice, a tile, a coding tree unit (CTU), a CTU row, a group of CTUs, a coding unit (CU), a prediction unit (PU), a transform unit (TU), a coding tree block (CTB), a coding block (CB), a prediction block (PB), a transform block (TB), a block, a sub-block of a block, a sub-region within a block, or a region that contains more than one sample or pixel.
100. The method of any of claims 1-99, wherein the conversion comprises encoding the current video block into the bitstream.
101. The method of any of claims 1-99, wherein the conversion comprises decoding the current video block from the bitstream.
102. An apparatus for video processing comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to perform a method in accordance with any of claims 1-101.
103. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method in accordance with any of claims 1-101.
104. A non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method performed by an apparatus for video processing, wherein the method comprises: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; and generating the bitstream based on the at least one prediction block.
105. A method for storing a bitstream of a video, comprising: determining a list of non-local motion information for a current video block of the video, the current video block being coded with a non-local motion candidate (NLMC) coding mode, the list of non-local motion information being associated with at least one non-adjacent block of the current video block; determining at least one prediction block for the current video block based on the list of non-local motion information; generating the bitstream based on the at least one prediction block; and storing the bitstream in a non-transitory computer-readable recording medium.