CN121336406A - Methods, apparatus and recording media for encoding/decoding images - Google Patents
- Publication number
- CN121336406A (application CN202480038254.4A)
- Authority
- CN
- China
- Prior art keywords
- block
- target
- information
- motion information
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
- H04N19/537—Motion estimation other than block-based
- H04N19/54—Motion estimation other than block-based using feature points or meshes
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
An image decoding method for predicting a target block within a target picture in sub-block units is disclosed. The method includes determining a motion information offset by decoding offset information from a bitstream, deriving a first motion shift based on one or more pre-reconstructed blocks decoded prior to the target block, determining a second motion shift based on the first motion shift and the motion information offset, determining a reference block at the position indicated by the second motion shift within a co-located (col) picture selected from a reference picture list, and deriving motion information for each sub-block within the target block using motion information of a corresponding sub-block in the reference block.
Description
Technical Field
The present disclosure relates to a method, apparatus, and recording medium for encoding/decoding an image. In one aspect, the present disclosure provides a method for predicting a target block in sub-block units.
Background
With the continued development of the information and communication industry, services for providing video through broadcasting, the internet, etc. have been expanding globally.
Users demand video with higher resolution and higher image quality. To meet these demands, image encoding/decoding techniques suitable for such video are required. Image encoding techniques generate compressed video by representing the video with a smaller amount of data. Image decoding techniques generate reconstructed images from the compressed video.
Regarding image encoding/decoding techniques, there are various techniques such as partition techniques, prediction techniques, transformation techniques, quantization techniques, filtering techniques, and entropy encoding/decoding techniques. By introducing, modifying, enhancing, and combining these various techniques, video and images can be more efficiently compressed, transmitted, and stored.
Disclosure of Invention
Technical problem
An aspect of the present disclosure is to provide a coding tool that predicts a target block in sub-block units to improve coding efficiency.
Technical Solution
An aspect of the present invention provides an image decoding method for predicting a target block within a target picture in sub-block units. The method includes determining a motion information offset by decoding offset information from a bitstream, deriving a first motion shift based on one or more pre-reconstructed blocks decoded prior to the target block, determining a second motion shift based on the first motion shift and the motion information offset, determining a reference block at the position indicated by the second motion shift within a co-located (col) picture selected from a reference picture list, and deriving motion information for each sub-block within the target block using motion information of a corresponding sub-block in the reference block.
Another aspect of the present invention provides an image encoding method for predicting a target block within a target picture in sub-block units. The method includes determining a motion information offset and encoding offset information indicating the motion information offset, deriving a first motion shift based on one or more pre-reconstructed blocks encoded and reconstructed prior to the target block, determining a second motion shift based on the first motion shift and the motion information offset, determining a reference block at the position indicated by the second motion shift within a col picture selected from a reference picture list, and deriving motion information for each sub-block within the target block using motion information of a corresponding sub-block in the reference block.
Another aspect of the present invention provides a method for transmitting a bitstream generated by the above image encoding method.
Another aspect of the present invention provides a computer-readable recording medium storing a bitstream generated by the above image encoding method.
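The claimed decoding steps can be sketched in simplified form as follows. Everything here is illustrative rather than normative: the averaging rule used for the first motion shift, the dictionary-based motion field of the col picture, and all function and parameter names are assumptions made for the sketch, and the 4x4 sub-block size merely follows common practice.

```python
from typing import Dict, List, Tuple

MotionVector = Tuple[int, int]  # (x, y) displacement


def predict_target_block_subblocks(
    offset_info: MotionVector,            # motion information offset decoded from the bitstream
    neighbor_motion: List[MotionVector],  # motion of pre-reconstructed blocks decoded before the target block
    col_picture_motion: Dict[Tuple[int, int], MotionVector],  # motion field of the co-located (col) picture
    block_pos: Tuple[int, int],
    block_size: Tuple[int, int],
    sub_size: int = 4,
) -> Dict[Tuple[int, int], MotionVector]:
    """Derive per-sub-block motion information for the target block."""
    # Step 1: first motion shift, derived here as the average motion of the
    # neighboring pre-reconstructed blocks (the actual derivation rule may differ).
    n = max(len(neighbor_motion), 1)
    first_shift = (sum(mv[0] for mv in neighbor_motion) // n,
                   sum(mv[1] for mv in neighbor_motion) // n)
    # Step 2: second motion shift = first shift adjusted by the decoded offset.
    second_shift = (first_shift[0] + offset_info[0],
                    first_shift[1] + offset_info[1])
    # Step 3: locate the reference block in the col picture at the shifted
    # position, then step 4: copy motion info from each corresponding sub-block.
    bx, by = block_pos
    w, h = block_size
    sub_motion = {}
    for sy in range(0, h, sub_size):
        for sx in range(0, w, sub_size):
            ref_pos = (bx + sx + second_shift[0], by + sy + second_shift[1])
            sub_motion[(sx, sy)] = col_picture_motion.get(ref_pos, second_shift)
    return sub_motion
```

In this sketch, a sub-block whose shifted position has no stored motion simply inherits the second motion shift; an actual codec would instead apply its normative fallback rule.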
Drawings
Fig. 1 illustrates a system for video encoding and decoding according to an embodiment;
Fig. 2 illustrates an image partition structure according to an embodiment;
Fig. 3 illustrates a structure of intra prediction according to an embodiment;
Fig. 4 illustrates a structure of inter prediction for explaining an inter prediction process according to an embodiment;
Fig. 5 illustrates an order of adding spatial candidates to a candidate list according to an embodiment;
Fig. 6 shows a plurality of loop filters according to an example;
Fig. 7 shows a structure of entropy encoding and entropy decoding according to an example;
Fig. 8 illustrates template matching according to an embodiment;
Fig. 9 illustrates a target template that may be used for a target block;
Fig. 10 illustrates a target template for each sub-block in a sub-block-unit-based template matching mode according to an exemplary embodiment;
Fig. 11 illustrates a target template for each sub-block in a sub-block-unit-based template matching mode according to another exemplary embodiment;
Fig. 12 is an exemplary diagram illustrating a method for configuring a target template for each sub-block of a target block to which a geometric partition mode is applied;
Fig. 13 is another exemplary diagram illustrating a method for configuring a target template for each sub-block of a target block to which a geometric partition mode is applied;
Fig. 14 is a further exemplary diagram illustrating a method for configuring a target template for each sub-block of a target block to which a geometric partition mode is applied;
Figs. 15 and 16 illustrate various embodiments of a sub-sampling method for template matching;
Figs. 17 to 22 each illustrate a search method for template matching according to an embodiment;
Fig. 23 shows a method for configuring a target template in an affine mode;
Figs. 24a and 24b illustrate a method for configuring a reference template in an affine mode according to an embodiment;
Fig. 25 illustrates bilateral matching according to an embodiment;
Fig. 26 illustrates an affine transformation model according to an embodiment;
Fig. 27 is an exemplary diagram illustrating a CPMV block including a control point and a target template of the CPMV block according to an embodiment;
Fig. 28 is another exemplary diagram illustrating a CPMV block including a control point and a target template of the CPMV block according to an embodiment;
Fig. 29 is an exemplary diagram illustrating the position of a reconstructed block considered for deriving prediction information of a target block according to an embodiment;
Fig. 30 is another exemplary diagram illustrating the position of a reconstructed block considered for deriving prediction information of a target block according to an embodiment;
Fig. 31 is an exemplary diagram illustrating the locations of temporally adjacent blocks used to derive affine transformation candidates according to some embodiments.
Detailed Description
Various modifications may be applied to the present disclosure. Further, the present disclosure may have various embodiments. Specific embodiments will be described with reference to the drawings and detailed description.
The specific embodiments are not intended to limit the disclosure to the particular mode of practice, and it is to be understood that all changes, equivalents, and alternatives that do not depart from the spirit or technical scope of the disclosure are included in the disclosure as embodiments.
These embodiments are described so that those skilled in the art to which the present disclosure pertains may readily practice them. It should be noted that the various embodiments are different from each other, but are not necessarily mutually exclusive. For example, the shapes, structures, and characteristics associated with one embodiment may be applied or implemented in other embodiments without departing from the spirit and scope of the present disclosure. Further, it is to be understood that the location or arrangement of individual components within an embodiment may be changed without departing from the spirit and scope of the disclosure. Accordingly, the following detailed description is not intended to limit the scope of the disclosure; the scope of the exemplary embodiments is limited only by the appended claims and their equivalents.
A detailed description of embodiments to be described later may be made with reference to the drawings of the embodiments. The description made in the accompanying drawings or the description shown in the drawings may be considered part of the detailed description of the disclosure. In the drawings, like reference numerals may be used to designate the same or similar functions in all respects. The dependency relationships between components may not be limited to the dependency relationships shown in the figures.
In embodiments, a singular expression may include a plural expression, and may be limited to a plural expression, unless the context clearly indicates otherwise. In other words, in an embodiment, expressions such as "at least one" and "one or more" may be replaced with the term "plurality". Terms such as "/", "and/or", and "or" applied to multiple items may refer to 1) one of the items, 2) some of the items, 3) a combination of some of the items, or 4) a combination of all of the items. Furthermore, a plural expression may be replaced with a singular expression. Here, "plural" may represent an integer of 1, 2, 3, 4, 5, or more.
In an embodiment, terms related to numerals such as "first" and "second" may be used to explain various components. These terms are only used to distinguish one element from another element and are not intended to limit the element. For example, a first component may be referred to as a second component without departing from the scope of the present disclosure. Similarly, the second component may also be referred to as the first component.
The fact that a first component sends (or provides) information to a second component may mean that the first component sends information directly to the second component, or may also mean that the first component sends information to the second component through a third component. Here, the information received (or acquired) by the second component may be information transmitted by the first component or information generated by applying a specific process to the information transmitted by the first component.
The components in the embodiments may be shown separately to indicate different feature functions, and the illustration does not mean that each component corresponds to separate hardware or one software element. That is, components in the embodiments may be divided and enumerated for convenience of description. Two or more components described in the embodiments may be regarded as one component. Furthermore, one component described in the embodiment may be divided into a plurality of components, which divide and individually perform the functions of the components. Embodiments in which these components are integrated and embodiments in which these components are separate may be included within the scope of the present disclosure without departing from the essence of the present disclosure.
The terminology used in the embodiments is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. In an embodiment, terms such as "comprising" or "having" are intended to indicate the existence of the features, numbers, steps, operations, components, portions, or combinations thereof described in the embodiment. These terms do not exclude the possibility that other features, numbers, steps, operations, components, parts or combinations thereof not explicitly described in the embodiments will be present or added. That is, the description of the components "including" a specific component in the embodiments does not exclude additional components other than the specific component, and means that additional components may be included within the scope of the embodiments of the present disclosure or within the scope of the technical spirit of the present disclosure.
Some components in the embodiments may be selective components other than the basic components of the present disclosure for performing basic functions. Such selective components may be used to improve performance. Embodiments may be implemented as a structure including only basic components required to implement gist of the embodiments, and not optional components. Such a structure is included in the scope of the embodiments.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present disclosure pertains can easily practice the present disclosure. In the description of the present disclosure, a detailed description of known functions and configurations that are considered to obscure the gist of the present disclosure will be omitted. In the following description of the present disclosure, the same reference numerals are used to denote the same or similar components throughout the drawings, and repeated descriptions of the same components will be omitted.
Substitution between terms in the embodiments
In the following description, terms listed together may have the same meaning and may be used interchangeably with each other in the embodiments.
- "One or more" and "at least one"
- "Two or more", "plural" and "multiple" (in embodiments, "one or more" or "at least one" may be further limited to "two or more", "plural" or "multiple")
- "Information" and "signal"
- "Value", "predefined value", "specific value", "threshold", "baseline value" and "reference value"
- "Statistical value" and "statistic"
- "Indicator", "index", "flag" and "information"
- "Encoder" and "encoding device"
- "Decoder" and "decoding device"
- "Entropy coding", "coding" and "encoding"
- "Entropy decoding" and "decoding"
- "Codec" and "encoding and/or decoding"
- "Video", "moving picture", "image", "picture", "frame" and "screen"
- "Reference picture" and "reference image"
- "Reference Picture List (RPL)" and "reference picture list"
- "Original", "input" and "source"
- "Block", "unit" and "signal"
- "Square" and "square shape"
- "Pixel", "sample" and "pel"
- "Region", "zone", "portion" and "fragment"
- "Partition", "partitioning" and "division"
- "Quad" and "quaternary"
- "Luma component", "luma" and "Y"
- "Chroma component", "chroma", "Cb and Cr", "Cb or Cr", "Cb", "Cr", "U and V", "U or V", "U" and "V"
- "Target" and "current" (e.g., target block and current block, or target image and current image)
- "Neighbor", "adjacent" and "neighboring" (e.g., neighbor block, adjacent block and neighboring block)
- "Co-located" and "col"
- "Reconstruction" and "decoding"
- "Reconstructed" and "decoded"
- "Difference", "error" and "residual"
- "Largest Coding Unit (LCU)" and "Coding Tree Unit (CTU)"
- "Inter" and "inter-picture"
- "Inter prediction", "inter-picture prediction" and "motion compensation"
- "Inter mode", "inter prediction mode", "inter-picture mode" and "inter-picture prediction mode"
- "Motion vector", "predictive motion vector" and "Advanced Motion Vector Prediction (AMVP)"
- "List" and "candidate list"
- "Spatial candidate" and "spatial merge candidate"
- "Temporal candidate" and "temporal merge candidate"
- "Predicted motion vector candidate" and "motion vector predictor"
- "Prediction method" and "prediction mode"
- "Intra" and "intra-picture"
- "Intra prediction" and "intra-picture prediction"
- "Intra mode" and "intra prediction mode"
- "Dequantization" and "scaling"
- "Quantization matrix" and "scaling list"
- "Quantization matrix coefficient" and "matrix coefficient"
- "Transform coefficient level", "quantized coefficient", "quantized transform coefficient" and "quantized transform coefficient level"
- "Inverse-quantized coefficient" and "inverse-quantized transform coefficient"
- "Scanning type" and "scanning direction"
- "Directional mode", "angular mode" and "intra prediction mode"
- "Mode number of intra prediction mode", "mode index of intra prediction mode", "mode value of intra prediction mode", "mode angle of intra prediction mode", "direction of intra prediction mode", "mode number of intra prediction direction", "mode index of intra prediction direction", "mode value of intra prediction direction" and "mode angle of intra prediction direction"
- "Merge mode" and "motion merge mode"
- "Geometric Partition Mode (GPM)" and "triangle partition mode"
Terms having the same meaning based on common knowledge in the art, other than the terms exemplified above, may be used interchangeably with each other in the embodiments.
Information and ranges of information values described in the embodiments
In an embodiment, the information may include constants, flags, indexes, variables, coding parameters, elements, syntax elements, motion information, attributes, entities, objects, data, and the like. In other words, the term "information" may be used interchangeably with "data," flag, "" index, "" variable, "" element, "" syntax element, "" motion information, "" attribute, "or" entity.
The information may have one of a plurality of values. The term "nth value" may refer to an nth value of a plurality of values.
For example, the first value may represent a "0" or a logical false. The second value may represent a "1" or a (logical) true. Alternatively, the first value may represent a "1" or a (logical) true. The second value may represent a "0" or (logic) false.
The flag may be information having any one of values "0" and "1". In an embodiment, the values of the flags, i.e. "0" and "1", may be replaced by "1" and "0", respectively. For example, information indicating whether to perform a specific process or information indicating whether to apply a specific process may be regarded as a flag.
When a variable such as i or j is used to indicate a row, column, or index, the variable may be an integer equal to or greater than 0 and less than or equal to n-1. Alternatively, the variable may be an integer equal to or greater than 1 and less than or equal to n. Here, n may be the number of rows, columns, or the number of entities indicated by the index.
Concepts related to codec
Hereinafter, concepts related to codec will be described. The following description may be applied to the embodiments.
Predefined value - A predefined value may refer to a value that is commonly used by an encoding device and a decoding device. For example, a predefined value may be limited to, and interpreted as, a fixed value. Alternatively, a predefined value may be a value shared between the encoding device and the decoding device through signaling. Alternatively, a predefined value may be a value derived by the encoding device and the decoding device through the same process, such that the two devices have a common value. Alternatively, a predefined value may be a common value that the encoding device and the decoding device both have. The description of predefined values may also be applied to predefined information; in the above description, "value" may be replaced with "information".
The values derived by the same procedure in the encoding device and the decoding device may comprise values derived by the same procedure in the encoding device and the decoding device for the same values and/or the same information.
The values derived by the same process in the encoding device and the decoding device may comprise values derived in the encoding device and the decoding device using the same conditional statement for the same values and/or the same information.
Availability - The fact that a particular mode is available for a particular target may mean that a mode selected from among the particular modes is used for that target. Other modes belonging to the category of the particular mode may be unavailable modes. An unavailable mode may not be used for the particular target. The description of the particular mode may also be applied to other particular information. In the above description, "mode" may be replaced with "information".
Adjacent - The term "<direction> <second entity>" of a "first entity" may refer to the second entity adjacent to the first entity in that direction, i.e., adjacent to a corner or a surface of the first entity. For example, the "upper-left block" of a "target block" is the block adjacent to the upper-left corner of the target block. Here, the first entity may be a target unit, a target block, or a target sample. The direction may be one of upper left, top, upper right, left, right, lower left, bottom, and lower right. The second entity may be a unit, a block, or a sample. For the diagonal directions (upper left, upper right, lower left, and lower right), a corner of the first entity and a corner of the second entity are diagonally adjacent to each other. For the top, left, right, and bottom directions, one surface of the first entity and one surface of the second entity are in contact with each other.
For example, the block adjacent to the upper left of the target block may be the block adjacent to the top of the block adjacent to the left of the target block. The block adjacent to the upper right of the target block may be a block adjacent to the right of the block adjacent to the top of the target block. The block adjacent to the lower left of the target block may be a block adjacent to the bottom of the block adjacent to the left of the target block.
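The adjacency conventions above can be expressed as sample coordinates. The sketch below assumes a top-left origin with x increasing rightward and y increasing downward; the function name and dictionary keys are illustrative, not terms used elsewhere in this disclosure.

```python
def neighbor_positions(x: int, y: int, w: int, h: int) -> dict:
    """Top-left sample coordinate of each neighbor of a (w x h) block at (x, y).

    Diagonal neighbors (upper left/right, lower left/right) touch the block
    only at a corner; top/left/right/bottom neighbors share a full edge.
    """
    return {
        "upper_left":  (x - 1, y - 1),   # corner-adjacent
        "top":         (x,     y - 1),   # edge-adjacent
        "upper_right": (x + w, y - 1),   # corner-adjacent
        "left":        (x - 1, y),       # edge-adjacent
        "right":       (x + w, y),       # edge-adjacent
        "lower_left":  (x - 1, y + h),   # corner-adjacent
        "bottom":      (x,     y + h),   # edge-adjacent
        "lower_right": (x + w, y + h),   # corner-adjacent
    }
```

For a block at (16, 16) of size 8x8, for instance, the upper-left neighbor position is (15, 15) and the upper-right neighbor position is (24, 15).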
Codec - Codec may refer to the encoding and/or decoding of an image.
Signal - A signal may refer to information about an image, a unit, or a block. A specific signal may refer to a specific image, a specific unit, or a specific block.
Image - An image may refer to one of the pictures that make up a video, and may also refer to the video itself. For example, "image encoding and/or decoding" may refer to "video encoding and/or decoding", and may also refer to "encoding and/or decoding one of the images constituting a video".
An image may refer to the whole of a picture or may refer to a portion of a picture, such as a block.
Target image: the target image may be an encoding target image that is the target of encoding and/or a decoding target image that is the target of decoding. Further, the target image may be an input image processed by the encoding apparatus or a reconstructed image processed by the decoding apparatus. The target image may be an image including the target block.
Subpicture: a picture may be partitioned into one or more subpictures.
A subpicture may be a square or rectangular area in the picture. Each subpicture may include one or more CTUs.
A subpicture may include one or more slices and/or one or more tiles. For example, a subpicture may include one or more slice rows and one or more slice columns. Optionally, each subpicture may include one or more tile rows and one or more tile columns.
A subpicture may include one or more slices that collectively cover a rectangular area in the picture. Thus, the boundary of each subpicture may always be a slice boundary. Furthermore, each vertical subpicture boundary may always be a vertical tile boundary.
Slice: a slice may include one or more tiles in the picture. A slice may include one or more tile rows and one or more tile columns.
Tile: a tile may be a square or rectangular area in a picture. A tile may include one or more CTUs. A picture may be partitioned into one or more tile rows and one or more tile columns.
CTU: an image may be partitioned into a plurality of Coding Tree Units (CTUs).
A CTU may include one luma (Y) Coding Tree Block (CTB) and at least one of a Cb CTB and a Cr CTB related to the Y CTB, and may include information about each CTB. The information may include syntax elements.
Each CTU may be partitioned using one or more partition types to form sub-units such as Coding Units (CUs), Prediction Units (PUs), and Transform Units (TUs). The one or more partition types may include a Quadtree (QT) partition, a Binary Tree (BT) partition, and a Ternary Tree (TT) partition. Further, each CTU may be partitioned using a Multi-Type Tree (MTT) partition that combines multiple partition types.
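As an illustrative sketch of how the QT/BT/TT split types subdivide a block (hypothetical names; the 1/4-1/2-1/4 geometry of the ternary split follows common multi-type-tree practice and is an assumption, not a quote from the disclosure):

```python
from enum import Enum

class Split(Enum):
    QT = "quad"       # quadtree: four equal sub-blocks
    BT_H = "bt_horz"  # binary tree: two halves stacked vertically
    BT_V = "bt_vert"  # binary tree: two halves side by side
    TT_H = "tt_horz"  # ternary tree: 1/4, 1/2, 1/4 horizontal strips
    TT_V = "tt_vert"  # ternary tree: 1/4, 1/2, 1/4 vertical strips

def split_block(x, y, w, h, split):
    """Return the (x, y, w, h) tuples of the sub-blocks produced by
    applying one split to a block at (x, y) of size w x h."""
    if split is Split.QT:
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split is Split.BT_H:
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split is Split.BT_V:
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if split is Split.TT_H:
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, h // 2),
                (x, y + q + h // 2, w, q)]
    if split is Split.TT_V:
        q = w // 4
        return [(x, y, q, h), (x + q, y, w // 2, h),
                (x + q + w // 2, y, q, h)]
```

An MTT partition then amounts to applying such splits recursively, choosing a split type per node.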
CTB may refer to one of a Y CTB, a Cb CTB, and a Cr CTB.
Unit: a unit may be a unit determined for a specific process in the codec. A unit may be information about a specific area in the image. The image may be recursively partitioned into units in order to perform specific codec processing. A unit may refer to an area to which a specific process is applied, together with information about the area.
The unit type may refer to the specific process applied to the unit. The specific process may be applied to the unit according to the unit type. The "specific" unit may be a unit of processing named "specific" in the codec. For example, the unit may be at least one of an original unit, a CTU, a coding unit, a prediction unit, a residual unit, a reconstructed residual unit, a transform unit, and a reconstruction unit.
A unit may include samples having a two-dimensional (2D) form or arrangement. In this regard, the term "unit" may refer to a "block". For example, the block may be at least one of an original block, a CTB, a Coding Block (CB), a Prediction Block (PB), a residual block, a reconstructed residual block, a Transform Block (TB), and a reconstructed block. For example, a partition of a unit may refer to a partition of the blocks corresponding to the unit.
The unit may comprise a syntax element. In other words, the blocks and syntax elements of the blocks may be collectively referred to as "units".
Block: a block may be an M×N array of samples. Here, M and N may be positive integers, and a block may generally refer to an array of samples in 2D form. The current block may refer to an encoding target block, which is the target to be encoded during encoding, or a decoding target block, which is the target to be decoded during decoding. Furthermore, the current block may be at least one of a coding block, a prediction block, a residual block, a transform block, and a reconstructed block. Blocks may have various sizes and shapes. For example, the block shape may include one or more of a quadrangle, a rectangle, a square, a non-square rectangle whose horizontal length differs from its vertical length, a trapezoid, a triangle, a right triangle, and a pentagon. Furthermore, the block shape may include other geometric figures that can be represented in two dimensions. For example, the block shape may be the rectangle or pentagon defined by excluding a right-triangle area from a rectangular area, where the right-angle vertex of the right triangle is one of the vertices of the rectangle. Further, the block shape may be a combination of two or more of the above shapes, or the remaining shape obtained by excluding one of the above shapes from another.
In an embodiment, the rectangle may be limited to a non-square rectangle. In an embodiment, when the shape of the specific object is described as a rectangular shape, the description may additionally mean that the horizontal length and the vertical length of the specific object are different from each other.
In an embodiment, the blocks may be limited to at least one of vertically oriented blocks and horizontally oriented blocks. The vertically oriented blocks may be blocks having a vertical length greater than a horizontal length. The horizontally oriented blocks may be blocks having a horizontal length greater than a vertical length.
The unit may include a luminance component block (i.e., a Y block) and two chrominance component blocks (i.e., a Cb block and a Cr block), and may include information about each block. The information may include syntax elements.
The information about the units may include the type of unit, the size of the unit, the depth of the unit, the coding order of the unit, the decoding order of the unit, etc.
Target unit: the target unit may be an encoding target unit that is the target to be encoded in encoding and/or a decoding target unit that is the target to be decoded in decoding. The target unit may be a specific region in a target picture to which one or more specific processes of the codec are to be applied. A particular type of unit may be generated by applying a specific process to the target unit. Alternatively, the target unit may refer to a unit having a specific type for a specific codec process.
Depth-a block may be hierarchically partitioned into multiple sub-blocks while having a depth that depends on the tree structure. The multiple sub-blocks generated from the partitioning operation of a block may be referred to as "partitions".
When the blocks constituting the image are represented in a tree structure, the depth of the blocks may represent the level of the node corresponding to each block. Alternatively, the depth of the block may indicate the number of partitions applied until the block is determined. As the block is further partitioned, the depth of the block may be increased by 1.
In the tree structure, the root node may be considered to have the lowest level and the leaf node the highest level. The root node may be the topmost node of the tree structure and may correspond to an initial block that is not partitioned. The level of the root node may be 0 or 1. When the level of the root node is 0, a node having level 1 may refer to a block determined as the initial block is partitioned once. A node having a level n may refer to a block that is determined when an initial block is partitioned n times. The leaf node may be the bottommost node of the tree structure. Leaf nodes may be nodes that cannot be further partitioned. The depth of the leaf node may be a predefined maximum depth. For example, the maximum depth may be a positive integer such as 3. The root node may be referred to as a CTU. A leaf node may refer to at least one of a CU, PU, and TU.
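The relationship between partitioning and node level can be sketched as follows (hypothetical helper; a quadtree with four children per split is used only as an example):

```python
def leaf_depths(depth, max_depth, should_split):
    """Recursively partition a block (here modeled as a quadtree with four
    children per split) and collect the depths of the resulting leaf
    nodes. `should_split` decides, per node, whether to partition further;
    no node may exceed the predefined maximum depth."""
    if depth == max_depth or not should_split(depth):
        return [depth]  # leaf node: cannot or will not be partitioned further
    leaves = []
    for _ in range(4):  # four sub-blocks per quadtree split
        leaves += leaf_depths(depth + 1, max_depth, should_split)
    return leaves
```

With a maximum depth of 3 and every node split, all 64 leaves end up at depth 3, matching the rule that depth increases by 1 per partition.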
Depth may have a type that depends on the partition type. QT depth may represent the depth in a quadtree partition. BT depth may represent the depth in a binary tree partition. TT depth may represent the depth in a ternary tree partition.
Sample: a sample may be the basic unit constituting a block. A sample may consist of one or more bits. The bit depth (Bd) may refer to the number of bits constituting a sample. Depending on the bit depth, a sample may be represented by values ranging from 0 to 2^Bd − 1.
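The valid sample range follows directly from the bit depth; a one-line sketch (hypothetical function name):

```python
def sample_range(bit_depth):
    """Valid sample values for a given bit depth span 0 .. 2**bit_depth - 1."""
    return 0, (1 << bit_depth) - 1
```

For example, 8-bit samples span 0..255 and 10-bit samples span 0..1023.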
PU: a PU may represent a basic unit for prediction-related processing. For example, prediction-related processing may include inter prediction, intra prediction, intra-block copy (IBC) prediction, and motion compensation.
One PU may be partitioned into a plurality of sub-PUs, each sub-PU having a size smaller than the size of the PU. Each of the plurality of sub-PUs may also be a base unit for prediction-related processing. In other words, the prediction unit partition generated from the partition operation of the prediction unit may also be a prediction unit.
TU may be a basic unit for processing related to a residual block. The processing related to the residual block may include at least one of transformation, inverse transformation, quantization, inverse quantization, transform coefficient coding, transform coefficient decoding, entropy coding, or entropy decoding, or a combination thereof. One TU may be partitioned into a plurality of sub-transform units, each sub-transform unit having a size smaller than that of the TU. Each of the plurality of sub-TUs may also be a base unit for processing related to the residual block. In other words, the transform unit partition generated from the partition operation of the transform unit may also be a transform unit.
The transforms may include one or more of primary transforms and secondary transforms, and the inverse transforms may include one or more of primary inverse transforms and secondary inverse transforms.
Parameter set-the parameter set may correspond to header information in the structure of the bitstream.
The parameter sets may comprise at least one of a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptive Parameter Set (APS) or a Decoding Parameter Set (DPS) or a combination thereof.
The information signaled in a parameter set may be applied to the pictures that reference the parameter set. For example, the information in a VPS may be applied to pictures referencing the VPS. The information in an SPS may be applied to pictures referencing the SPS. The information in a PPS may be applied to pictures referencing the PPS. A parameter set may refer to a higher-level parameter set. For example, a PPS may refer to an SPS, and an SPS may refer to a VPS.
Furthermore, the parameter set may include tile group information, slice header information, and tile header information. A tile group may refer to a group or slice including a plurality of tiles.
Most Probable Mode (MPM): an MPM may refer to an intra prediction mode that has a high likelihood of being used for intra prediction of the target block.
One or more different MPMs may be determined based on the coding parameters associated with the target block and the attributes of the entity associated with the target block.
One or more MPMs may be determined based on intra-prediction modes of the reference block. The reference block may include a plurality of reference blocks. One or more different MPMs may be determined according to which intra-prediction modes have been used for one or more reference blocks. The reference block may include a spatial neighboring block.
MPM list: the MPM list may be a list including one or more MPMs. The number of MPMs in the MPM list may be predefined.
MPM index: the MPM index may indicate the MPM used for intra prediction of the target block among the one or more MPMs in the MPM list.
MPM use indicator: the MPM use indicator may indicate whether the MPM list is used for prediction of the target block.
Prediction mode: the prediction mode may be information indicating the prediction method used for the target block, such as a mode to be used for intra prediction or a mode to be used for inter prediction. The prediction mode may indicate one of the prediction-related modes described in the embodiments. Further, the prediction modes may include at least one of an intra mode, an inter mode, an intra block copy mode, or a combination thereof.
Reference picture list: the reference picture list may be a list including one or more reference pictures to be used for prediction of the target block.
There may be a plurality of reference picture lists, which may include List 0 (L0), List 1 (L1), and the like.
For inter prediction of the target block, one or more reference picture lists may be used. In the names of information related to inter prediction, parts such as "L0" and "L1" may refer to a reference picture list related to the information.
Reference picture: a reference picture may be a picture that is referenced for prediction of the target block. Alternatively, a reference picture may be a picture including the reference block. The reference pictures may include pictures preceding the target picture and pictures following the target picture.
Reference picture index: the reference picture index may be an index indicating, among the one or more reference pictures in a reference picture list, the reference picture used for prediction of the target block.
Reference block-a reference block may be a block that is referenced for encoding/decoding (such as prediction and filtering) a target block. For example, the reference block may include reference samples that are referenced to derive prediction samples, and may refer to a block that provides information for decoding a target block.
Reference sample: a reference sample may be a sample that is referenced for encoding/decoding (such as prediction and filtering) of the target block.
Inter prediction indicator-the inter prediction indicator may indicate the direction of inter prediction for the target block. Inter prediction may be one of unidirectional prediction and bidirectional prediction. Alternatively, the inter prediction indicator may represent the number of reference pictures used to generate the prediction block for the target block. Alternatively, the inter prediction indicator may represent the number of prediction blocks used for inter prediction of the target block. The reference direction may refer to an inter prediction indicator. For example, the inter prediction indicator may represent one of a unidirectional indicator and a bidirectional indicator. Alternatively, in the inter mode using only the reference pictures in the reference picture list L0, the inter prediction indicator may have "0" as a first value, in the inter mode using only the reference pictures in the reference picture list L1, the inter prediction indicator may have "1" as a second value, and in the inter mode using at least two of the reference pictures in the reference picture list L0 and the reference pictures in the reference picture list L1, the inter prediction indicator may have "2" as a third value.
Prediction list utilization flag the prediction list utilization flag of a particular reference picture list may indicate whether at least one reference picture in the particular reference picture list is used to generate a prediction block for a target block. For example, a prediction list utilization flag of a particular reference picture list equal to "0" may indicate that reference pictures in the particular reference picture list are not used to generate the prediction block. A prediction list utilization flag of a particular reference picture list equal to "1" may indicate that a prediction block is generated using reference pictures in the particular reference picture list.
The inter prediction indicator may be derived using the prediction list utilization flag. Alternatively, the inter-prediction indicator may be used to derive a prediction list utilization flag. For example, the inter prediction indicator may be derived using a prediction list utilization flag of multiple reference picture lists. When the inter prediction indicator indicates that a specific reference list among the plurality of reference picture lists is used, a prediction list utilization flag of the specific reference list indicated by the inter prediction indicator among the prediction list utilization flags of the plurality of reference picture lists may be set to "1", and a prediction list utilization flag of the remaining reference picture lists not indicated by the inter prediction indicator may be set to "0".
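The mapping described above between the inter prediction indicator and the prediction list utilization flags can be sketched as follows (hypothetical function names; the indicator values 0, 1, and 2 follow the convention given earlier):

```python
def flags_from_indicator(indicator):
    """Derive the (L0, L1) prediction list utilization flags from the
    inter prediction indicator: 0 -> L0 only, 1 -> L1 only, 2 -> both."""
    return {0: (1, 0), 1: (0, 1), 2: (1, 1)}[indicator]

def indicator_from_flags(use_l0, use_l1):
    """Inverse mapping: derive the inter prediction indicator from the
    two prediction list utilization flags."""
    if use_l0 and use_l1:
        return 2
    return 1 if use_l1 else 0
```

The two functions are mutual inverses over the three valid indicator values, mirroring the statement that either quantity may be derived from the other.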
Reference direction the reference direction may indicate a list of reference pictures for prediction of the target block. For example, the reference direction may indicate one or more of the reference picture list L0 and the reference picture list L1.
The reference direction only indicates the list of reference pictures for prediction of the target block and may not indicate that the direction of the reference pictures in the list of reference pictures is limited to the forward direction or the backward direction. In other words, each of the reference image list L0 and the reference image list L1 may include a forward image and a backward image. Here, the forward direction may indicate a direction from the target image to an image preceding the target image. The forward inter prediction may be inter prediction using a previous image of the target image as a reference image. The backward direction may indicate a direction from the target image to an image subsequent to the target image. The backward inter prediction may be inter prediction using a subsequent image of the target image as a reference image.
A unidirectional reference direction may mean that one reference picture list is used. A bidirectional reference direction may mean that two reference picture lists are used. For example, the reference direction may indicate one of: using only the reference picture list L0, using only the reference picture list L1, and using both reference picture lists. Further, the reference direction may be indicated by the inter prediction indicator.
Picture Order Count (POC): the POC of a picture (picture) may indicate the display order or output order of the picture (picture).
Motion information the motion information may be information for specifying a reference block. The motion information may include information for inter prediction such as a Motion Vector (MV), a reference picture index, a reference picture, an inter prediction indicator, and a prediction list utilization flag. In addition, the motion information may include information used in a specific inter prediction mode, such as MV candidates, MV candidate indexes, merge candidates, and merge indexes. In addition, the motion information may include information related to a block vector described below. The information related to the block vector may refer to information including at least one of a block vector, a block vector candidate, and a block vector candidate index.
For inter prediction for a target block, pieces of motion information related to multiple reference picture lists may be used, respectively. Motion information of a specific reference picture list may be used for prediction using the specific reference picture list. A plurality of (intermediate) prediction blocks may be derived from the plurality of pieces of motion information, respectively. The statistics of the plurality of (intermediate) prediction blocks may be used to generate a (final) prediction block of the target block.
Motion Vectors (MVs) may be two-dimensional (2D) vectors for inter prediction. MV may refer to an offset between a target block and a reference block. Alternatively, the MV may indicate a difference between the position of the target block and the position of the reference block.
An MV may be expressed, for example, in the form (mv_x, mv_y). mv_x may indicate the horizontal component, and mv_y may indicate the vertical component.
The zero vector may be a (0, 0) MV.
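As a small illustration (hypothetical names), the MV-as-offset relationship above means that the reference block position is the target block position shifted by the MV:

```python
def reference_position(target_pos, mv):
    """A motion vector is the 2D offset from the target block to its
    reference block: adding the MV to the target position yields the
    reference position."""
    (x, y), (mv_x, mv_y) = target_pos, mv
    return (x + mv_x, y + mv_y)
```

The zero vector (0, 0) leaves the position unchanged, i.e., the reference block is co-located with the target block.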
Block Vector (BV): a BV may be a two-dimensional (2D) vector used for intra block copy prediction. A BV may refer to the offset between a target block in the target picture and a reference block in the same target picture. That is, the BV may indicate the displacement between the target block and the reference block within the target picture.
A BV may be expressed, for example, in the form (bv_x, bv_y), similar to an MV. bv_x may indicate the horizontal component, and bv_y may indicate the vertical component.
The zero vector may be a (0, 0) BV.
Motion information candidate: the motion information of the target block may be selected from among motion information candidates determined by a specific scheme in a specific prediction. A motion information candidate may refer to the motion information of a reference block, or to the reference block itself that has the motion information. Here, the reference block may be a block determined by a specific scheme for selecting motion information candidates.
Candidate list: the candidate list may be a list including one or more candidates. For example, the candidate list may include a motion information candidate list, a merge candidate list, an MV candidate list, an MPM list, and the like. The candidate list may be generated by the encoding device and the decoding device in the same manner. In other words, the candidate list used by the encoding device and the candidate list used by the decoding device may be identical to each other, and the identical candidate list may be shared between the encoding device and the decoding device. The encoding device may select a candidate to be used for processing the target block from among the candidates in the candidate list. An indicator indicating the selected candidate may be signaled from the encoding device to the decoding device. The decoding device may use the indicator to specify the candidate to be used for processing the target block among the candidates in the candidate list. Alternatively, the encoding device and the decoding device may specify the candidate to be used for processing the target block among the candidates in the candidate list based on the same rule.
Motion information candidate list: a motion information candidate list may refer to a list constructed using one or more motion information candidates.
Motion information candidate index: the motion information candidate index may be an identifier or indicator indicating, among the motion information candidates in the motion information candidate list, the motion information candidate used for prediction of the target block.
In a particular inter prediction mode, motion information of the target block may be derived using motion information of the additional reconstructed block. The additional block may be a neighboring block. In a particular inter prediction mode, the motion information itself of the target block may not be signaled separately, and additional information for deriving the motion information of the target block based on the motion information of the additional reconstructed block may be signaled. Here, the additional information may include information indicating a block in the additional reconstructed block (such as a motion information candidate index), the motion information of which is used to derive the motion information of the target block.
For example, the inter prediction mode may include AMVP mode, merge mode, skip mode, and the like. The motion information candidate index may be a merge index or an MV candidate index.
In an embodiment, the MV may be part of the motion information. In an embodiment, information related to motion information (such as motion information candidates, motion information candidate lists, and motion information candidate indexes) may be replaced with information related to MVs (such as MV candidates, MV candidate lists, and MV candidate indexes), and description of motion information may also be applied to MVs.
Merge: the term "merge" may refer to merging the motion information of multiple blocks, or to applying the motion information of one additional block also to the target block. In other words, the merge mode may refer to a mode in which the motion information of the target block is derived from the motion information of a neighboring block.
Merge candidate: a merge candidate may refer to a specific (reconstructed) block used for merging of the target block, or to the motion information of that specific block. Alternatively, a merge candidate may include the motion information of a specific block.
The merge candidates of the target block may include spatial merge candidates, temporal merge candidates, history-based merge candidates, average merge candidates based on the average of two merge candidates, zero merge candidates, and the like.
Merge candidate list-the merge candidate list may be a list constructed using one or more merge candidates.
Merge index: the merge index may be an indicator indicating, among the merge candidates in the merge candidate list, the merge candidate used for prediction of the target block. The motion information of the merge candidate indicated by the merge index may be used as the motion information of the target block.
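A minimal sketch of merge-index selection (hypothetical names; candidates are shown as plain dictionaries rather than a codec's internal structures):

```python
def merge_motion_info(merge_candidate_list, merge_index):
    """The merge index picks one candidate out of the merge candidate
    list; the target block reuses that candidate's motion information
    (e.g., its MV and reference picture index) without the motion
    information itself being signaled directly."""
    return merge_candidate_list[merge_index]
```

Because the encoder and decoder build identical candidate lists, signaling only the index is sufficient to reconstruct the same motion information on both sides.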
Neighboring block: a neighboring block may be a block adjacent to the target block. Neighboring blocks may include spatial neighboring blocks and temporal neighboring blocks. A neighboring block may refer to a reconstructed neighboring block in a reference picture. A neighboring block does not necessarily need to be spatially adjacent to the target block.
Spatial neighboring block: a spatial neighboring block may be a block spatially adjacent to the target block.
The target block and the spatially neighboring block may be comprised in the target image.
The spatially adjacent blocks may comprise blocks, at least part of the boundary of which is in contact with at least part of the boundary of the target block. Alternatively, the spatially adjacent blocks may include blocks having a distance from the target block less than or equal to a specific value.
The spatially neighboring blocks may comprise blocks diagonally adjacent to the vertices of the target block.
The spatial neighboring blocks may include an upper left block adjacent to an upper left of the target block, an upper block adjacent to a top of the target block, an upper right block adjacent to an upper right of the target block, a left block adjacent to a left side of the target block, a right block adjacent to a right side of the target block, a lower left block adjacent to a lower left of the target block, a lower block adjacent to a bottom of the target block, and a lower right block adjacent to a lower right of the target block.
Temporal neighboring block: a temporal neighboring block may be a block temporally adjacent to the target block.
The temporal neighboring blocks may comprise co-located blocks (COL blocks). The COL block may be a block in the reconstructed image stored in the reference image (picture) buffer. A co-located picture (co-located picture: COL picture) may refer to a picture comprising COL blocks. The COL picture may be an image included in the reference image list.
The COL-block may be determined based on the location of the target block in the target image. The fact that two blocks are "adjacent to each other in time" may mean that the positions of the two blocks satisfy a certain condition.
The position of the COL block in the COL picture may be the same as the position of the target block in the target image. Alternatively, the position of the COL block in the COL picture may correspond to the position of the target block in the target image. Here, the positions of the blocks corresponding to each other may mean that the areas of the blocks are identical to each other, may mean that an area of one block is included in an area of the additional block, and may mean that one block occupies a specific position of the additional block.
For example, the position of the COL block in the COL picture may be the same as the position of the target block in the target image. Alternatively, the COL block may be a block including COL samples in the COL picture. The COL samples may be samples having the same coordinates as those of the specific samples in the target block.
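A sketch of locating the COL sample described above (hypothetical function; whether the top-left or center sample of the target block is used is a design choice, so both options are shown as assumptions):

```python
def col_sample_position(target_x, target_y, target_w, target_h, center=True):
    """Position, inside the co-located (COL) picture, of the sample used
    to locate the COL block. The COL sample has the same coordinates as
    a specific sample of the target block: here either the target
    block's center sample or its top-left sample."""
    if center:
        return (target_x + target_w // 2, target_y + target_h // 2)
    return (target_x, target_y)
```

The COL block is then the block of the COL picture that contains this sample.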
The temporal neighboring blocks may be blocks temporally adjacent to the spatial neighboring blocks of the target block.
Neighboring sample: a neighboring sample may refer to a sample within a neighboring block. Neighboring samples may include prediction samples, reconstructed samples, residual samples, and decoded samples.
Search range: the search range may refer to the 2D region in which the search for MVs is performed during inter prediction. For example, when the best MV for the target block is derived, it is selected from among the MVs indicating positions inside the search range.
Transform coefficient: a transform coefficient may be a coefficient generated by performing a transform on the residual block. Alternatively, a transform coefficient may be a coefficient value generated by performing inverse quantization on a quantization level.
Quantization level: a quantization level may be an integer value that serves as the input to inverse quantization.
Quantization: quantization may be the process of generating quantization levels from transform coefficients. Quantization levels may be generated by applying quantization to the transform coefficients. The transform may also be considered part of quantization.
Inverse quantization-inverse quantization may be a process of multiplying a factor by a quantization level. The (reconstructed) transform coefficients may be generated by applying inverse quantization to the quantization levels.
Quantization Parameter (QP): the QP may refer to a factor used when generating quantization levels from transform coefficients in quantization. Furthermore, the QP may refer to a factor used when generating (reconstructed) transform coefficients from quantization levels in inverse quantization. Alternatively, the QP may be a value mapped to a quantization step size.
Delta QP may be the difference between the QP predicted by a particular process and the QP of the target block. In other words, the QP of the target block may be the sum of the predicted QP and the delta QP.
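The delta-QP relationship, plus a hedged QP-to-step mapping (the HEVC-style rule that the step roughly doubles every 6 QP is an illustrative assumption, not taken from the disclosure):

```python
def target_qp(predicted_qp, delta_qp):
    """The QP of the target block is recovered as the predicted QP plus
    the signaled delta QP."""
    return predicted_qp + delta_qp

def quant_step(qp):
    """Hypothetical HEVC-style mapping from QP to quantization step size:
    the step roughly doubles every 6 QP."""
    return 2 ** ((qp - 4) / 6.0)
```

Signaling only the delta QP rather than the full QP exploits the fact that the predicted QP is already available on the decoder side.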
Quantization matrix: the quantization matrix may be a matrix used in quantization or inverse quantization to improve the subjective or objective quality of the image.
Quantization matrix coefficient: a quantization matrix coefficient may be an element of the quantization matrix.
Scan: scanning may refer to a method of arranging the values in a block or matrix. These values may be coefficients. For example, a scan may refer to arranging values in a 2D form into a 1D form, and may also refer to rearranging values in a 1D form into a 2D form. Inverse scanning may refer to an arrangement (or rearrangement) opposite to the arrangement performed in the scan.
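A minimal sketch of a scan and its inverse (a raster scan is used purely for illustration; real codecs also use zig-zag or diagonal orders):

```python
def raster_scan(block):
    """Scan: arrange the 2D values of a block into 1D order (here a
    simple row-by-row raster scan)."""
    return [v for row in block for v in row]

def inverse_raster_scan(values, width):
    """Inverse scan: rearrange the 1D values back into 2D form with the
    given row width."""
    return [values[i:i + width] for i in range(0, len(values), width)]
```

The two operations are inverses of each other, matching the definition of inverse scanning above.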
Non-zero transform coefficients-non-zero transform coefficients may refer to transform coefficients having non-zero values, or quantization levels having non-zero values.
Bitstream: a bitstream may refer to a stream/sequence of bits including encoded information generated by encoding an image. The bitstream may include information that depends on specific syntax elements. For example, the information may include syntax elements. The encoding device may generate a bitstream including pieces of information that depend on specific syntax elements. The decoding device may acquire the pieces of information from the bitstream according to the specific syntax elements.
Signaling - signaling of information may refer to transmitting the information from an encoding device to a decoding device via a bitstream. For example, the information may include syntax elements. Alternatively, signaling may refer to the process by which the encoding device includes the information in the bitstream. The information signaled by the encoding device may be used by the decoding device. In signaling, the bitstream may be transmitted over a network and may be stored in a storage/recording medium. In an embodiment, a description indicating that information is signaled may include 1) the encoding device determining and generating the information to be signaled, 2) the encoding device generating encoded information by encoding the information, 3) the (encoded) information being transmitted from the encoding device to the decoding device through a bitstream, 4) the decoding device acquiring the information by decoding the encoded information, and 5) the decoding device using the acquired information.
The encoding device may generate encoded information by performing encoding on the information. The encoded information may be signaled by a bitstream. The decoding device may acquire information by performing decoding on the encoded information.
Signaling of pieces of information for specific targets may mean that the pieces of information are used for the specific targets, respectively, and that the processing indicated by each piece of information is applied to the corresponding target. For example, signaling information at the level of a particular unit may mean that the corresponding information is used/processed for each such unit.
The signaled information may comprise one or more pieces of sub-information. Signaling of specific information may mean signaling each of the one or more pieces of sub-information included in the specific information.
Selective signaling - signaling of information may be selectively performed. Selective signaling of information may refer to the process by which an encoding device selectively includes the information in a bitstream (depending on particular conditions), or to the process by which a decoding device selectively acquires the information from the bitstream (depending on particular conditions).
Skipped signaling - signaling of information may be skipped. Skipping the signaling of information may refer to the process in which the encoding device does not include the information in the bitstream (depending on particular conditions), or to the process in which the decoding device does not acquire the information from the bitstream (depending on particular conditions). In an embodiment, the decoding device may derive the information whose signaling was skipped by using other information.
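Selective and skipped signaling can be illustrated with a toy encoder/decoder pair in which the bitstream is modeled as a dictionary. The condition governing the signaling and the rule for deriving the skipped information are hypothetical, chosen only to show the mechanism.

```python
def encode(mode_is_default: bool, mode_index: int) -> dict:
    """Encoder side: include mode_index in the 'bitstream' only when needed."""
    bitstream = {"mode_is_default": mode_is_default}
    if not mode_is_default:
        bitstream["mode_index"] = mode_index  # selective signaling
    return bitstream                          # otherwise signaling is skipped

def decode(bitstream: dict) -> int:
    """Decoder side: derive the skipped information from other information."""
    if bitstream["mode_is_default"]:
        return 0                              # derived default, not signaled
    return bitstream["mode_index"]            # acquired from the bitstream
```

When the condition holds, the field never appears in the bitstream, yet encoder and decoder still agree on its value because both apply the same derivation rule.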
Symbol - a symbol may refer to at least one piece of information of a target unit, such as the syntax elements, coding parameters, quantization levels, and transform coefficients of the target unit or target block. Further, a symbol may refer to a target of entropy encoding or a result of entropy decoding.
Entropy coding may be a technique that allocates fewer bits to more frequently occurring symbols and more bits to less frequently occurring symbols. Since the symbols are represented by such an allocation, the size of the bitstream indicating the symbols can be reduced.
Entropy coding may be implemented using a method such as Variable Length Coding (VLC) or Context-Adaptive Binary Arithmetic Coding (CABAC). For example, in VLC, entropy encoding may be performed using a variable-length code table. For example, in CABAC, a binarization method for a symbol and a probability model for a symbol/bin may be derived to perform entropy encoding, and arithmetic coding using a context may be performed.
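The bit-allocation idea behind VLC can be sketched with a toy prefix code in which the most frequent symbol receives the shortest codeword. The table below is illustrative only and is not taken from any standard.

```python
# Toy prefix (variable-length) code: frequent symbols get shorter codewords.
VLC_TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def vlc_encode(symbols):
    """Concatenate the codewords of the symbols into a bit string."""
    return "".join(VLC_TABLE[s] for s in symbols)

def vlc_decode(bits):
    """Greedy decoding works because no codeword is a prefix of another."""
    inverse = {code: sym for sym, code in VLC_TABLE.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

# "a" occurs most often and costs only 1 bit per occurrence:
bits = vlc_encode(["a", "a", "a", "b", "c"])  # 1+1+1+2+3 = 8 bits total
```

A fixed-length code for four symbols would need 2 bits each (10 bits here), so the variable-length assignment reduces the bitstream size whenever the symbol frequencies are skewed.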
Entropy decoding may be a process in which the processing performed in entropy encoding is performed inversely. The symbols may be generated by entropy decoding performed on the bitstream.
Parsing may refer to determining values of syntax elements by performing entropy decoding on encoded information of a bitstream. Alternatively, parsing may refer to entropy decoding itself.
Statistical value - the values of pieces of information related to a specific entity (or entities) described in the embodiments may be used as inputs for a specific operation. The statistical value may be a value derived by performing a particular operation on the values associated with such a specific entity (or entities). For example, the statistical value of pieces of specific information may indicate one or more values generated by the average, weighted average (or weighted mean), weighted sum, minimum, maximum, mode, median, interpolated value, sum of products, or product of sums of the values of the pieces of specific information. Furthermore, information (such as constants, variables, and encoding parameters) having a specific value determined by an operation (calculation) in an embodiment may have a specific statistical value according to the embodiment.
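Several of the statistical values named above (average, weighted mean, minimum, maximum, mode, median) can be computed directly with Python's standard library; the values and weights below are arbitrary sample data.

```python
import statistics

values = [4, 8, 8, 2, 6]
weights = [1, 2, 2, 1, 1]

average = statistics.mean(values)                        # arithmetic mean
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
minimum, maximum = min(values), max(values)
mode = statistics.mode(values)                           # most frequent value
median = statistics.median(values)                       # middle value when sorted
```

Any of these derived values may then serve as the input of a further operation, as the paragraph above describes.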
Coding parameters
In an embodiment, the encoding parameter may be information required for encoding and decoding. The encoding parameters may include information signaled from the encoding device to the decoding device, may include information calculated/derived during the encoding process described in the embodiments, and may include information for processing the encoding described in the embodiments.
In an embodiment, the encoding parameters may include one or more of the size of the CTU, the size of the unit, the form of the unit, the shape of the unit, the depth of the unit, the minimum unit size, the maximum unit depth, the minimum unit depth, partition information of the unit, Quadtree (QT) partition information, Binary Tree (BT) partition information, the partition direction of the BT partition, the partition form of the BT partition, Ternary Tree (TT) partition information, the partition direction of the TT partition, the partition form of the TT partition, Multi-Type Tree (MTT) partition information, the combination of MTT partitions, the partition direction of the MTT partition, the partition form of the MTT partition, prediction mode, intra prediction mode, luma intra prediction mode, chroma intra prediction mode, intra partition information, inter partition information, coding block partition information, prediction block partition information, transform block partition information, reference sample line index, reference sample filter method, reference sample filter tap, reference sample filter coefficient, prediction block filter method, prediction block filter tap, prediction block filter coefficient, prediction block boundary filter method, prediction block boundary filter tap, prediction block boundary filter coefficient, inter prediction mode, motion information, Motion Vector (MV), Motion Vector Difference (MVD), MVD resolution, MV magnitude, MV representation precision, reference picture list, reference picture index, inter prediction direction, inter prediction indicator, prediction list utilization flag, Picture Order Count (POC), MV candidate index, MV candidate list, Advanced Motion Vector Prediction (AMVP) mode use information, merge candidate, merge index, merge candidate list, merge mode use information, correction information for motion information, skip mode use information, intra block copy mode use information, Block Vector (BV), Block Vector Difference (BVD), BVD resolution, BV magnitude, BV representation precision, BV candidate index, BV candidate list, filter taps of the interpolation filter, filter coefficients of the interpolation filter, transform type, transform size, transform selection information, primary transform use information, secondary transform use information, primary transform selection information, secondary transform selection information, residual block presence information, coded block pattern, coded block flag, Quantization Parameter (QP), delta QP, quantization matrix, deblocking filter use information, coefficients of the deblocking filter, filter taps of the deblocking filter, deblocking filter strength, deblocking filter shape/form, adaptive sample offset use information, adaptive sample offset value, adaptive sample offset class, adaptive sample offset type, adaptive loop filter use information, coefficients of the adaptive loop filter, filter taps of the adaptive loop filter, adaptive loop filter shape/form, binarization/inverse binarization method, context model decision method, context model update method, normal mode use information, bypass mode use information, significant coefficient flag, last significant coefficient flag, coefficient group unit coding flag, last significant coefficient position, a flag indicating whether the coefficient value is greater than 1, a flag indicating whether the coefficient value is greater than 2, a flag indicating whether the coefficient value is greater than 3, residual coefficient value information, sign information, context bin, bypass bin, reconstructed sample, reconstructed luma sample, reconstructed chroma sample, residual luma sample, residual chroma sample, transform coefficient, luma transform coefficient, chroma transform coefficient, transform coefficient level, luma transform coefficient level, chroma transform coefficient level, transform coefficient level scanning method, quantization level, quantized luma level, quantized chroma level, the size of the MV search range on the decoding device side, the shape of the MV search range on the decoding device side, the number of MV search iterations on the decoding device side, picture type, slice identification information, slice type, slice partition information, tile group identification information, tile group type, tile group partition information, tile identification information, tile type, tile partition information, bit depth, input sample bit depth, reconstructed sample bit depth, residual sample bit depth, transform coefficient bit depth, quantization level bit depth, mapping availability information, luma signal information, chroma signal information, the color space of the target block, the color space of the residual block, and temporal layer information.
Furthermore, the encoding parameters may further include 1) a value of information that can be included in the encoding parameters, 2) a combination of pieces of information that can be included in the encoding parameters, 3) a statistical value of information that can be included in the encoding parameters, 4) information related to the encoding parameters, 5) information for calculating/deriving the encoding parameters, and 6) information calculated/derived using the encoding parameters.
In an embodiment, the "X use information" may be information indicating whether X is used/applied/performed. Alternatively, the "X use information" may be information indicating whether X is available. For example, the "specific mode use information" may be information indicating whether the specific mode is used. The mode information may indicate the mode used for the target block among the modes described in the embodiments. In an embodiment, the specific mode use information may be replaced with the mode information, and the description of the specific mode use information may also be applied to the mode information. The "X use information" and the "X indicator" may be used interchangeably with each other.
In an embodiment, the coding parameters and the syntax elements may correspond to each other. For example, a syntax element according to an embodiment may be used as the coding parameter, and the coding parameter may be signaled as the syntax element.
In an embodiment, the "X presence information" may be regarded as information indicating whether X is present, or as information indicating whether information indicating X is present in the bitstream.
In an embodiment, the "X selection information" may be information indicating one of candidates or methods for X. The "X selection information" may be regarded as an "X index".
In an embodiment, the partition form of the particular tree may indicate one of a symmetric partition and an asymmetric partition, and may indicate one of QT, BT, TT, and a non-partition. The partition direction of a particular tree may indicate one of a horizontal direction and a vertical direction.
In an embodiment, when the encoding parameter has one of a plurality of values, the "encoding parameter" may be replaced with "information indicating whether the encoding parameter has a specific value among the plurality of values available for the encoding parameter".
In an embodiment, when the encoding parameter has one of a plurality of values, the "encoding parameter" may be replaced with "information indicating whether the encoding parameter indicates a specific object of a plurality of objects".
In an embodiment, the encoding parameter may include at least one of a type of the target picture and a type of the target slice. The type of the target picture may be one of an I picture, a B picture, and a P picture. The type of target stripe may be one of an I-stripe, a B-stripe, and a P-stripe.
When the target to be encoded is an I-slice, it may be encoded using data within the image itself, without inter coding that references other pictures. For example, an I-slice may be encoded using intra prediction only.
When the target is a P-slice, it may be encoded by inter prediction using reference pictures present in only one direction. Here, the one direction may be the forward direction or the backward direction.
When the target is a B-slice, it may be encoded by inter prediction using reference pictures present in two directions, or by inter prediction using reference pictures present in one of the forward direction and the backward direction. Here, the two directions may be the forward direction and the backward direction.
P-slices and B-slices, which are encoded and/or decoded using reference pictures, may be considered to allow inter prediction.
System for video coding and decoding
Fig. 1 illustrates a system for video encoding and decoding according to an embodiment.
The system 100 may include at least one of the encoding device 110 or the decoding device 150 or a combination thereof.
Each of the encoding device 110 and the decoding device 150 may be a computer or an electronic device.
Structure of coding device
The encoding device 110 may include a processor 120, a memory 140, and a communicator 149.
The processor 120, the memory 140, and the communicator 149 may be connected to each other by a bus.
The processor 120 may be a semiconductor device, such as a Central Processing Unit (CPU), that executes instructions or computer-executable code. Processor 120 may be at least one hardware processor.
In an embodiment, the processor 120 may perform generation and processing of information input to the encoding device 110 or output from the encoding device 110 or used in the encoding device 110, and may perform comparison, determination, etc. related to the information.
Processor 120 may include a number of components. The plurality of components may include a partitioner 122, a predictor 123, a subtractor 124, a transformer 125, a quantizer 126, an inverse quantizer 127, an inverse transformer 128, an adder 129, a filter 130, and an entropy encoder 139.
At least some of the components described above may be program modules. Program modules may be included in the encoding device 110 in the form of operating systems, applications, and other program modules. Program modules may be instructions or computer-executable code stored in memory 140 and executed by processor 120.
The memory 140 may include various types of volatile storage media and nonvolatile storage media. For example, the memory 140 may include memories such as Read Only Memory (ROM) and Random Access Memory (RAM).
Memory 140 may store instructions and computer-executable code for the operation of encoding device 110 and may store information and bitstreams described in the embodiments. The memory 140 may include a reference picture buffer 141.
Communicator 149 may perform functions related to the communication of information in encoding device 110. For example, the communicator 149 may send the bitstream to the decoding device 150.
In the names of the components of the encoding device 110, the suffix "-er" may be replaced with "-or" or "-unit". The memory 140 may be designated as a memory unit.
Operation of an encoding device
The encoding device 110 may sequentially encode one or more images (pictures) of the video.
The memory 140 may store an original image. The original image may be used as a target image in the encoding apparatus 110.
The processor 120 may generate a bitstream including encoding information by performing encoding on the target image, and may store the generated bitstream in the memory 140. The generated bit stream may be stored in a computer readable storage medium and may be transmitted by the communicator 149 to the communicator 189 of the decoding device 150 through a wired and/or wireless transmission medium.
The partitioner 122 may determine the target block by partitioning the target image.
The predictor 123 may determine a prediction mode of the target block. The predictor 123 may generate a prediction block for the target block by performing prediction corresponding to the prediction mode.
The prediction mode of the target block may be one of the available prediction modes. For example, available prediction modes may include intra prediction, inter prediction, and IBC prediction.
For example, when the prediction mode corresponds to intra prediction, the predictor 123 may generate a prediction block for a target block by performing intra prediction on the target block.
For example, when the prediction mode corresponds to inter prediction, the predictor 123 may generate a prediction block for the target block by performing inter prediction on the target block.
For example, when the prediction mode corresponds to IBC, the predictor 123 may generate a prediction block for the target block by performing IBC on the target block.
The subtractor 124 may generate a residual block of the target block. The residual block may be a difference between the original block and the prediction block. The original block may be an area indicated by a target block in the original image. Alternatively, the residual block may refer to a block generated by applying one or more of transformation and quantization to a difference between the original block and the predicted block.
The transformer 125 may generate transform coefficients by performing a transform on the residual block.
The transformer 125 may perform the transformation using one of a variety of transformation methods.
For example, the various transform methods may include the Discrete Cosine Transform (DCT), the Discrete Sine Transform (DST), the Karhunen-Loeve Transform (KLT), a transform selected on a per-transform-unit basis, and so on.
The transform skip mode may be a mode in which the reconstructed block is generated using the reconstructed residual block and the prediction block without performing the transform and the inverse transform on the residual block. When the transform skip mode is applied to the target block, the transform and the inverse transform of the target block may be skipped, and only quantization and inverse quantization (dequantization) of the target block may be performed.
The quantizer 126 may generate quantization levels by applying quantization using quantization parameters to the transform coefficients. In an embodiment, the quantization level may also be referred to as a "transform coefficient".
The entropy encoder 139 may generate encoded information by performing entropy encoding according to a probability distribution of information for image decoding. The bitstream may include encoded information.
The information for image decoding may include quantization levels, syntax elements, etc., calculated by the quantizer 126.
The probability distribution may be determined based on the quantization level and the coding parameter.
The entropy encoder 139 may change the quantization level having a 2D block form into a 1D vector form using scanning to perform encoding on the quantization level. In the scanning, it is possible to determine which of the upper right diagonal scan, the vertical scan, and the horizontal scan is to be used based on encoding parameters such as the size of a block and an intra prediction mode of the block.
When encoding is performed on the target image/block, the predictor 123 performs prediction using the reference image/block. The encoded target image/block may be used as a reference image/block for another image/block that is subsequently processed. Accordingly, the processor 120 may perform reconstruction of the encoded target block, and may store a reconstructed image including the reconstructed target block generated by the reconstruction as a reference image in the reference picture buffer 141. Inverse quantization and inverse transformation may be performed on the encoded target block to perform reconstruction.
The inverse quantizer 127 may generate inverse quantized transform coefficients by performing inverse quantization on the quantization levels.
The inverse transformer 128 may generate inverse quantized and inverse transformed coefficients by performing inverse transform on the inverse quantized transform coefficients. In an embodiment, the inverse quantized and/or inverse transformed coefficients may refer to coefficients to which at least one of the inverse quantization or inverse transformation, or a combination thereof, is applied. The inverse quantized and inverse transformed coefficients may be reconstructed residual blocks.
The adder 129 may generate a reconstructed block by adding the prediction block and the reconstructed residual block to each other.
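The reconstruction path just described (inverse quantization, inverse transform, then addition of the prediction block) can be sketched per sample as below. The identity "inverse transform" is a placeholder for a real inverse DCT/DST, and the function names, step size, and clipping range are illustrative assumptions.

```python
def dequantize(levels, step):
    """Quantization levels -> reconstructed transform coefficients."""
    return [lvl * step for lvl in levels]

def inverse_transform(coeffs):
    """Placeholder for an inverse DCT/DST; returns coefficients unchanged."""
    return list(coeffs)

def reconstruct(prediction, levels, step, bit_depth=8):
    """Add the reconstructed residual to the prediction, clipped to bit depth."""
    residual = inverse_transform(dequantize(levels, step))
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(prediction, residual)]

recon = reconstruct(prediction=[120, 130], levels=[2, -1], step=8)
# residual = [16, -8], so recon = [136, 122]
```

Because the decoder runs exactly these steps, the encoder must use this same reconstruction (not the original samples) when the block later serves as a reference.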
The reconstructed block may be processed by a filter 130. The filter 130 may be configured to apply one or more of a plurality of filters to the target. Each of the plurality of filters may be a loop filter. The target may be a reconstructed sample, a reconstructed block, or a reconstructed image.
The reference picture buffer 141 may store the reconstructed block/image provided from the filter 130. The reconstructed image may be an image comprising reconstructed blocks. Alternatively, the reconstructed image may be an image composed of reconstructed blocks.
The reference picture buffer 141 may provide the stored reconstructed image as a reference image to the predictor 123. From the standpoint of storage of decoded (i.e., reconstructed) images (pictures), the reference picture buffer 141 may also be referred to as a "Decoded Picture Buffer (DPB)".
Structure of decoding device
The decoding device 150 may include a processor 160, a memory 180, and a communicator 189.
The description of the processor 120, memory 140 and communicator 149 associated with the encoding device 110 is also applicable to the processor 160, memory 180 and communicator 189 associated with the decoding device 150. Repeated descriptions will be omitted herein.
Processor 160 may include a number of components. The plurality of components may include an entropy decoder 161, a partitioner 162, a predictor 163, an inverse quantizer 167, an inverse transformer 168, an adder 169, and a filter 170.
The memory 180 may include a reference picture buffer 181.
The communicator 189 may perform functions related to communication of information in the decoding device 150. For example, the communicator 189 may receive a bit stream from the encoding device 110.
In the names of the components of the decoding device 150, the suffix "-er" may be replaced with "-or" or "-unit". The memory 180 may be designated as a memory unit.
Operation of decoding device
The communicator 149 of the encoding device 110 may transmit the bitstream generated by the encoding device 110 to the decoding device 150. Alternatively, the computer readable storage medium storing the bitstream may transmit the bitstream generated by the encoding apparatus 110 to the decoding apparatus 150.
The communicator 189 may receive the bit stream from the encoding device 110 through a wired/wireless transmission medium. The received bit stream may be stored in the memory 180.
The processor 160 may retrieve the bit stream from the memory 180 or a computer readable storage medium.
The bitstream may include encoded information.
The entropy decoder 161 may generate information for image decoding by performing entropy decoding on encoded information of a bitstream based on probability distribution.
The information for image decoding may include quantization levels, syntax elements, and the like.
The entropy decoder 161 may change a quantization level having a 1D vector form into a 2D block form using scanning to perform decoding on the quantization level. In the scanning, it is possible to determine which of the upper right diagonal scan, the vertical scan, and the horizontal scan is to be used based on encoding parameters such as the size of a block and an intra prediction mode of the block.
The entropy decoder 161 may provide syntax elements to other components of the processor 160, such as the partitioner 162.
Co-description based on relationships between components of an encoding device and components of a decoding device
The decoding device 150 performs decoding using the bitstream generated by the encoding device 110. Because the original image is not provided to the decoding device 150, the encoding device 110 may perform encoding on the target block using the reconstructed image, as derived in the decoding device 150, instead of the original image. Thus, the encoding device 110 and the decoding device 150 need to be able to generate reconstructed blocks/images in the same way. In this regard, the descriptions of the partitioner 122, the predictor 123, the inverse quantizer 127, the inverse transformer 128, the adder 129, the filter 130, and the reference picture buffer 141 of the encoding device 110 disclosed in the embodiments may be applied to the partitioner 162, the predictor 163, the inverse quantizer 167, the inverse transformer 168, the adder 169, the filter 170, and the reference picture buffer 181 of the decoding device 150, respectively. Repeated descriptions will be omitted herein.
Further, each of the partitioner 122, the predictor 123, the inverse quantizer 127, the inverse transformer 128, the adder 129, and the filter 130 of the encoding device 110 may generate information on syntax elements for specifying a process of a target. Further, each of the partitioner 162, the predictor 163, the inverse quantizer 167, the inverse transformer 168, the adder 169, and the filter 170 of the decoding apparatus 150 may perform processing (the same processing as performed by the encoding apparatus 110) on the target using information on the syntax element.
As described above, the corresponding components of the encoding device 110 and the decoding device 150 may perform the same or corresponding functions. In an embodiment, the processor may refer to the processor 120 of the encoding device 110 and/or the processor 160 of the decoding device 150. For example, in prediction-related functions, the processor may refer to the predictor 123, the subtractor 124, and the adder 129, and may refer to the predictor 163 and the adder 169. In transform-related functions, the processor may refer to the transformer 125 and the inverse transformer 128, and may refer to the inverse transformer 168. In quantization-related functions, the processor may refer to the quantizer 126 and the inverse quantizer 127, and may refer to the inverse quantizer 167. In functions related to entropy encoding/decoding, the processor may refer to the entropy encoder 139 and/or the entropy decoder 161. In functions related to filtering, the processor may refer to the filter 130 and/or the filter 170. The memory may refer to the memory 140 of the encoding device 110 and/or the memory 180 of the decoding device 150. The reference picture buffer may refer to the reference picture buffer 141 of the encoding device 110 and/or the reference picture buffer 181 of the decoding device 150. The communicator may refer to the communicator 149 of the encoding device 110 and/or the communicator 189 of the decoding device 150.
Partitioning of units constituting an image
Fig. 2 illustrates an image partition structure according to an embodiment.
Fig. 2 may schematically illustrate an example in which one unit is partitioned into a plurality of sub-units.
A CU may be used as a base unit for image encoding and decoding. Furthermore, a CU may be a basic unit for prediction, transformation, quantization, inverse transformation, entropy encoding, and entropy decoding.
A CU may be used as a unit to apply a prediction mode. In other words, it may be determined which of the available prediction modes available for each CU is to be applied for encoding. For example, available prediction modes may include intra prediction, inter prediction, and Intra Block Copy (IBC) prediction.
The target image 200 may be sequentially partitioned into units of CTUs. For each CTU, a partition structure may be determined. CTUs may be partitioned into CUs depending on the partition structure. Alternatively, one CTU may be used as a CU. The size of the CTU may be the maximum CU size.
Each CU may have depth information. The depth information may refer to the depth of the CU, and may represent the size of the CU. The depth of the CTU may be 0. The depth of a CU generated by partitioning the CTU may be 1. When a parent CU is partitioned into child CUs, the depth of each child CU may be increased by 1 from the depth of the parent CU. The number of CUs generated by partitioning may be a positive integer equal to or greater than 2, including 2, 4, 8, 16, etc. Depending on the number of child CUs, at least one of the horizontal size and the vertical size of a child CU generated by partitioning the parent CU may be smaller than at least one of the horizontal size and the vertical size of the parent CU.
The partitioned CUs may be recursively partitioned in the same manner until a predefined maximum depth or a predefined minimum size is reached. The depth of the Smallest Coding Unit (SCU) may be the predefined maximum depth, and the size of the SCU may be the predefined minimum size. The SCU size may be the minimum CU size.
For example, the range of CU depths may correspond to values ranging from 0 to 3. Depending on the CU depth, a CU may have a size ranging from 64 x 64 to 8 x 8. A CTU of depth 0 may be a 64 x 64 block. 0 may be the minimum depth. An SCU of depth 3 may be an 8 x 8 block. 3 may be the maximum depth. Depth 0 may represent the CTU as a 64 x 64 block. Depth 1 may represent a CU as a 32 x 32 block. Depth 2 may represent a CU as a 16 x 16 block. Depth 3 may represent the SCU as an 8 x 8 block.
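For quadtree-only partitioning as in this example, the depth-to-size relation reduces to halving both dimensions per depth level, i.e. size = CTU size >> depth. A minimal sketch (function name is illustrative):

```python
def cu_size_at_depth(ctu_size: int, depth: int) -> int:
    """Each QT depth level halves both dimensions: size = ctu_size >> depth."""
    return ctu_size >> depth

# Depths 0..3 of a 64x64 CTU give 64x64, 32x32, 16x16, and 8x8 blocks.
sizes = [cu_size_at_depth(64, d) for d in range(4)]  # [64, 32, 16, 8]
```

With a 64 x 64 CTU and a maximum depth of 3, depth 3 yields the 8 x 8 SCU, matching the range described above.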
The partition information of the CU may indicate whether the CU is partitioned. The partition information may be a 1-bit flag. All CUs except SCU may include partition information. For example, partition information of a CU that is not further partitioned may be "0" as a first value, and partition information of a CU that is partitioned may be "1" as a second value.
Quadtree (QT) partitioning may refer to partitioning a CU into four CUs. When a parent CU is partitioned into four child CUs, the horizontal and vertical dimensions of each child CU may be half the horizontal and vertical dimensions of the parent CU, respectively.
Binary Tree (BT) partition may refer to partitioning a CU into two CUs. For example, when a parent CU is partitioned into two child CUs, the horizontal or vertical size of each child CU may be half of the horizontal or vertical size of the parent CU.
Ternary Tree (TT) partitioning may refer to partitioning a CU into three CUs. For example, when a parent CU is partitioned into three child CUs, the three child CUs may be generated by partitioning the horizontal or vertical size of the parent CU in a ratio of 1:2:1. The horizontal or vertical sizes of the three child CUs may be 1/4, 1/2, and 1/4 of the horizontal or vertical size of the parent CU, respectively.
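The child-block sizes produced by the QT, BT, and TT splits described above can be sketched as follows. This is an illustration only; the function and argument names are hypothetical.

```python
def child_sizes(w, h, split, direction=None):
    """Child CU sizes for the QT/BT/TT splits described above (a sketch;
    'direction' is 'hor' or 'ver' and applies to BT and TT only)."""
    if split == 'QT':
        # four children, each half the width and half the height
        return [(w // 2, h // 2)] * 4
    if split == 'BT':
        # two children, halved along the chosen direction
        if direction == 'hor':
            return [(w, h // 2), (w, h // 2)]
        return [(w // 2, h), (w // 2, h)]
    if split == 'TT':
        # three children in a 1:2:1 ratio along the chosen direction
        if direction == 'hor':
            return [(w, h // 4), (w, h // 2), (w, h // 4)]
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(split)
```

For example, a vertical TT split of a 32 x 32 CU yields 8 x 32, 16 x 32, and 8 x 32 children, matching the 1/4, 1/2, 1/4 ratio above.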
In fig. 2, QT partition is applied to a first CTU. QT partition, BT partition, and TT partition are applied to the second CTU.
To partition the CTU, at least one of different types of partitions such as QT partition, BT partition, and TT partition may be applied to the CTU. Different types of partitioning may be applied based on a particular priority.
For example, QT partition may be preferentially applied to CTUs. A CU that cannot further apply QT partitioning may correspond to a leaf node of QT. A CU that is a leaf node of QT may be a root node of BT and/or TT. A CU that is a leaf node of QT may be partitioned in BT form or TT form, or may not be further partitioned. In this case, the QT partition may not be reapplied to the CU generated by applying the BT partition or the TT partition to the CU that is a leaf node of the QT.
The QT partition information may be used to signal the partitioning of the CU corresponding to each node of the QT. The QT partition information may be a flag. The QT partition information of a unit may be information indicating whether the unit is partitioned in QT form. A "0" as the first value of the QT partition information may indicate that the CU is not partitioned in QT form. QT partition information having the first value may imply a multi-type tree (MTT) partition. The MTT partition may include the BT partition and the TT partition. A "1" as the second value of the QT partition information may indicate that the CU is partitioned in QT form.
There may be no priority between BT partition and TT partition. That is, CUs corresponding to leaf nodes of QT may be partitioned in BT form or TT form. Further, a CU generated by BT partition or TT partition may be further partitioned in BT form or TT form, or may not be further partitioned.
The CU corresponding to a leaf node of the QT may be the root node of the MTT. For the CU corresponding to each node of the MTT, partition direction information and partition form information of the MTT partition may be further signaled.
The partition direction information may indicate a partition direction of the MTT partition. A "0" as a first value of the partition direction information may indicate that the CU is partitioned in the horizontal direction. A "1" as a second value of the partition direction information may indicate that the CU is partitioned in the vertical direction.
The partition form information may indicate a partition form for a multi-type tree (MTT) partition. A "0" as a first value of partition form information may indicate that a CU is partitioned in TT form. A "1" as a second value of partition form information may indicate that the CU is partitioned in BT form.
Here, each of the above-described partition direction information and partition form information may be a flag having a specific length (e.g., 1 bit).
Partition information of a CU may also include QT partition information, partition direction information, and partition form information.
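The flag semantics described above can be summarized in a small decision sketch. This is illustrative only, assuming the flag values given in the preceding paragraphs; the names are hypothetical, not from any specification.

```python
def parse_partition(qt_flag, dir_flag=None, form_flag=None):
    # Maps the flag values described above to a split decision (sketch):
    # qt_flag 1 -> QT; otherwise dir_flag 0/1 -> horizontal/vertical and
    # form_flag 0/1 -> TT/BT.
    if qt_flag == 1:
        return 'QT'
    form = 'TT' if form_flag == 0 else 'BT'
    direction = 'horizontal' if dir_flag == 0 else 'vertical'
    return form + '-' + direction
```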
A CU that is not further partitioned by QT partition, BT partition, or TT partition may be used as a unit of specific processing such as prediction, transformation, quantization, inverse transformation, entropy encoding, and entropy decoding. That is, a CU may not be partitioned further for a particular process. Thus, partition information for partitioning a CU into PUs and/or TUs may not be present in the bitstream.
Conversely, when the size of a CU is greater than the maximum TU size, the CU may be recursively partitioned until the size of the CU becomes less than or equal to the maximum TU size. For example, when the size of a CU is 64×64 and the maximum TU size is 32×32, the CU may be partitioned into four 32×32 TUs to perform the transform. For example, when the size of a CU is 32×64 and the maximum TU size is 32×32, the CU may be partitioned into two 32×32 TUs to perform the transform.
In this case, information indicating whether the CU is partitioned to perform the transformation may not be separately signaled. Whether a CU is partitioned may be determined through a comparison between the size (horizontal size/vertical size) of the CU and the maximum TU size (horizontal size/vertical size) without signaling. For example, a CU may be vertically halved when the horizontal size of the CU is greater than the horizontal size of the maximum TU size. Furthermore, a CU may be horizontally halved when the vertical size of the CU is greater than the vertical size of the maximum TU size.
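The implicit, signaling-free halving described above can be sketched as a recursion. This is an illustration under the stated comparison rules only; the function name and default are hypothetical.

```python
def implicit_tu_split(cu_w, cu_h, max_tu=32):
    """Recursively halve a CU until every piece fits the maximum TU size
    (a sketch of the implicit split described above; no flags signaled)."""
    if cu_w > max_tu:
        # horizontal size too large: split vertically (halve the width)
        return (implicit_tu_split(cu_w // 2, cu_h, max_tu)
                + implicit_tu_split(cu_w // 2, cu_h, max_tu))
    if cu_h > max_tu:
        # vertical size too large: split horizontally (halve the height)
        return (implicit_tu_split(cu_w, cu_h // 2, max_tu)
                + implicit_tu_split(cu_w, cu_h // 2, max_tu))
    return [(cu_w, cu_h)]
```

A 64×64 CU yields four 32×32 TUs and a 32×64 CU yields two 32×32 TUs, matching the examples above.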
For example, the smallest size of a CU may be 4×4. For example, the maximum size of the transform block may be 64×64. For example, the minimum size of the transform block may be 4×4. The minimum size of the QT may be the minimum size of a CU corresponding to a leaf node of the QT. The maximum depth of the MTT may be the maximum depth of the path from the root node to the leaf node of the MTT.
The maximum size of the BT may represent the maximum size of the CU corresponding to each node of the BT, and the maximum size of the TT may represent the maximum size of the CU corresponding to each node of the TT. The minimum size of BT and/or the minimum size of TT may be set to the minimum size of CU.
When the depth of a CU corresponding to a node of the MTT in the MTT is equal to the maximum depth of the MTT, the CU may not be partitioned in BT form and/or TT form.
Based on the above-described various sizes and various depths of the CU, each piece of information described in the embodiments may or may not exist in the bitstream.
Information about the maximum size or minimum size described in the embodiments may be signaled at a level higher than the CU. In an embodiment, the levels higher than the CU may include a video level, a sequence level, a picture level, a subpicture level, a tile group level, a tile level, a slice level, and the like.
The information described in the embodiments may be signaled separately for different types of slices. The different types of slices may include intra slices and inter slices.
Processing of blocks depending on their attributes
Whether the specific processing described in the embodiments is applied/performed may be determined based on the attributes of a block related to the specific processing. Whether to apply/perform the specific processing described in the embodiments may be determined based on whether the attributes of the block related to the specific processing satisfy a specific condition. For example, the blocks may include a target block, a neighboring block, and a reference block. The blocks may also include other blocks described in the embodiments. The block may be one of the blocks and units described in the embodiments.
The block to which the specific processing described in the embodiments is applied may have a square shape or a non-square shape.
In an embodiment, the attributes of the block may include the size of the block. The specific processing described in the embodiment may be applied/executed when a specific condition related to the block size is satisfied.
In an embodiment, the specific conditions may include a condition of a minimum block size and a condition of a maximum block size. The block to which the condition of the minimum block size is applied and the block to which the condition of the maximum block size is applied may be different from each other.
In an embodiment, a minimum block size and/or a maximum block size for a particular process may be predefined.
In an embodiment, when the block size is equal to or larger than the minimum block size and/or when the block size is smaller than or equal to the maximum block size, the process according to the embodiment may be applied/performed. Alternatively, in an embodiment, when the block size is larger than the minimum block size and when the block size is smaller than the maximum block size, the processing according to the embodiment may be applied/performed.
In the embodiment, only when the block size is equal to or larger than the minimum block size and smaller than or equal to the maximum block size, the processing according to the embodiment can be applied/performed. Alternatively, the processing according to the embodiment may be applied/executed only when the block size is greater than the minimum block size and less than or equal to the maximum block size. Alternatively, the processing according to the embodiment may be applied/performed only when the block size is equal to or larger than the minimum block size and smaller than the maximum block size. The processing according to the embodiment may be applied/performed only when the block size is greater than the minimum block size and less than the maximum block size.
In an embodiment, the processing according to the embodiment may be applied/performed only when the block size is a predefined block size.
In an embodiment, the block size may be determined based on various methods. For example, the block size may represent a horizontal size or a vertical size of the block. The block size may represent both the horizontal and vertical dimensions of the block. The block size may represent the area of the block. The block size may represent 1) a result value of a well-known equation, 2) a result value of an equation according to an embodiment, or 3) a statistical value using a horizontal size and a vertical size of the block.
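The alternative notions of "block size" listed above, and a size-range condition of the kind described in the preceding paragraphs, can be sketched as follows. The function names and the example bounds (8 and 64) are hypothetical illustrations, not values from the embodiments.

```python
def block_size_measures(w, h):
    # Several common interpretations of "block size" for a w x h block
    # (a sketch; real embodiments may use other equations or statistics)
    return {'horizontal': w, 'vertical': h, 'area': w * h,
            'min': min(w, h), 'max': max(w, h)}

def in_size_range(w, h, min_size=8, max_size=64):
    # One possible condition: both dimensions within [min_size, max_size];
    # the bounds here are arbitrary example values
    return min(w, h) >= min_size and max(w, h) <= max_size
```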
Further, for the first size, the processing according to the first embodiment among the embodiments may be applied/executed, and for the second size, the processing according to the second embodiment among the embodiments may be applied/executed.
In embodiments, the block size may be 2×2, 4×4, 8×8, 16×16, 32×32, 64×64, or 128×128. Alternatively, in an embodiment, the block size may be (2^SIZEX)×(2^SIZEY) or the like. SIZEX may be one of the integers equal to or greater than 1. SIZEY may be one of the integers equal to or greater than 1.
Prediction information for prediction
To generate a prediction block for a target block, prediction information may be used.
The encoding apparatus 110 may generate prediction information required for prediction, and may generate a bitstream including the prediction information. The prediction information may be signaled from the encoding device 110 to the decoding device 150 via a bitstream. The decoding apparatus 150 may acquire prediction information from the bitstream, and may generate a prediction block by performing prediction on a target block using the prediction information.
The prediction information may include intra prediction information, inter prediction information, and IBC prediction information. In an embodiment, the prediction information may be replaced with intra prediction information, inter prediction information, and/or IBC prediction information. The intra prediction information may include the information for intra prediction described in the embodiments. The inter prediction information may include the information for inter prediction described in the embodiments. The IBC prediction information may include the information for IBC prediction described in the embodiments.
Intra prediction
Fig. 3 illustrates a structure of intra prediction according to an embodiment.
Intra prediction may be performed using reference samples and coding parameters for the target block. The reference samples may be (reconstructed) samples in a (reconstructed) reference block. Alternatively, the intermediate prediction samples may be generated using samples such as the reconstruction samples described in the embodiments, and then the reference samples may be generated using the intermediate prediction samples. The processing such as filtering described in the embodiments may be applied when generating the reference samples.
The reference block may be a (spatially) neighboring block of the target block. The coding parameters may be coding parameters for the target block and/or coding parameters for the reference block. In intra prediction, the reference samples may be neighboring samples.
The prediction block may be generated by intra-predicting the target block according to an intra-prediction mode based on a reference sample of the target image and information related to the reference sample. The size of the target block and the size of the prediction block may be equal to each other.
In an embodiment, the prediction block may be a PU. Alternatively, the prediction block may correspond to a CU or TU described in the embodiment. The prediction block may have a square or rectangular shape.
The intra prediction mode may be represented by at least one of a mode number, a mode value, a mode angle, and a mode direction. In the lower right part of fig. 3, the prediction directions of a plurality of intra prediction modes for a target block are shown. Among the plurality of intra prediction modes, the remaining intra prediction modes other than the DC mode and the planar mode may be directional modes. A directional mode may be an intra prediction mode having a specific direction or a specific angle. The intra prediction mode for the target block may be selected from among the directional modes and the non-directional modes.
In the rectangle indicating the target block in the lower right part of the drawing, the numeral "0" may indicate a planar mode as a non-directional intra prediction mode. The numeral "1" may indicate a DC mode as a non-directional intra prediction mode. In a rectangle indicating a target block in the lower right part of the drawing, an arrow from the center of the rectangle to the outside may indicate a prediction direction of the directional intra prediction mode. Further, the numbers indicated near the arrow may represent examples of mode values assigned to the intra prediction mode or prediction directions of the intra prediction mode.
Intra prediction may be performed according to an intra prediction mode for a target block. One of the intra prediction modes available for the target block may be used as the intra prediction mode for the target block.
The number of intra prediction modes available for the target block may be a predefined value. Alternatively, the number of intra prediction modes available for the target block may be determined based on the properties of the prediction block. For example, the attributes of the prediction block may include coding parameters such as shape, size, and color components.
For example, in fig. 3, the directional modes indicated by dotted lines (i.e., the directional modes having a mode value corresponding to any one of -14 to -1 or any one of 67 to 80) may be applied only to prediction for non-square blocks. Thus, the number of intra prediction modes available for prediction of square blocks may be 67 (the planar mode, the DC mode, and 65 directional modes).
For example, the number of available intra prediction modes may vary according to which of the luminance signal and the chrominance signal corresponds to the color component of the block. The number of available intra-prediction modes for the luma component block may be greater than the number of available intra-prediction modes for the chroma component block.
The intra prediction modes may include a horizontal-below mode, a horizontal mode, a vertical mode, and a vertical-right mode. The horizontal-below mode may be an intra prediction mode whose direction is located below that of the horizontal mode. The vertical-right mode may be an intra prediction mode whose direction is located to the right of that of the vertical mode. For example, in fig. 3, the mode value of the horizontal mode may be 18. The mode value of the vertical mode may be 50. The intra prediction modes having mode values of 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, and 66 may be vertical-right modes. The intra prediction modes having mode values of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17 may be horizontal-below modes.
The number of intra prediction modes and the mode numbers of the respective intra prediction modes described above may be merely examples. The number of intra prediction modes and the mode numbers of the respective intra prediction modes described above may be defined differently according to an embodiment and its implementation and/or necessity.
When a prediction block for a target block is generated in a case where an intra prediction mode is a planar mode, a sample value of the prediction sample may be generated using a weighted sum of an upper reference sample of the target sample, a left reference sample of the target sample, an upper right reference sample of the target block, and a lower left reference sample of the target block according to a position of the prediction sample in the prediction block.
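The position-dependent weighted sum described above can be sketched with an HEVC-style planar formula. This is an illustration only (the exact weighting may differ between codecs); the function name and array layout are assumptions.

```python
import math

def planar_predict(top, left, n):
    """Planar prediction for an n x n block (a sketch). top has n+1 entries
    where top[n] is the upper-right reference of the block; left has n+1
    entries where left[n] is the lower-left reference of the block."""
    shift = int(math.log2(n)) + 1
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # weighted sum of the left, upper-right, upper, and lower-left
            # references, with weights depending on the sample position
            pred[y][x] = ((n - 1 - x) * left[y] + (x + 1) * top[n] +
                          (n - 1 - y) * top[x] + (y + 1) * left[n] + n) >> shift
    return pred
```

When all references share one value, every predicted sample takes that value, as expected of a smooth interpolating mode.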
When the intra prediction mode is a DC mode, a prediction block may be generated based on an average of sample values of a plurality of reference samples. The plurality of reference samples may include an upper reference sample and a left reference sample of the target block. The value of the predicted sample in the prediction block may be determined based on an average of the sample values of the plurality of reference samples. Furthermore, filtering using the values of the reference samples may be performed on specific rows and/or specific columns in the target block. The particular row may be one or more upper rows adjacent to the upper reference sample point. The particular column may be one or more left columns adjacent to the left reference sample point.
When the intra prediction mode is a direction mode, a prediction block may be generated using an upper reference sample, a left reference sample, an upper right reference sample, and/or a lower left reference sample of the target block.
Intra prediction modes of neighboring blocks of the target block may be used to determine intra prediction modes of the target block. Information for determining the intra prediction mode of the target block may be signaled.
For example, when the intra prediction mode of the target block and the intra prediction modes of the neighboring blocks are identical to each other, an indicator indicating that the intra prediction mode of the target block and the intra prediction mode of the neighboring blocks are identical to each other may be signaled.
For example, an indicator indicating the same intra prediction mode as the intra prediction mode of the target block among the intra prediction modes of the plurality of neighboring blocks may be signaled.
For example, when intra prediction modes of a target block and neighboring blocks are different from each other, an indicator indicating the intra prediction mode of the target block may be signaled. Alternatively, information for deriving an intra prediction mode of the target block based on intra prediction modes of neighboring blocks may be signaled.
The reference samples for intra prediction of the target block may include a lower left reference sample, a left side reference sample, an upper left reference sample, an upper right reference sample, and the like.
For example, the left reference samples may be reconstructed reference samples adjacent to the left side of the target block. The upper reference samples may be reconstructed reference samples adjacent to the top of the target block. The upper left reference sample may be a reconstructed reference sample diagonally adjacent to the upper left corner of the target block. The lower left reference samples may be reference samples located below the left reference samples, on the same vertical line as the line of left reference samples. The upper right reference samples may be reference samples located to the right of the upper reference samples, on the same horizontal line as the line of upper reference samples.
A reference sample for intra prediction of the target block may be determined based on the intra prediction mode of the target block. To determine the sample value of the prediction samples of the prediction block, one or more reference samples may be used. In fig. 3, the direction of the intra prediction mode indicated by the arrow may indicate a direction from a prediction sample to each reference sample. The direction of the intra prediction mode may represent a dependency relationship between the reference samples and the prediction samples. For example, depending on the intra prediction mode, a sample value of a specific reference sample may be used as a sample value of at least one sample of the prediction block. Here, the specific reference sample and at least one sample of the prediction block may be samples designated as a straight line in the direction of the corresponding intra prediction mode. In other words, the sample value of a particular reference sample may be copied to the sample value of a predicted sample that is located in the opposite direction to the direction of the intra prediction mode. Alternatively, the sample value of the prediction sample in the prediction block may be a sample value of a reference sample located in the direction of the intra prediction mode based on the position of the prediction sample.
The reference samples for intra prediction may not be limited to the samples just adjacent to the target block. As shown in fig. 3, for intra prediction of a target block, at least one of reference sample line 0 to reference sample line 3 or a combination thereof may be used.
Each reference sample line of fig. 3 may include one or more reference samples. A smaller reference sample line number may indicate a line of reference samples closer to the target block. Reference sample line 0 may be the line of reference samples immediately adjacent to the target block. When the upper left coordinates of the target block are (X, Y), the horizontal size of the target block is W, and the vertical size of the target block is H, a reference sample on reference sample line 0 may be a sample whose x-coordinate is X-1 or whose y-coordinate is Y-1. Here, the y-coordinate of a reference sample whose x-coordinate is X-1 may range from Y-1 to Y+2H. The x-coordinate of a reference sample whose y-coordinate is Y-1 may range from X-1 to X+2W. A reference sample on reference sample line A may be a sample whose x-coordinate is X-A-1 or whose y-coordinate is Y-A-1. Here, the y-coordinate of a reference sample whose x-coordinate is X-A-1 may range from Y-A-1 to Y+2H+A. The x-coordinate of a reference sample whose y-coordinate is Y-A-1 may range from X-A-1 to X+2W+A. "A" may be 1, 2, or 3.
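The coordinate ranges above can be enumerated directly. This is a sketch under the stated ranges only (with A = 0 reproducing reference sample line 0); the function name is hypothetical.

```python
def reference_line_samples(X, Y, W, H, A):
    """Coordinates of the reference samples on reference sample line A for a
    W x H block whose upper-left sample is at (X, Y), following the ranges
    described above (a sketch)."""
    # left column: x = X-A-1, y from Y-A-1 to Y+2H+A inclusive
    left_col = {(X - A - 1, y) for y in range(Y - A - 1, Y + 2 * H + A + 1)}
    # top row: y = Y-A-1, x from X-A-1 to X+2W+A inclusive
    top_row = {(x, Y - A - 1) for x in range(X - A - 1, X + 2 * W + A + 1)}
    return left_col | top_row
```

For a 4 x 4 block at (0, 0) on line 0 this gives 19 distinct samples: ten in the left column and ten in the top row, sharing the corner sample (-1, -1).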
The samples in segment A and segment F may be derived by padding from the closest samples of segment B and segment E, respectively, rather than being obtained from reconstructed neighboring blocks.
The reference sample line index may indicate a reference sample line for intra prediction of the target block among the plurality of reference sample lines. For example, the reference sample line index may have a value corresponding to one of 0 to 3. The reference sample line index may be signaled.
When inter-color component intra prediction is used for a target block, a prediction block of a second color component may be generated based on a reconstructed block of a first color component of the target block. For example, the first color component may be a luminance component and the second color component may be a chrominance component.
For inter-color component intra prediction between color components, parameters between a first color component and a second color component may be derived based on a template. For example, the parameter may be a parameter of a linear model.
For example, the template may include an upper reference sample and/or a left reference sample of the target block, and may include an upper reference sample and/or a left reference sample of the reconstructed block of the first color component corresponding to the reference sample.
When the parameters are derived, a prediction block for the second color component of the target block may be generated by applying the reconstructed block of the first color component to the linear model. Depending on the image format or the type of intra prediction between the color components, sub-sampling/downsampling may be performed on neighboring samples of the reconstructed block of the first color component and the reconstructed block of the first color component. When sub-sampling is performed, the derivation of parameters and intra prediction between color components may be performed using corresponding samples derived by sub-sampling.
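One way the linear-model parameters mentioned above can be derived from the template is a least-squares fit; this is a sketch only (real codecs typically use min/max pairs or integer approximations), and the function and variable names are hypothetical.

```python
def derive_linear_model(luma_tmpl, chroma_tmpl):
    """Fit chroma ~= alpha * luma + beta over the template samples
    (a least-squares sketch of the parameter derivation described above)."""
    n = len(luma_tmpl)
    sx = sum(luma_tmpl)
    sy = sum(chroma_tmpl)
    sxx = sum(l * l for l in luma_tmpl)
    sxy = sum(l * c for l, c in zip(luma_tmpl, chroma_tmpl))
    denom = n * sxx - sx * sx
    alpha = (n * sxy - sx * sy) / denom if denom else 0.0
    beta = (sy - alpha * sx) / n
    return alpha, beta
```

The prediction block of the second color component is then alpha times the reconstructed first-component sample plus beta, per sample.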
Intra-sub-partition (ISP) prediction may refer to intra-prediction that is sequentially performed on a plurality of sub-blocks generated by partitioning a target block. In ISP prediction, a target block may be partitioned into two or four sub-blocks in the horizontal direction and/or the vertical direction. Sub-blocks generated from partitions may be reconstructed sequentially. When intra prediction is performed on each sub-block, a sub-prediction block for each sub-block may be generated. Further, when inverse quantization and/or inverse transformation is performed on each sub-block, a sub-residual block for the corresponding sub-block may be generated. The corresponding reconstructed sub-block may be generated by adding the sub-prediction block to the sub-residual block. The reconstructed sub-block may be used as a reference sample for intra prediction of other sub-blocks to be subsequently processed.
When prediction is performed with respect to a target block, it may be determined whether a sample included in a reconstructed neighboring block may be used as a reference sample of the target block. When an unavailable sample point that cannot be used as a reference sample point of the target block exists among the sample points in the neighboring block, a sample point value of each unavailable sample point may be replaced based on a value generated using copy and/or interpolation of sample point values of at least one sample point included in the reconstructed neighboring block. When a value generated by copying and/or interpolation replaces a sample value of a sample, the corresponding sample may be used as a reference sample of the target block.
When intra prediction is performed, a sample value of a prediction sample in a prediction block may be determined based on a sample value of a reference sample. The position of the reference sample may be specified by the position of the prediction sample and the direction of the intra prediction mode. When the position specified by the position of the prediction sample and the direction of the intra prediction mode is an integer position, a sample value of one reference sample indicated by the integer position may be used to determine a sample value of the prediction sample in the prediction block. When the position specified by the position of the prediction sample point and the direction of the intra prediction mode is not an integer position, an interpolation reference sample point may be generated based on two reference sample points closest to the specified position. The sample values of the interpolated reference samples may be used to determine sample values of the predicted samples. In other words, when a position specified by the position of the prediction sample and the direction of the intra prediction mode indicates a position between two reference samples, an interpolation sample value may be generated based on the sample values of the two samples.
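The fractional-position case above amounts to a linear interpolation between the two nearest reference samples. The sketch below assumes a 1/32-sample precision with round-to-nearest, which is one common choice; the function name and precision are illustrative.

```python
def interpolate_reference(ref0, ref1, frac32):
    # Linear interpolation between the two nearest reference samples when the
    # projected position falls at fractional offset frac32/32 between them
    # (sketch; frac32 in [0, 32), integer arithmetic with rounding)
    return ((32 - frac32) * ref0 + frac32 * ref1 + 16) >> 5
```

At frac32 = 0 the first reference sample is returned unchanged, matching the integer-position case described above.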
Inter prediction
Fig. 4 illustrates a structure of inter prediction for explaining inter prediction processing according to an embodiment.
The rectangle shown in fig. 4 may show an image (screen). Further, in fig. 4, an arrow may designate a prediction direction.
The respective images constituting the video may be classified into I pictures (i.e., intra pictures), P pictures (i.e., unidirectional prediction pictures), and B pictures (i.e., bidirectional prediction pictures) according to the coding type. The encoding and decoding of each picture may be performed according to the coding type of the picture.
When the target picture is an I picture, the encoding and decoding of the target picture can be performed using information in the target picture without referring to inter prediction of other pictures. For example, the encoding and decoding of I pictures may be performed using intra prediction and/or IBC prediction.
The encoding and decoding of the P picture and the B picture may be performed by at least one of intra prediction, IBC prediction, and inter prediction using a reference picture.
When the target picture is a P picture, the encoding and decoding of the target picture may be performed using unidirectional inter prediction employing one reference picture list.
When the target picture is a B picture, the encoding and decoding of the target picture may be performed using unidirectional inter prediction or bidirectional inter prediction employing two reference picture lists.
Hereinafter, inter prediction for a target block in an inter mode according to an embodiment will be described in detail.
When the prediction mode of the target block is an inter mode, inter prediction for the target block may be performed. The target block may be a prediction block or a partition prediction block.
Inter prediction may be performed using a reference picture and motion information. In inter prediction, a reference picture may be selected using a reference picture index, and a reference block corresponding to the target block may be determined in the reference picture using the motion information. A prediction block for the target block may be generated using the determined reference block.
The motion information may be derived using coding parameters or the like. For example, motion information may be derived using motion information of reconstructed neighboring blocks, motion information of COL blocks, and/or motion information of blocks adjacent to COL blocks.
In an embodiment, the candidate list may be used for inter prediction. The candidate list may include a plurality of candidates. An index indicating a candidate for inter prediction of the target block among candidates in the candidate list may be signaled. The candidate list may be derived by the encoding device 110 and the decoding device 150 in the same manner based on the same information. Here, the same information may include a reconstructed image and a reconstructed block. Furthermore, in order to specify candidates with indexes, the order of the candidates in the candidate list needs to be uniform.
In an embodiment, prediction for a target block may be performed by using motion information of a spatial candidate or a temporal candidate as motion information of the target block. The motion information of the spatial candidate may be referred to as spatial motion information. The motion information of the temporal candidate may be referred to as temporal motion information.
The spatial candidate may be a reconstructed spatially neighboring block that is spatially neighboring the target block.
The spatial candidates may be 1) blocks that exist in the target image, 2) have been reconstructed by decoding, and 3) are adjacent to the target block.
The spatial candidates may include a left block, an upper block, a lower left block, an upper right block, and an upper left block of the target block.
The temporal candidates may be reconstructed temporal neighboring blocks that exist in the reconstructed COL image (picture) and correspond to the target block.
In an embodiment, the motion information of the spatial candidate may be motion information of a block including the spatial candidate. The motion information of the temporal candidate may be motion information of a block including the temporal candidate.
In inter prediction, a COL block for a target block may be identified. The region of the target block in the target image and the region of the COL block in the COL picture may be identical to each other. In other words, the COL block may be a block occupying a specific area in the COL picture. The specific region may be a region corresponding to a region of the target block in the COL picture.
The temporal candidates may indicate locations inside the COL block and/or locations outside the COL block within the COL picture.
For example, the COL blocks may include a first COL block and a second COL block. When the upper-left coordinates of the COL block are (xP, yP) and the size of the COL block is (nPSW, nPSH), the first COL block may be a block occupying the coordinates (xP + nPSW, yP + nPSH). The second COL block may be a block occupying the coordinates (xP + (nPSW >> 1), yP + (nPSH >> 1)). The second COL block may be selectively used as the COL block when the first COL block is unavailable.
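The fallback between the two COL positions described above can be sketched as follows; the function name and the availability flag are illustrative, not taken from the text:

```python
def col_candidate_position(xP, yP, nPSW, nPSH, bottom_right_available):
    """Pick the temporal (COL) candidate position.

    The first COL block, at the bottom-right corner outside the block,
    is tried first; when it is unavailable (e.g. it falls outside the
    picture), the second COL block at the block centre is used instead.
    """
    if bottom_right_available:
        return (xP + nPSW, yP + nPSH)                # first COL block
    return (xP + (nPSW >> 1), yP + (nPSH >> 1))      # second COL block (centre)
```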
The MV of the target block may be determined based on the MV of the COL block. Scaling may be performed on MVs of COL blocks. The scaled MVs of the COL block may be used as MVs of the target block or predicted MVs. Alternatively, MVs of temporal candidates listed in the candidate list related to inter prediction may be scaled MVs.
The ratio of scaled MVs to MVs of COL blocks may be the same as the ratio of the first temporal distance to the second temporal distance. The first temporal distance may be a distance between a reference image of the target block and the target image. The second temporal distance may be a distance between a reference image of the COL block and a COL picture (COL image).
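The temporal scaling described above can be sketched as follows. Plain rounding stands in for the clipped fixed-point arithmetic a real codec would use:

```python
def scale_temporal_mv(mv, td_target, td_col):
    """Scale a COL-block MV by the ratio of temporal distances.

    td_target: distance between the target image and its reference image.
    td_col:    distance between the COL picture and its reference image.
    The scaled MV relates to the COL MV in the ratio td_target / td_col.
    """
    scale = td_target / td_col
    return (round(mv[0] * scale), round(mv[1] * scale))
```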
The scheme for deriving motion information may be determined according to an inter prediction mode of the target block. For example, AMVP mode, merge mode, skip mode, merge mode with MVD, sub-block merge mode, GPM, combined Inter Intra Prediction (CIIP) mode, or affine inter mode may be used as the inter prediction mode. In the following embodiments, respective inter prediction modes will be described.
AMVP mode
When AMVP mode is used as the prediction mode, an MV candidate list including one or more MV candidates may be created using MVs of spatial candidates, MVs of temporal candidates, history-based MV candidates, a zero vector, and the like. At least one of an MV of a spatial candidate, an MV of a temporal candidate, a zero vector, etc. may be determined and used as an MV candidate.
The spatial candidates may include reconstructed spatial neighboring blocks. The MVs of the reconstructed spatial neighboring blocks may be referred to as spatial Motion Vector (MV) candidates. The temporal candidates may include COL blocks and blocks adjacent to COL blocks. The MVs of the COL block or MVs of blocks adjacent to the COL block may be referred to as temporal Motion Vector (MV) candidates. The history-based MV candidates may be MVs included in a list of MVs of other blocks encoded/decoded before the target block is encoded/decoded.
The encoding apparatus 110 may determine MVs to be used for encoding the target block within the search range using the MV candidate list. The maximum number of MV candidates in the MV candidate list may be predefined. N may represent a predefined maximum number. For example, N may be 2. Alternatively, the maximum number of candidates may be signaled from the encoding device to the decoding device or may be derived by the decoding device. The encoding apparatus 110 may determine MV candidates to be used as predicted MVs of the target block among MV candidates in the MV candidate list. The MV to be used for encoding the target block may be an MV that can be encoded at a minimum cost. The encoding device 110 may determine whether the AMVP mode is to be used for encoding the target block, and may generate AMVP mode usage information indicating whether the AMVP mode is used.
The inter prediction information may include 1) AMVP mode usage information, 2) MV candidate index, 3) MVD, 4) MVD resolution information, 5) reference direction, and 6) reference picture index, and may further include a residual block. The inter prediction information may be signaled from the encoding device 110 to the decoding device 150 through a bitstream.
The decoding apparatus 150 may acquire AMVP mode usage information from the bitstream. When the AMVP mode usage information indicates that the AMVP mode is used, the decoding apparatus 150 may acquire the MV candidate index, the MVD, the MVD resolution information, the reference direction, and the reference picture index from the bitstream. Among the MV candidates included in the MV candidate list, the MV candidate indicated by the MV candidate index may be selected as the predicted MV of the target block.
The MVD may represent the difference between the MV that will actually be used for inter prediction of the target block and the predicted MV. The encoding apparatus 110 may derive a predicted MV that is as close as possible to the MV that will actually be used for inter prediction of the target block, so that the MVD has as small a magnitude as possible. The decoding apparatus 150 may derive the MV of the target block by adding the MVD to the predicted MV. In other words, the MV of the target block derived by the decoding apparatus 150 may be the sum of the MVD and the selected predicted MV candidate.
In addition, the encoding device 110 may generate MVD resolution information. The MVD resolution information may be used to adjust the resolution of the MVD. The decoding apparatus 150 may adjust the resolution of the MVD using the MVD resolution information.
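The decoder-side derivation MV = predicted MV + MVD, together with an MVD resolution adjustment, can be sketched as follows. The left-shift model of resolution adjustment and the parameter name `resolution_shift` are illustrative assumptions, not taken from the text:

```python
def reconstruct_mv(pred_mv, mvd, resolution_shift=0):
    """Decoder-side AMVP MV derivation: MV = predicted MV + MVD.

    resolution_shift illustrates MVD resolution adjustment: the signalled
    MVD is scaled up to the internal MV precision before being added.
    """
    mvd_scaled = (mvd[0] << resolution_shift, mvd[1] << resolution_shift)
    return (pred_mv[0] + mvd_scaled[0], pred_mv[1] + mvd_scaled[1])
```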
Furthermore, the encoding device 110 may calculate the MVD based on the affine model. Affine control points MV of the target block may be derived based on the sum of affine control point MV candidates and MVDs. The MV of each sub-block in the target block may be derived using the affine control point MV.
Merge mode
When the merge mode is used, a merge candidate list including a plurality of merge candidates may be created using motion information of spatial candidates, motion information of temporal candidates, and the like. The motion information may include 1) a Motion Vector (MV), 2) a reference picture index, 3) a reference direction, and the like. Each merge candidate may be motion information.
The merge candidates may include 1) spatial merge candidates generated based on spatial candidates, 2) temporal merge candidates generated based on temporal candidates, 3) history-based merge candidates, 4) average merge candidates, 5) zero merge candidates, and the like.
The history-based merge candidate may be motion information of other blocks encoded/decoded before the target block, stored in a history list.
The average merge candidate may be a merge candidate generated based on an average of two merge candidates in the merge candidate list.
The zero merge candidate may be zero vector motion information. The zero vector motion information may be motion information in which MV is a zero vector.
The merge candidates may be added to the merge candidate list according to a predefined method and a predefined order such that the merge candidate list has a preset number of merge candidates. The same merge candidate list may be constructed by the encoding device 110 and the decoding device 150 by a predefined method and a predefined order.
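The construction described above, in which candidates are appended in a predefined order up to a preset list size, can be sketched as follows. Duplicate pruning and zero-vector padding are shown; average merge candidates and reference-index handling are omitted for brevity:

```python
def build_merge_candidate_list(spatial, temporal, history, max_candidates):
    """Sketch of merge-candidate-list construction.

    Candidates are appended in a predefined order (spatial, temporal,
    history-based), duplicates and unavailable entries (None) are
    skipped, and zero-vector motion information pads the list up to the
    preset number of merge candidates.
    """
    candidates = []
    for cand in list(spatial) + list(temporal) + list(history):
        if cand is not None and cand not in candidates:
            candidates.append(cand)
        if len(candidates) == max_candidates:
            return candidates
    zero = (0, 0)  # zero-vector motion information
    while len(candidates) < max_candidates:
        candidates.append(zero)
    return candidates
```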
The encoding device 110 may select a merge candidate to be used for encoding the target block from among the merge candidates in the merge candidate list. The encoding apparatus 110 may determine whether a merge mode is to be used to encode the target block, and may generate merge mode use information indicating whether the merge mode is used.
The inter prediction information may include 1) merge mode usage information, 2) merge index, 3) correction information, and the like, and may include a residual block. The inter prediction information may be signaled from the encoding device 110 to the decoding device 150 in the form of a bitstream.
The decoding apparatus 150 may acquire the merge mode use information from the bitstream. When the merge mode use information indicates that the merge mode is used, the decoding apparatus 150 may acquire information related to the merge mode, such as a merge index, from the bitstream.
The encoding apparatus 110 may select the best merge candidate from among the merge candidates included in the merge candidate list, and may set a value of the merge index to indicate the selected merge candidate.
The correction information may be information for correcting the MV. The encoding device 110 may generate correction information. The decoding apparatus 150 may derive a corrected MV by correcting the MV of the merging candidate selected through the merging index based on the correction information. The corrected MV may be used as the MV of the target block.
In an embodiment, the correction information may include MVDs. The correction information may include one or more of correction usage information, correction direction information, and correction size information. The correction use information may indicate whether correction for MV is used. The merge mode in which correction of MVs is performed based on correction information may be referred to as a merge mode with MVD.
In the merge mode, prediction for the target block may be performed using a merge candidate indicated by the merge index among the merge candidates included in the merge candidate list.
Motion information of the target block may be derived using 1) MV of the merge candidate indicated by the merge index, 2) reference picture index, 3) reference direction, and the like.
In an embodiment, a merge candidate in the merge candidate list may indicate a specific mode in which inter prediction information is derived. In other words, each merge candidate may be information indicating a specific mode in which inter prediction information is derived. Inter prediction information of the target block may be derived according to the specific mode indicated by the merge candidate. In this regard, the specific mode may be regarded as a specific inter-prediction-information derivation mode or a specific motion-information derivation mode. The specific mode may include a series of processes for deriving inter prediction information.
Inter prediction information of the target block may be derived according to a specific mode indicated by a merge candidate selected through a merge index among the merge candidates in the merge candidate list. For example, the specific modes may include a sub-block motion information derivation mode and an affine motion information derivation mode, and may include other modes of deriving motion information described in the embodiments.
The skip mode may be a mode in which a residual block is not used. In other words, when the skip mode is used, the reconstructed block may be identical to the prediction block. The difference between the merge mode and the skip mode is whether the residual block is signaled and used; apart from the residual block not being transmitted/used, the skip mode may be similar to the merge mode, and the description of the merge mode may also be applied to the skip mode.
The sub-block merging mode may be a mode in which motion information is derived for each target sub-block in the target block. When the sub-block merging mode is applied, a sub-block merging candidate list may be created using affine control-point motion vector merging candidates and/or sub-block-based temporal merging candidates. The sub-block-based temporal merging candidate may be the motion information of the COL sub-block of the target sub-block.
In the GPM, a first prediction block and a second prediction block may be generated using two pieces of motion information of a target block. For each coordinate of the target block, a final prediction block of the target block may be generated using a weighted sum of the first prediction sample of the first prediction block and the second prediction sample of the second prediction block.
Here, the first weight of the first predicted sample and the second weight of the second predicted sample of the weighted sum may be determined based on the boundary of the GPM. The boundary may indicate a partition line for partitioning the target block. The target block may be partitioned into a first partitioned area and a second partitioned area according to the boundary.
When the distance between the final prediction sample and the boundary is less than or equal to the reference value, a weighted sum of the first prediction sample in the first prediction block and the second prediction sample of the second prediction block may be used to determine a value of the final prediction sample of the final prediction block. When the distance between the final prediction sample point and the boundary is greater than the reference value, one of the first weight and the second weight may be 1, and the other may be 0.
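The distance-dependent weighting described above can be sketched per sample as follows. The linear weight ramp and the signed-distance convention (positive inside the first partitioned area) are illustrative assumptions:

```python
def gpm_blend(p1, p2, dist, threshold):
    """Blend two GPM prediction samples based on the signed distance of
    the sample position to the partition boundary.

    Within `threshold` of the boundary, the two predictions are mixed
    with distance-dependent weights; farther away, one weight is 1 and
    the other is 0, so only one prediction is used.
    """
    if dist <= -threshold:       # deep inside the second partitioned area
        w1 = 0.0
    elif dist >= threshold:      # deep inside the first partitioned area
        w1 = 1.0
    else:                        # blending zone around the boundary
        w1 = (dist + threshold) / (2 * threshold)
    return w1 * p1 + (1 - w1) * p2
```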
The Combined Inter Intra Prediction (CIIP) mode may be a mode in which a prediction sample of the target block is derived using a weighted sum of a prediction sample generated by inter prediction and a prediction sample generated by intra prediction.
In the above modes, refinement of the derived motion information itself may be performed, and the refined motion information may be used as the motion information of the target block. For example, blocks in a specific area determined based on the derived motion information may be searched, and the motion information of the block having the minimum sum of absolute differences (SAD) among the found blocks may be used as the refined motion information of the target block. The specific region may be a square region within the reference image specified by the motion information. The point indicated by the motion information may be the center of the specific area.
In the foregoing modes, compensation of prediction samples derived by inter prediction may be performed using optical flow.
Fig. 5 shows an order of adding spatial candidates to a candidate list according to an embodiment.
In fig. 5, the locations of the spatial candidates are depicted.
The large block at the center of the drawing may indicate the target block. The five blocks adjacent to the target block may represent the spatial candidates.
The coordinates of the target block may be (xP, yP), and the size of the target block may be (nPSW, nPSH).
Spatial candidate A0 may be a block adjacent to the lower-left corner of the target block. A0 may be the block that occupies the sample point corresponding to the coordinates (xP - 1, yP + nPSH).
Spatial candidate A1 may be a block adjacent to the left side of the target block. A1 may be the lowest of the blocks adjacent to the left side of the target block. In other words, A1 may be the block adjacent to the top of A0. A1 may be the block that occupies the sample point corresponding to the coordinates (xP - 1, yP + nPSH - 1).
Spatial candidate B0 may be a block adjacent to the upper-right corner of the target block. B0 may be the block that occupies the sample point corresponding to the coordinates (xP + nPSW, yP - 1).
Spatial candidate B1 may be a block adjacent to the top of the target block. B1 may be the rightmost of the blocks adjacent to the top of the target block. In other words, B1 may be the block adjacent to the left side of B0. B1 may be the block that occupies the sample point corresponding to the coordinates (xP + nPSW - 1, yP - 1).
Spatial candidate B2 may be the block adjacent to the upper-left corner of the target block. B2 may be the block that occupies the sample point corresponding to the coordinates (xP - 1, yP - 1).
As shown in Fig. 5, when spatial candidates are added to the candidate list, the order B1, A1, B0, A0, B2 may be used. That is, the available spatial candidates may be added to the candidate list in the order B1, A1, B0, A0, B2. The order illustrated in Fig. 5 is merely an example.
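The five candidate positions of Fig. 5 can be sketched as follows, keyed by candidate name and returned with the sample coordinates each candidate occupies:

```python
def spatial_candidate_positions(xP, yP, nPSW, nPSH):
    """Sample positions occupied by the five spatial candidates of Fig. 5,
    listed in the insertion order B1, A1, B0, A0, B2."""
    return {
        "B1": (xP + nPSW - 1, yP - 1),   # top row, rightmost neighbour
        "A1": (xP - 1, yP + nPSH - 1),   # left column, lowest neighbour
        "B0": (xP + nPSW, yP - 1),       # above-right corner
        "A0": (xP - 1, yP + nPSH),       # below-left corner
        "B2": (xP - 1, yP - 1),          # above-left corner
    }
```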
The candidate list may include a motion information candidate list, a merge candidate list, a MV candidate list, a BV candidate list, an MPM list, and the like.
To add a spatial or temporal candidate to the candidate list, it may be determined whether the spatial or temporal candidate is available. When a candidate block exists outside the boundary of an image, a slice, a tile, etc., the availability of the corresponding candidate block may be set to "false". The description that "the availability of the corresponding candidate block may be set to false" may mean that the corresponding candidate block is designated as "unavailable".
The maximum number of candidates in the candidate list may be set. N may represent the preset maximum number of candidates. The preset maximum number of candidates may be signaled through a parameter set, a header, etc. For example, the maximum number of candidates in the candidate list for a target block in a slice may be set through the slice header. For example, N may be 5.
IBC mode
The IBC mode may be an intra block copy prediction mode in which a prediction block of a target block is generated with reference to a previously reconstructed region in a target image. In this regard, the IBC mode may be referred to as a current image reference mode. A Block Vector (BV) for specifying a previously reconstructed region may be used.
IBC mode usage information may be used to determine whether to encode/decode the target block in IBC mode. The encoding device 110 may determine whether an IBC mode is to be used to encode the target block, and may generate IBC mode usage information indicating whether to use the IBC mode. The decoding apparatus 150 may acquire IBC mode usage information from the bitstream.
In IBC mode, a prediction block for a target block may be generated based on BV. BV may specify a reference block. BV may refer to a displacement between a target block and a reference block. The reference block may be a block in the target image (picture). The description of MV in the embodiment may also be applied to BV.
IBC mode may include skip mode, merge mode, AMVP mode, and the like. The descriptions of the AMVP mode, the merge mode, and the skip mode in the embodiment may be similarly applied to the AMVP mode, the merge mode, and the skip mode of the IBC mode, respectively.
In the skip mode or the merge mode, a merge candidate list may be constructed, and the merge index may specify one of the merge candidates in the merge candidate list. The BV of the specified merge candidate may be used as the BV of the target block.
In AMVP mode, BVD may be used. The description of MVD in the examples may also apply to BVD.
The reference blocks in IBC mode may be limited to blocks in a previously reconstructed region in the target image. Alternatively, the reference block may be included in the target CTU or in at least one of the left CTUs. For example, the value of BV may be limited such that the reference block is located in a specific area. The specific region may be a region corresponding to three blocks of a specific size that are encoded/decoded before a block of a specific size including the target block is encoded/decoded. The specific size may be 64 x 64.
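The constraint that the IBC reference block must lie inside the previously reconstructed region can be sketched as follows. The `reconstructed(x, y)` predicate and the corner-only check are illustrative simplifications; real decoders also enforce CTU-based range limits such as the 64×64 regions mentioned above:

```python
def ibc_reference_valid(bx, by, bw, bh, bv, reconstructed):
    """Check whether a block vector (bv) points entirely into the
    previously reconstructed region of the target picture.

    (bx, by) is the target block's top-left position, (bw, bh) its size,
    and `reconstructed(x, y)` reports whether a sample is already decoded.
    Checking the four corners suffices for a rectangular region model.
    """
    ref_x, ref_y = bx + bv[0], by + bv[1]
    corners = [(ref_x, ref_y), (ref_x + bw - 1, ref_y),
               (ref_x, ref_y + bh - 1), (ref_x + bw - 1, ref_y + bh - 1)]
    return all(x >= 0 and y >= 0 and reconstructed(x, y) for x, y in corners)
```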
Transformation and quantization
The quantization level may be generated by performing transform/quantization on the residual block. A residual block may refer to a difference between an original block and a predicted block. The reconstructed residual block may be generated by performing inverse quantization and/or inverse transform on the quantization levels. The reconstructed residual block may refer to the difference between the reconstructed block and the prediction block.
When the transform or the inverse transform is performed, a separable transform or a two-dimensional (2D) non-separable transform may be performed on the residual block. The separable transform may be a transform in which a 1D transform is performed on the residual block in each of the horizontal direction and the vertical direction.
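The separable transform, a 1D transform applied first along rows and then along columns, can be sketched with an orthonormal DCT-II as the 1D kernel. This is only an illustrative kernel choice; codecs use integer approximations:

```python
import math

def dct2_1d(v):
    """Orthonormal 1-D DCT-II of a sequence of floats."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        c = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(c * s)
    return out

def separable_transform(block):
    """Separable 2-D transform: 1-D DCT-II over each row (horizontal
    pass), then over each resulting column (vertical pass)."""
    rows = [dct2_1d(row) for row in block]
    cols = [dct2_1d([rows[r][c] for r in range(len(rows))])
            for c in range(len(rows[0]))]
    # transpose back so result[r][c] indexes rows then columns
    return [[cols[c][r] for c in range(len(cols))] for r in range(len(rows))]
```

For a constant residual block, all energy concentrates in the DC coefficient, which is the behaviour the transform is designed to exploit.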
The transform kernels for the transform may include 1) various DCT kernels, such as DCT type 2 (DCT-II), 2) DST kernels, and 3) kernels derived through training. For 1D transforms, in addition to DCT-II, the DCT types and DST types may include DCT-V, DCT-VIII, DST-I, and DST-VII.
A transform set may be used in determining the DCT-type kernel, DST-type kernel, or trained kernel to be used for the transform. Each transform set may include a plurality of transform candidates. Each transform candidate may be a DCT-type kernel, a DST-type kernel, or a kernel derived through training.
The encoding device 110 may perform the transform and the inverse transform using the transform candidates included in each transform set. The decoding apparatus 150 may perform the inverse transform using the transform candidates included in each transform set. Transform selection information indicating which of the plurality of transform candidates included in the transform set is to be applied to the residual block may be signaled. The transform selection information may include vertical transform selection information and horizontal transform selection information. The vertical transform selection information may indicate which of the transforms belonging to the transform set is to be used for the vertical transform. The horizontal transform selection information may indicate which of the transforms belonging to the transform set is to be used for the horizontal transform.
The transform may include at least one of a primary transform and a secondary transform. Primary transform coefficients may be generated by performing the primary transform on the residual block, and secondary transform coefficients may be generated by performing the secondary transform on the primary transform coefficients. Here, the transform coefficients may include the primary transform coefficients and the secondary transform coefficients.
The primary transform may refer to a Multiple Transform Selection (MTS) that applies different transforms to each 1D direction (i.e., vertical and horizontal directions).
The secondary transform may be a transform for increasing the energy concentration of the transform coefficients generated by the primary transform. Like the primary transform, the secondary transform may be 1) a separable transform or 2) a 2D non-separable transform. The 2D non-separable transform may be, for example, a low-frequency non-separable transform (LFNST) or a non-separable primary transform (NSPT).
NSPT can be applied to specific block sizes (such as 4 x 4, 4 x 8, 8 x 4, 4 x 16, 16 x 4, 8 x 8, 8 x 16, and 16 x 8) for intra coding.
The primary transform may be performed using at least one of a plurality of predefined transform methods. In an example, the plurality of predefined transform methods may include DCT, DST, KLT, and the like. Furthermore, the primary transform may have various transform types depending on the transform kernels that define the DCT and the DST. For example, depending on the transform kernel, the primary transform may include multiple transforms such as DCT-2, DCT-4, DCT-5, DCT-7, DCT-8, DST-1, DST-2, DST-4, DST-7, and DST-8.
In an embodiment, the transform type may be determined based on coding parameters associated with the target block. For example, the transform type may be determined based on one or more of 1) the prediction mode of the target block (e.g., one of intra prediction and inter prediction), 2) the size of the target block, 3) the shape of the target block, 4) the intra prediction mode of the target block, 5) the component of the target block (e.g., one of a luminance component and a chrominance component), and 6) the partition type applied to the target block (e.g., one of QT, BT, TT, and a non-split type).
As in the case of the primary transform, a set of transforms may be defined even in the secondary transform. The method for deriving and/or determining a set of transforms according to embodiments may also be applied to secondary transforms as well as primary transforms.
In an embodiment, the primary transform and/or the secondary transform may be determined for a particular target. The transformation selection information may include transformation target information. The transformation target information may refer to a target to which the primary transformation and/or the secondary transformation is applied.
For example, the primary transform and/or the secondary transform may be applied to one or more of the luma component and the chroma component.
In an embodiment, the transform selection information may include primary transform usage information and secondary transform usage information. The primary transform usage information may indicate whether the primary transform is applied to a residual block of the target block. The secondary transform usage information may indicate whether the secondary transform is applied to a residual block of the target block.
In an embodiment, whether to apply the primary transform and/or the secondary transform may be determined based on encoding parameters of the target block/neighboring block (such as the size and shape of the target block/neighboring block).
In an embodiment, the transform selection information may include primary transform selection information and secondary transform selection information. The primary transform selection information may indicate a transform method to be applied to the residual block among a plurality of transform methods that can be used in the primary transform. The primary transform selection information may be a primary transform index. The secondary transform selection information may indicate a transform method to be applied to the transform coefficient among a plurality of transform methods that can be used in the secondary transform. The secondary transform selection information may be a secondary transform index.
In an embodiment, the transformation methods of the primary and secondary transforms may be derived separately based on specific information such as encoding parameters. For example, the encoding parameters may include encoding parameters for the target block/neighboring block.
In an embodiment, information related to the transformation, such as transformation selection information and sub-information of the transformation selection information, may be signaled for a specific target. For example, the specific target may be a CU.
Information related to the transformation, such as transformation selection information, and sub-information of the transformation selection information may be derived for a particular objective. For example, the specific target may be a CU.
The quantization level may be generated by performing quantization on a result generated by performing the primary transform and/or the secondary transform or on the residual block.
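Generating a quantization level from a transform coefficient, and reconstructing an approximate coefficient from that level, can be sketched with a uniform scalar quantizer. The rounding-toward-zero choice and the explicit `qstep` parameter are illustrative; real codecs derive the step from a quantization parameter and use fixed-point scaling with a rounding offset:

```python
def quantize(coeff, qstep):
    """Uniform scalar quantization: the quantization level is the
    coefficient divided by the quantization step, rounded toward zero."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) // qstep)

def dequantize(level, qstep):
    """Inverse quantization: reconstruct an approximate coefficient
    from the quantization level."""
    return level * qstep
```

The round trip `dequantize(quantize(c, q), q)` illustrates the lossy step: the reconstructed coefficient differs from the original by at most one quantization step.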
The description of the above transformation may also be applied to the inverse transformation. In the present application, the inverse process with respect to the process described by the transformation may be performed in the inverse transformation. The term "transform" in the names of the components related to the transform may be changed to the term "inverse transform". Furthermore, the input of the transformation may be regarded as the output of the inverse transformation. The transformed output may be regarded as an input to an inverse transformation. The decoding device 150 may acquire transformation-related information such as transformation selection information, and may perform inverse processing of transformation-related processing indicated by the transformation-related information by using the transformation-related information.
The target block may include a plurality of sub-blocks. Each sub-block may be defined according to a minimum block size or a minimum block shape. The target block may be partitioned into a plurality of sub-blocks, each of which may contain coefficients and have a size such as 4×4, 2×8, or 8×2. The target block may be a transform block. The transform coefficients or quantization levels may be represented in the shape of a block. The transform coefficients may be quantized transform coefficients.
The transform coefficients or quantization levels may be scanned according to at least one scan type selected from a diagonal scan, a vertical scan, and a horizontal scan. The diagonal scan may be an upper right diagonal scan or a lower left diagonal scan.
For example, coefficients of a block may be changed or arranged in 1D vector form by scanning the coefficients using diagonal scanning. The vertical scanning may be an operation of scanning a coefficient having a 2D block shape in a column direction. The horizontal scanning may be an operation of scanning a coefficient having a 2D block shape in a row direction.
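The three scan types can be sketched as generators of the (x, y) visiting order for a w×h block. The up-right diagonal variant is shown for the diagonal scan; the exact diagonal direction in a codec depends on the profile:

```python
def scan_positions(w, h, scan_type):
    """Return the (x, y) visiting order for a w-by-h coefficient block
    under one of the three scan types described above."""
    if scan_type == "horizontal":        # scan in the row direction
        return [(x, y) for y in range(h) for x in range(w)]
    if scan_type == "vertical":          # scan in the column direction
        return [(x, y) for x in range(w) for y in range(h)]
    if scan_type == "diagonal":          # up-right diagonal scan
        order = []
        for s in range(w + h - 1):       # walk each anti-diagonal
            for y in range(h - 1, -1, -1):
                x = s - y
                if 0 <= x < w:
                    order.append((x, y))
        return order
    raise ValueError(scan_type)
```

Applying a scan to a 2D block and reading coefficients in this order yields the 1D vector form described above; the inverse scan replays the same order to refill a 2D block.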
The scan type of the coefficients may be determined based on coding parameters such as intra prediction mode, block size, and block shape. For example, which of diagonal scan, vertical scan, and horizontal scan is to be used may be determined based on encoding parameters such as intra prediction mode, block size, and block shape. The block may be a transform unit.
A scan depending on each scan type may start at a particular start point and may end at a particular end point.
In the scanning, a scanning order corresponding to a scanning type may be first applied to the sub-blocks. Next, a scan order corresponding to the scan type may be applied to the transform coefficients or quantization levels in each sub-block.
The encoding apparatus 110 may generate a bitstream including the entropy-encoded transform coefficients or the entropy-encoded quantization levels by performing entropy encoding on the transform coefficients or the quantization levels.
The decoding apparatus 150 may acquire the entropy-encoded transform coefficients or the entropy-encoded quantization levels from the bitstream, and may generate the transform coefficients or the quantization levels by performing entropy decoding on them. The coefficients may be rearranged into a 2D block by inverse scanning; the arrangement produced by the inverse scan is the reverse of the arrangement produced by the scan.
The inverse scanned transform coefficients or the inverse scanned quantization levels may be generated by inverse scanning the coefficients. Here, the inverse scan type may include diagonal scan, vertical scan, and horizontal scan, and an inverse scan type of inverse transform corresponding to the transformed scan type may be selected.
The decoding apparatus 150 may perform inverse quantization on the (inversely scanned) coefficients. Depending on whether the secondary inverse transform is to be performed, the secondary inverse transform may be performed on the result of the inverse quantization. Further, depending on whether the primary inverse transform is to be performed, the primary inverse transform may be performed on the result of the secondary inverse transform. The reconstructed residual block may be generated by selectively performing the secondary inverse transform and the primary inverse transform on the coefficients.
Filtering
To improve the image quality of the image, filtering may be performed on the block. The value of the target sample point may be determined or updated by filtering.
The target sample point may be one of the sample points described in the embodiments. For example, the target samples may be one or more of the samples described in the embodiments, such as a prediction sample, a reference sample, a residual sample, a reconstruction sample, and a filtered reconstruction sample.
The target samples may be samples within one or more of a target picture, a target slice, a target CTB, a target block, a reference sample line, and a template. The target block may be one of the above-described blocks. For example, the target block may be one or more of the blocks described in the embodiments, such as a transform block, a prediction block, a reference block, a residual block, and a reconstruction block.
In an embodiment, the filtering process described as being applied to one object may also be applied to another object. For example, the filter processing described for a specific loop filter may also be applied to a transform block, a prediction block, a reference block, a residual block, or the like.
A particular type of filtering may be used for the filtering process according to embodiments. The filter types may include filter taps (or filter tap lengths), filter shapes, filter strengths, filter coefficients (or weights), and offsets for each filter.
The filter taps may represent the number of input samples of the filter. The input samples may include the target sample. Alternatively, the input samples may include a specific value determined for the target sample. The input samples may include one or more reference samples. The one or more reference samples may be determined based on the attributes of the target block described in the embodiments. The attributes may include encoding parameters. For example, an attribute of the target sample may include the position of the target sample. The one or more reference samples may be specified by their positions relative to the position of the target sample.
The filter shape may represent the shape formed by the input samples. A specific value determined for the target sample may be regarded as the target sample itself. In other words, when a specific value determined for the target sample is used as an input sample of the filter, the target sample may also be considered to form part of the filter shape.
The samples whose values are determined by filtering may include a plurality of samples. The filter strength may represent the range of samples whose values are determined by filtering. The filter strength may be one of a strong filter strength and a weak filter strength. The number of samples whose values are determined by the strong filter strength may be greater than the number of samples whose values are determined by the weak filter strength. Alternatively, the filter strength may represent a range of values that are changed by filtering. The range of sample values that are changed by the strong filter strength may be wider than the range of sample values that are changed by the weak filter strength.
The filter coefficients may be coefficients or weights of the input samples.
The offset may be a specific value to be added to the result (such as a weighted sum) of the computation performed using the values of the input samples and coefficients.
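The relation among the filter taps, coefficients, and offset described above can be sketched as follows (an illustrative helper, not a normative filter; integer normalization by a right shift is omitted):

```python
def apply_filter(input_samples, weights, offset=0):
    """The number of filter taps equals the number of input samples; the
    output is the weighted sum of the input samples plus the offset."""
    return sum(s * w for s, w in zip(input_samples, weights)) + offset
```

For example, a 3-tap smoothing filter may use the weights (0.25, 0.5, 0.25) and a zero offset.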
What filtering, interpolation, and sampling have in common may be that the value of a sample is updated. Thus, the description of any one of filtering, interpolation, and sampling in the embodiments may also be applicable to the others. Here, the sampling may include at least one of upsampling, downsampling, and subsampling.
The filtering may include filtering performed by the predictor 123, the predictor 163, or the like.
When encoding a target block, there may be a prediction error between an original sample in the original block and a prediction sample in the prediction block. In order to reduce the prediction error, filtering may be performed on at least one of a prediction sample in the prediction block and a reference sample referred to for prediction.
For example, in intra prediction, the reference samples may include one or more of an above-left reference sample, an above-right reference sample, a left reference sample, and a below-left reference sample. Filtering of a prediction sample may be performed by applying specific weights to the prediction sample, the left reference sample, the above reference sample, and/or the above-left reference sample, respectively.
Filtering of at least one of the prediction samples and the reference samples may be performed based on the properties of the target block and the properties of the prediction samples. For example, whether to perform filtering, a type of filter, a region to which filtering is applied, a filtering weight, a reference sample, a range of reference samples, and a position of the reference sample may be determined based on the attribute of the target block and the attribute of the prediction sample.
For example, the attributes of the target block may include information related to the target block described in the embodiments, such as 1) size, 2) prediction mode, 3) intra prediction mode, 4) reference sample line, 5) sample value, and 6) coding parameter of the target block.
For example, the attributes of the prediction samples may include information related to the prediction samples described in the embodiment, such as 1) the sample value of the prediction samples and 2) the position of the prediction samples in the target block, and may include coding parameters related to the prediction samples.
The filtering may include loop filtering performed by filter 130, filter 170, etc.
Fig. 6 shows a plurality of loop filters according to an example.
The plurality of loop filters for loop filtering may include one or more of Luma Mapping with Chroma Scaling (LMCS), a deblocking filter, a Sample Adaptive Offset (SAO) filter, and an Adaptive Loop Filter (ALF).
The plurality of loop filters may be sequentially connected to each other. For example, the plurality of loop filters may be connected in the order of LMCS, the deblocking filter, SAO, and ALF. Alternatively, the plurality of loop filters may be connected in any other available ordering (permutation) of the plurality of loop filters. The output of one of the plurality of loop filters may be used as the input of the next filter.
As shown in fig. 6, the input image may be input to a first filter. The input image may be a block described in the embodiment. For example, the input image may be a reconstructed block generated by adder 129 or adder 169. The output of one filter may be the input of the next filter. The output image may be generated by the last filter. The output image may be the filtering block described in the embodiments. For example, the output image may be a filtered reconstructed image generated by filter 130 or filter 170.
The target block may represent an image input to the filter. The filtered target block may represent an image output from the filter.
LMCS may include luminance signal mapping of a luminance signal of a target block and chrominance signal scaling of a chrominance signal of the target block.
The luminance signal map may perform codeword reassignment for the luminance signal.
The luminance signal map may include a forward map and a reverse map. In forward mapping, the existing dynamic range may be partitioned into multiple bins. The mapping dynamic range may be determined by performing codeword reassignment on the input image using a linear model for the corresponding interval. In the reverse mapping, reverse mapping from the mapped dynamic range to the existing dynamic range is performed.
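The forward mapping described above can be illustrated as a piecewise-linear remapping over bins. The helper below is a hypothetical sketch (the normative codeword derivation and rounding are omitted): `bin_edges` partitions the existing dynamic range, and `mapped_edges` gives the codeword range assigned to each bin.

```python
def forward_map(sample, bin_edges, mapped_edges):
    """Remap a sample from the existing dynamic range onto the mapped
    dynamic range using a linear model per bin."""
    last = len(bin_edges) - 2
    for i in range(last + 1):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        if lo <= sample < hi or (i == last and sample == hi):
            mlo, mhi = mapped_edges[i], mapped_edges[i + 1]
            # linear model for bin i (integer arithmetic, truncating)
            return mlo + (sample - lo) * (mhi - mlo) // (hi - lo)
    raise ValueError("sample outside the dynamic range")
```

Inverse mapping, from the mapped dynamic range back to the existing one, can reuse the same helper with the two edge lists swapped.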
The chrominance scaling may correct the chrominance signal according to the correlation between the luminance signal and its corresponding chrominance signal.
The forward mapping may be performed between inter prediction of the luminance signal and reconstruction of the luminance signal, and the forward mapping may be performed between inter prediction for the luminance signal and chroma scaling. Inverse mapping may be performed between the reconstruction of the luminance signal and loop filtering for the luminance signal. Chroma scaling may be performed between the inverse transform and the reconstruction of the chroma signal.
With this structure, inverse quantization of the luminance signal and the chrominance signal, inverse transformation of the luminance signal and the chrominance signal, prediction of the luminance signal, and reconstruction of the luminance signal can be performed within the mapped dynamic range. Loop filtering of luminance and chrominance signals, inter prediction of luminance and chrominance signals, intra prediction of chrominance signals, and reconstruction of chrominance signals may be performed within an existing dynamic range.
The deblocking filter may remove block distortion occurring at boundaries between blocks in the reconstructed image. For example, the block may be a transform block. Furthermore, a block may be a sub-block of a particular block described in the embodiments. Here, the boundary between blocks may refer to a sample adjacent to the boundary between blocks.
Deblocking filters may be applied to vertical and horizontal boundaries between blocks. After filtering is performed on the vertical boundaries between blocks, filtering may be performed again on the horizontal boundaries between the filtered blocks.
A deblocking filter may be selectively applied. Whether to apply the deblocking filter to the target block may be determined based on at least one of samples included in a particular number of columns or rows in the target block and samples included in a particular number of columns or rows in a neighboring block adjacent to the particular boundary.
When the deblocking filter is to be applied to the target block, the filter to be applied may be determined according to the required strength of the deblocking filter. In other words, among a plurality of filters, the filter determined according to the strength of the deblocking filter may be applied to the target block. The plurality of filters may include one or more of a long-tap filter, a strong filter, a weak filter, and a Gaussian filter.
The maximum length of the deblocking filter may be determined based on properties of the target block, such as the size of the target block, the components of the target block, and the coding parameters of the target block.
The SAO may perform compensation for distortion between the original image and the reconstructed image based on the samples. To compensate, the SAO may apply an appropriate offset to the sample value of the sample. That is, an offset may be added to the sample value.
An offset may be determined for the target block. For example, an offset may be determined for each component of the CTB. The determined offset may be applied to samples in a particular component of the CTB.
The SAO may include SAO using an Edge Offset (EO) and SAO using a Band Offset (BO). Depending on the characteristics of the samples in a specific block, such as a CTU, whether SAO using EO is to be performed and whether SAO using BO is to be performed may be determined, respectively.
In SAO using EO, correction of distortion in a sample may be performed based on the direction of edges in the target block. The pattern categories of EO may include a horizontal pattern, a vertical pattern, a 135-degree diagonal pattern, and a 45-degree diagonal pattern. For a target block, information indicating the pattern category applied to the target block and a plurality of offsets for the pattern category may be signaled. The number of offsets may be 4. For a target sample in the target block, the samples adjacent to the target sample may be determined according to the direction of the pattern category. The offset to be applied to the target sample may be determined by the pattern formed with the neighboring samples.
In SAO using BO, the luminance values in the target block may be classified into specific bands, and distortion in the samples may thereby be corrected. The range of sample values determined by the bit depth of the input image may be divided into m bands. For example, m may be 32. The specific bands may be n consecutive bands among the m bands. For example, n may be 4. The n offsets for the n bands may be signaled. Further, information indicating the first band among the n bands selected from the m bands may be signaled. The offset of the band corresponding to the target sample may be added to the sample value of the target sample in the target unit.
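A minimal sketch of the band-offset correction described above (hypothetical helper; clipping the result to the valid sample range is omitted):

```python
def sao_band_offset(sample, bit_depth, first_band, offsets, m=32):
    """Classify the sample into one of m equal bands over [0, 2**bit_depth)
    and, if its band lies among the len(offsets) consecutive signalled
    bands starting at first_band, add the corresponding offset."""
    band = (sample * m) >> bit_depth          # band index of the sample
    if first_band <= band < first_band + len(offsets):
        return sample + offsets[band - first_band]
    return sample                             # outside the signalled bands
```

With an 8-bit input and m = 32, each band covers 8 consecutive sample values, and only samples falling into the n signalled bands are adjusted.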
ALF may perform compensation for distortion between reconstructed and original images.
The filter coefficients of the ALF may be signaled by a bitstream.
The filter shape of the ALF may be determined by the components of the target block. For example, a diamond filter having a size of 7×7 may be used for the luminance component. A diamond filter with a size of 5 x 5 may be used for the chrominance components.
In ALF, the characteristics of a specific block may be determined, and the class of the specific block may be decided according to the characteristics. The specific block may be a block having a size of 4×4. In other words, the determination of the characteristics and the determination of the class may be performed on a 4×4 block basis. The filter coefficients may be calculated according to the class.
One of the 25 categories may be decided as the category of the particular block based on the direction and activity determined using the gradient of the particular block. Depending on the gradient of a particular block, a rotational transformation, a vertical symmetric transformation, and/or a diagonal symmetric transformation of the filter may be applied.
Information regarding whether ALF is applied may be signaled for a specific unit such as CTB.
An index indicating the filter to be applied to a specific unit, among the available filters, may be signaled. Here, the available filters may include fixed filters and filters configured using a parameter set. For example, the parameter set may be an Adaptive Parameter Set (APS). The fixed filters may be identically defined in the encoding apparatus 110 and the decoding apparatus 150. The filter coefficients of a filter configured using the parameter set may be determined based on the encoding parameters.
Entropy encoding and entropy decoding
Fig. 7 illustrates entropy encoding and entropy decoding according to an example.
In the upper part of fig. 7, the process of entropy encoding by the entropy encoder 139 is shown.
The entropy encoder 139 may include a context modeler, a binarization unit, and an entropy encoding unit. The context modeler may include a context selection unit and a context memory.
The binarization unit may generate a binary number (a string of bins) for a syntax element by performing binarization on the syntax element of the target block. Binarization may be the process of converting a syntax element into binary form.
Information about syntax elements and binary numbers may be provided from the binarization unit to the context selection unit.
The context modeler may perform the updating of the context.
The context may represent occurrence probability information of each binary number of the syntax element that has been encoded.
The context modeler may perform a context update to apply the current probability information to the encoding of the binary number of the syntax element of the target block. The updated context may be stored in a context memory. Here, the updated context corresponding to the syntax element of the target block (binary number in the syntax element of the target block) may be derived by the context modeler.
The context selection unit may select a context corresponding to a binary number of the syntax element of the target block. The selected context may be loaded from the context memory and may be used as an updated context for entropy encoding of the binary number of syntax elements of the target block.
The updated context may be used for entropy encoding of syntax elements of the target block.
The entropy encoder may generate encoding information of a syntax element of the target block by performing entropy encoding using the generated binary number and the updated context, and may generate a bitstream including the encoding information. The entropy encoder may use at least one of an arithmetic coding method and a bypass coding method.
In the lower part of fig. 7, a process of entropy decoding by the entropy decoder 161 is shown.
The entropy decoder 161 may include a context modeler, an entropy decoding unit, and an inverse binarization unit. The context modeler may include a context selection unit and a context memory.
The context modeler may perform the updating of the context.
The context may represent occurrence probability information of each binary number of the syntax element that has been decoded.
The context modeler may perform a context update to apply the currently decoded probability information to entropy decoding of binary numbers of syntax elements of the target block. The updated context may be stored in a context memory. Here, the updated context corresponding to the syntax element of the target block (binary number in the syntax element of the target block) may be derived by the context modeler.
The context selection unit may select a context corresponding to a binary number of the syntax element of the target block. The selected context may be loaded from the context memory and may be used as an updated context for entropy decoding of binary numbers of syntax elements of the target block.
The updated context may be used for entropy decoding of syntax elements of the target block.
The entropy decoder may generate a binary number of the syntax element of the target block by entropy decoding the encoded information of the bitstream based on the updated context. The entropy decoder may use at least one of an arithmetic decoding method and a bypass decoding method.
The inverse binarization unit may acquire syntax elements of the target block by performing inverse binarization on at least one binary number of the generated binary numbers. Inverse binarization may be a process of converting at least one binary number of binary numbers into a form of syntax element.
Information about syntax elements and binary numbers may be provided from the inverse binarization unit to the context selection unit.
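The context update performed by the context modeler can be illustrated with a simple exponential probability estimator. Actual entropy coders (e.g., CABAC) use table-driven or multi-hypothesis updates, so the helper below is only a conceptual sketch of "moving the occurrence probability toward the observed binary number":

```python
PROB_ONE = 1 << 15  # fixed-point representation of probability 1.0

def update_context(prob, bin_value, window=16):
    """Move the estimated probability of a '1' bin toward the observed
    bin value (0 or 1) by a fraction 1/window of the remaining gap."""
    target = PROB_ONE if bin_value else 0
    return prob + (target - prob) // window
```

A smaller window adapts faster to recent bins; a larger window gives a smoother, slower-moving probability estimate.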
Each syntax element may be one of the coding parameters described in the embodiments.
Method for binarization, inverse binarization, entropy coding and entropy decoding
In an embodiment, in order to perform signaling of specific information, one or more of a binarization method, an inverse binarization method, an entropy encoding method, and an entropy decoding method, which will be listed below, may be used.
- Signed 0th-order exponential Golomb (Exp-Golomb) binarization/inverse binarization method (abbreviated as se(v))
- Signed k-th-order Exp-Golomb binarization/inverse binarization method (abbreviated as sek(v))
- 0th-order Exp-Golomb binarization/inverse binarization method for unsigned positive integers (abbreviated as ue(v))
- k-th-order Exp-Golomb binarization/inverse binarization method for unsigned positive integers (abbreviated as uek(v))
- Fixed-length binarization/inverse binarization method (abbreviated as f(n))
- Truncated Rice binarization/inverse binarization method or truncated unary binarization/inverse binarization method (abbreviated as tu(v))
- Truncated binary binarization/inverse binarization method (abbreviated as tb(v))
- Context-adaptive arithmetic coding/decoding method (abbreviated as ae(v))
- Bit string in bytes (abbreviated as b(8))
- Signed integer binarization/inverse binarization method (abbreviated as i(n))
- Unsigned positive integer binarization/inverse binarization method (abbreviated as u(n)) (where "u(n)" may represent a fixed-length binarization/inverse binarization method)
- Unary binarization/inverse binarization method
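For example, the 0th-order Exp-Golomb method for unsigned integers (ue(v)) can be sketched as follows; this is the common prefix-free construction, shown here for illustration only:

```python
def ue_binarize(value):
    """ue(v): encode value as [k zeros][binary of (value + 1)], where
    k = floor(log2(value + 1)), yielding a prefix-free bit string."""
    code_num = value + 1
    k = code_num.bit_length() - 1
    return "0" * k + format(code_num, "b")

def ue_inverse_binarize(bits):
    """Inverse binarization: count k leading zeros, then read k + 1 bits."""
    k = 0
    while bits[k] == "0":
        k += 1
    return int(bits[k:2 * k + 1], 2) - 1
```

Small values get short codes: 0 maps to "1", 1 to "010", 2 to "011", 3 to "00100", and so on.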
Adaptive execution of processing in an embodiment
The above-described processes may be performed in the encoding device 110 and the decoding device 150 using the same method and/or a corresponding method. Further, in image encoding and/or decoding, a combination of one or more of the foregoing embodiments may be used.
The order of application of the embodiments may be different in the encoding device 110 and the decoding device 150. In the encoding device 110 and the decoding device 150, the order of application of the embodiments may be (at least partially) the same.
The processing according to the embodiments may be performed for each processing target. The processing according to the embodiments may be equally performed for a specific target. For example, such a specific target may be a luminance signal and/or a chrominance signal.
The process according to the embodiment may be selectively applied/executed based on a specific condition or a specific target.
In an embodiment, the process according to an embodiment may be selectively applied/performed according to a temporal layer. The temporal layer information for a specific process may be information indicating a temporal layer to which a process can be applied/performed. The temporal layer information may be signaled for a particular process. The temporal layer information may represent a lowest layer and/or an uppermost layer to which a specific process can be applied, and may indicate a specific layer to which a specific process can be applied/performed. Alternatively, a fixed time layer to which the process according to the embodiment is applied/performed may be defined.
In an embodiment, a type to which the process according to the embodiment is applied/executed may be defined, and it may be determined whether the process according to the embodiment is applied/executed based on the defined type. The types may include picture types, slice types, parallel block types, and so on.
According to the description of the embodiments, when a specific process is applied to or performed on a specific target, a specific condition may be required, and the specific process may be performed based on a specific determination. A determination that a particular condition is met, or a particular decision that is made, based on a specific coding parameter may be interpreted as being replaceable by a determination or decision based on another coding parameter. In other words, the particular conditions or the coding parameters affecting a particular decision described in the embodiments may be considered exemplary, and it should be understood that one or more other coding parameters, or a combination of one or more other coding parameters, may perform the function of the specified coding parameter.
The processes in the embodiments may be applied/performed based on the size of at least one of the blocks described in the embodiments. For example, the blocks may include an encoding block, a prediction block, a transform block, a reference block, a current block, and a target block. Alternatively, in an embodiment, a block may include a neighboring block. Here, the size may be defined as a minimum size and/or a maximum size at which the embodiments can be applied, and may be defined as a fixed size for the process in the embodiments. Further, a first embodiment may be applied at a first size, and a second embodiment may be applied at a second size. That is, the processes of the embodiments may be applied in combination depending on the size. Further, the processing in the embodiments may be applied only when the size of the target is equal to or larger than the minimum size and equal to or smaller than the maximum size. That is, the processing in the embodiments may be applied only when the block size falls within a specific range.
Hereinafter, improved techniques relating to image encoding and decoding implemented in an image encoding apparatus and an image decoding apparatus will be disclosed.
In the present disclosure, the "case where an indicator indicating whether a specific method is performed is true" may refer to a case where the indicator, which may be based on a prediction mode, motion information, coding parameters, and/or a position, indicates that the specific method is performed. Similarly, the "case where the indicator indicating whether to perform the specific method is false" may refer to the case where the indicator indicating whether to perform the specific method is not true.
In the present disclosure, the "case of executing a specific method/mode" may refer to a case where an indicator indicating whether to execute a specific method/mode has a first value. The first value may be true or 1.
In the present disclosure, when a specific mode (or a specific method) is not enabled in a specific unit, signaling/encoding/decoding of at least one of syntax elements for the specific mode (or the specific method) may be omitted in the specific unit and sub-units of the specific unit.
The specific unit may be at least one of a video sequence, a picture, a subpicture, a slice, a parallel block, a partition, a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), a Transform Unit (TU), a Coding Block (CB), a Prediction Block (PB), or a Transform Block (TB), and the syntax element describing whether a specific mode is enabled in the specific unit may be signaled in at least one of a video parameter set, a decoding parameter set, a sequence parameter set, an adaptive parameter set, a picture header, a subpicture header, a slice header, a parallel block group header, a parallel block header, a block, a CTU, a CU, a PU, a TU, a CB, a PB, or a TB.
The sub-units may be units at a lower level than the specific unit. In other words, a sub-unit may refer to a unit belonging to the area of the specific unit or a unit referring to information of the specific unit. For example, if the specific unit is a slice, its sub-unit may be a CTU, CU, PU, or TU that belongs to the corresponding slice and references the syntax elements of that slice.
When it is stated that a particular mode (or particular method) is performed only under certain conditions, it may mean that the particular mode (or particular method) is enabled only when those conditions are met.
In the present disclosure, "motion information improvement" and "motion information refinement" may be used interchangeably and may be replaced with each other.
In the present disclosure, "motion information improvement value" and "motion information refinement vector" may be used interchangeably and may be replaced with each other.
In the present disclosure, deriving the second motion information from the first motion information may represent acquiring the second motion information by refining the first motion information.
1. Adaptive Motion Vector Resolution (AMVR) and Adaptive Block Vector Resolution (ABVR)
In an embodiment, the term "resolution" may indicate "motion vector resolution".
In the adaptive motion vector resolution, the resolution of the motion vector difference may be adjusted in block units.
The adaptive motion vector resolution information may indicate a resolution of a motion vector difference. The resolution of the motion vector difference of the target block may be determined by signaling/encoding/decoding of the adaptive motion vector resolution information.
The information related to the adaptive motion vector resolution may include at least one of an indicator indicating whether to use the adaptive motion vector resolution, an index indicating the motion vector resolution, a number of resolution candidates in the adaptive motion vector resolution, and a type of resolution candidates in the adaptive motion vector resolution. For example, if an indicator indicating whether the adaptive motion vector resolution is used in the target block is true, an index indicating the motion vector resolution in the target block may be signaled/encoded/decoded.
The motion vector resolutions applicable to each block may be the same or different from each other. For example, the resolution of the motion vector applicable to the target block may be determined based on at least one of encoding parameters, motion information, and mode information of the target block.
Adaptive motion vector resolution can improve coding efficiency by adjusting the resolution of the motion vector difference.
For example, the adjusted resolution may be one of 16 pixels, 8 pixels, 4 pixels, full pixels, half pixels, and quarter pixels. However, the present disclosure is not limited to a particular resolution, but may employ various motion vector resolutions.
When the adjusted resolution is n pixels, if the component value of the motion vector difference is changed by 1, the position indicated by the motion vector difference may be changed by n pixels. In other words, when the adjusted resolution of the target block is n pixels, each component of the motion vector difference may indicate the reference block in units of n pixels.
For example, in the case where the motion vector difference to be actually applied to the target block is (a, b) and the adjusted resolution is p pixels, (a/p, b/p) may be encoded instead of (a, b). In other words, the motion vector difference signaled/encoded by the encoding apparatus may be (a/p, b/p). The decoding apparatus may derive the original motion vector difference (a, b) by multiplying the signaled motion vector difference (a/p, b/p) by p.
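The scaling described above can be sketched as follows (hypothetical helper names; the sketch assumes the components of the motion vector difference are multiples of the adjusted resolution, as the description implies):

```python
def scale_mvd_for_signalling(mvd, resolution):
    """Encoder side: divide each component of the motion vector
    difference (a, b) by the adjusted resolution p, so that (a/p, b/p)
    is what gets signalled."""
    a, b = mvd
    return (a // resolution, b // resolution)

def restore_mvd(signalled_mvd, resolution):
    """Decoder side: multiply the signalled difference back by p to
    recover the original (a, b)."""
    a, b = signalled_mvd
    return (a * resolution, b * resolution)
```

Signalling the smaller values (a/p, b/p) is what yields the coding-efficiency gain when a coarse resolution suffices.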
Hereinafter, the description of the adaptive motion vector resolution may be equally applied to the adaptive block vector resolution.
2. Template matching
Fig. 8 illustrates one embodiment of a template match.
In template matching, motion information of a target block may be determined and/or changed based on a result of calculating a cost function between a target template of the target block and a reference template of a reference block.
In some embodiments, template matching may be used to determine motion information for the target block. Based on the cost calculation between the target template and each reference template, motion information corresponding to a displacement from the position of the target block to the position of the reference block of the reference template having the lowest cost may be used as the motion information of the target block.
In another embodiment, template matching may be used to change or refine motion information of the target block. For example, a reference template of a reference block located at a position indicated by initial motion information for a target block and a position separated from the position by a predetermined direction and interval may be determined. Then, based on the cost calculation between the target template and each reference template, motion information corresponding to the displacement from the target block to the reference block of the reference template having the lowest cost is determined as the final motion vector of the target block.
In another embodiment, template matching may be used to reorder motion information candidates included in a motion information candidate list of a target block. For example, a reference template of a reference block at a position indicated by each of the motion information candidates is configured. The motion information candidates are then reordered in ascending order of cost based on the calculation of cost between the target template and the reference template. By reordering, a low index can be assigned to a candidate with a high probability of being selected as motion information of the target block.
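The reordering described above can be sketched with a sum-of-absolute-differences cost (hypothetical helper names; any cost function described in the embodiments could be substituted for SAD):

```python
def sad(template_a, template_b):
    """Sum of absolute differences between two templates."""
    return sum(abs(a - b) for a, b in zip(template_a, template_b))

def reorder_candidates(target_template, candidates, reference_template_of):
    """Sort motion-information candidates in ascending order of the cost
    between the target template and each candidate's reference template,
    so candidates more likely to be selected receive lower indices."""
    return sorted(candidates,
                  key=lambda cand: sad(target_template, reference_template_of(cand)))
```

Here `reference_template_of` stands for whatever lookup fetches the reference template at the position a candidate points to.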
In the template matching, the reference block may include at least one of a reference block to which initial motion information is directed, a reference block to which motion information derived during a search process of the template matching is directed, and a reference block to which motion information finally improved by the template matching is directed.
Here, the initial motion information may be motion information of a target block signaled from the encoding apparatus to the decoding apparatus. The motion information improved by the template matching may be the motion information with the lowest matching cost derived during the search process of the template matching. However, the method of deriving motion information is not limited to the above criteria.
The template matching method may include at least one of an inter template matching mode and an intra template matching mode.
The inter template matching mode may refer to a template matching method of configuring a reference block based on at least one of a prediction sample, a reconstruction sample, and a residual sample of a reference picture reconstructed before a target picture.
The intra template matching mode may refer to a template matching method of configuring a reference block based on at least one of a prediction sample, a reconstruction sample, and a residual sample of a target picture.
The template matching cost may be the result of evaluating a cost function between the template of the target block and the template of the reference block used in the template matching. The template matching cost may mean the cost for the displacement between the target block and the reference block, i.e., the template matching cost of the motion information.
When the target block is a chrominance component block, a template matching cost may be determined based on sample values of a reconstructed luminance component region corresponding to the chrominance component block.
As an exemplary embodiment, when the target block is a chrominance component block, the template matching cost may mean the cost between samples within the luminance component region corresponding to a reference chrominance component block, determined based on the initial motion information of the target chrominance component block, and samples within the luminance component block corresponding to the target chrominance component block. Here, the initial motion information may be a zero vector or a motion vector obtained by scaling, based on the chrominance component format, a predetermined motion vector of the luminance component region corresponding to the target chrominance component block. When the initial motion information is the zero vector, the position of the target chrominance component block within the target picture is the same as the position of the reference chrominance component block within the reference picture.
The luminance component region corresponding to the target chrominance component block may be regarded as one luminance component block, and template matching may be performed on the luminance component block. Then, motion information of the target chroma component block may be determined based on the motion information exhibiting the minimum cost.
When a plurality of luminance component blocks exist in a luminance component region corresponding to a target chrominance component block, motion information of the target chrominance component block may be determined by performing template matching on at least one luminance component block of the plurality of luminance component blocks.
In another embodiment, the template matching cost for the target chroma component block may be obtained by calculating a cost between a reference template for the reference chroma component block or a luma component region corresponding to the reference chroma component block and a target template for the target chroma component block or a luma component region corresponding to the target template for the target chroma component block.
2.1 Template configuration
Templates for template matching may include a target template and a reference template.
The target template may be configured from a reference region including samples around the target block. The reference region of the target block may include at least one of samples located in a lower left region, a left side region, an upper left region, an upper region, and an upper right region of the target block.
In some embodiments, the target template may be the same as the reference region of the target block.
In some other embodiments, when the target template is configured for template matching, a portion of the samples within the reference region of the target block may be selected. And the target template may be configured using the selected samples.
The reference template may be configured using a reference region including neighboring samples of the reference block. The reference region of the reference block may be a region corresponding to the reference region of the target block. For example, the reference region of the reference block may include at least one of samples located in a lower left region, a left side region, an upper left region, an upper region, and an upper right region of the reference block.
In some embodiments, the reference template for template matching may be the same as the reference region of the reference block. For example, the samples of the reference template based on the reference block may be samples corresponding to the samples of the target template based on the target block.
In some other embodiments, when the reference template is configured for template matching, a portion of the samples within the reference region of the reference block may be selected. And the reference template may be configured using the selected samples. For example, the samples selected for configuring the reference template based on the reference block may be samples corresponding to the samples selected for configuring the template of the target block based on the target block.
In the inter template matching mode, each of the reference block, the reference template, and the reference region is composed of at least one of a prediction sample, a reconstruction sample, and a residual sample of a reference picture reconstructed before the target picture.
In the intra template matching mode, each of the reference block, the reference template, and the reference region is composed of at least one of a prediction sample, a reconstruction sample, and a residual sample of the target picture.
The target/reference templates for template matching may include at least one of 1) at least one of the samples within the TMSIZE_LEFT sample lines adjacent to the left side of the target/reference block, and 2) at least one of the samples within the TMSIZE_ABOVE sample lines adjacent to the top of the target/reference block.
However, the positional relationship between each of the sample points in the template and the target/reference block and/or the method of configuring the template are not limited to the above-described relationship or method.
Each of TMSIZE_LEFT and TMSIZE_ABOVE may be 0, 1, 2, 3, 4, or another non-negative integer.
TMSIZE_LEFT and TMSIZE_ABOVE may be equal to each other. Alternatively, TMSIZE_LEFT and TMSIZE_ABOVE may be different from each other.
Each of TMSIZE_LEFT and TMSIZE_ABOVE may be a predefined value or a value determined based on signaled/encoded/decoded information.
Each of TMSIZE_LEFT and TMSIZE_ABOVE may be determined based on at least one of motion information, coding parameters, size, and prediction mode of the target block.
For example, when the width of the target block is W and its height is H, TMSIZE1 may be used as TMSIZE_LEFT/TMSIZE_ABOVE if the smaller (or larger) of W and H is less than (or greater than) TMSIZE_THRES; otherwise, TMSIZE2 may be used as TMSIZE_LEFT/TMSIZE_ABOVE.
TMSIZE1 and TMSIZE2 may be predefined values.
Each of TMSIZE1 and TMSIZE2 may be, but is not limited to, 0, 1, 2, 4, or another non-negative integer.
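One variant of the threshold rule above might be sketched as follows; the concrete values of TMSIZE_THRES, TMSIZE1, and TMSIZE2 are illustrative assumptions, since the text only requires them to be predefined or derived from signaled information:

```python
# Hypothetical constants; any predefined or signaled values would do.
TMSIZE_THRES = 8
TMSIZE1 = 1
TMSIZE2 = 2

def select_tmsize(width, height):
    """Use TMSIZE1 when the smaller block dimension is below the
    threshold, TMSIZE2 otherwise (one of the allowed variants)."""
    return TMSIZE1 if min(width, height) < TMSIZE_THRES else TMSIZE2
```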
To configure the templates, interpolation filtering may be performed.
The interpolation filter used to configure the template may be the same as or different from the interpolation filter used for motion compensation of the prediction block.
The difference between interpolation filters may arise from a change in at least one of the interpolation filter type, the number of interpolation filter taps, and the interpolation filter coefficients; however, the criteria for determining the difference between interpolation filters are not limited to a particular factor.
For example, to reduce computational complexity, an interpolation filter with fewer taps than an interpolation filter for motion compensation of the prediction block may be used to configure the template.
In another embodiment, reference sample filtering may be performed to configure the templates.
The reference sample filter used to configure the template may be the same as or different from the reference sample filter used for intra prediction of the prediction block.
For example, the difference between reference sample filters may arise from a change in at least one of the reference sample filter type, the number of reference sample filter taps, and the reference sample filter coefficients; however, the criteria for determining the difference between reference sample filters are not limited to a particular factor.
For example, to reduce computational complexity, the template may be configured using a reference sample filter having fewer taps than a reference sample filter used for intra prediction of the prediction block.
In addition, template matching for the target block may be performed using a plurality of target templates. Template matching using a plurality of target templates may be performed as follows.
By performing template matching with each individual target template, a matching cost for each target template can be calculated. Here, the matching cost for a target template is the lowest cost obtained by performing template matching with that target template, and it may be referred to as the cost for the corresponding reference block or for the motion information indicating that reference block.
Based on the matching cost for each target template, a prediction block of the target block may be generated using one of the reference blocks determined by the motion information derived from each target template. For example, the reference block with the lowest matching cost may be determined as the prediction block of the target block.
Alternatively, based on the matching cost for each target template, a prediction block of the target block may be generated using at least two reference blocks determined by motion information derived from each target template.
For example, the prediction block of the target block may be generated by a weighted sum of a plurality of reference blocks. The weight to be applied to each reference block may be determined based on the matching cost for each reference block. For example, higher weights may be applied to reference blocks with lower matching costs.
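The weighted combination might look as follows in Python. The inverse-cost weighting is just one plausible mapping from matching cost to weight, not one mandated by the text:

```python
import numpy as np

def blend_prediction(reference_blocks, matching_costs):
    """Weighted sum of reference blocks, where a lower matching cost
    yields a higher weight (inverse-cost weights, one possible choice)."""
    weights = np.array([1.0 / (c + 1) for c in matching_costs])
    weights /= weights.sum()  # normalize so the weights sum to 1
    blocks = np.stack([b.astype(np.float64) for b in reference_blocks])
    return np.tensordot(weights, blocks, axes=1)
```

A practical codec would use fixed-point weights drawn from a small predefined set rather than floating point.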
Whether to generate a prediction block of the target block using a reference block determined by motion information derived from a specific target template may be determined based on at least one of a size of the target block, motion information of the target block, an encoding mode of the target block, encoding parameters of the target block, a matching cost for the specific target template, and a matching cost for at least one other target template of the target block.
In an exemplary embodiment, whether to generate the prediction block of the target block using the reference block determined by the motion information derived from the first target template may be determined based on the matching cost of the second target template having the lowest matching cost among the plurality of target templates. For example, if the cost difference between the first target template and the second target template is less than or equal to a predetermined threshold, a reference block derived from the first target template may be used.
When performing template matching using a plurality of target templates, the size of the template area used may be always the same.
Alternatively, in the case where template matching using a plurality of target templates is performed, the size of the template region used may be different. In this case, the matching cost for each target template may be replaced with a result obtained by multiplying or dividing the matching cost for the corresponding target template by a predetermined value.
The predetermined value may be determined based on at least one of a plurality of target templates for the target block (e.g., a size of an area of at least one of the plurality of target templates for the target block), a size of the target block, motion information of the target block, an encoding mode of the target block, and an encoding parameter of the target block.
If the cost function used to calculate the matching cost is based on the sum of the errors among the samples within the target template rather than their average, the matching cost increases in proportion to the size of the template region. In this case, it may be unsuitable to directly compare the matching costs of a first template and a second template occupying regions of different sizes. Thus, by taking into account the size differences or ratios between the template regions, the matching costs calculated for differently sized template regions can be normalized to a common region size. For example, the matching cost may be scaled by a predetermined value determined by considering the size of the template region. The scaling may be performed by a multiplication or division operation between the matching cost and the predetermined value.
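The area normalization can be sketched as follows; the integer scaling and the choice of a common reference area are illustrative:

```python
def normalize_cost(matching_cost, template_area, reference_area):
    """Scale a sum-based cost (SAD/SSE), which grows with the number of
    template samples, to a common reference area so that costs of
    differently sized templates can be compared."""
    return matching_cost * reference_area // template_area
```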
FIG. 9 illustrates a target template that may be used for a target block.
The target template may be configured using one or more lines composed of reconstructed samples surrounding the target block.
For example, as shown in fig. 9 (a), at least one of templates labeled 0 to 7 may be used as a target template for a target block.
Alternatively, as shown in fig. 9 (b), at least one of templates labeled 0 to 3 may be used as a target template for a target block.
As described above, the target template may be configured using reconstructed samples around the target block. However, in this case, the target template cannot be configured until the neighboring samples are reconstructed, so a delay problem may occur. To solve the delay problem, the target template in the above description may be replaced with an average template. The template matching cost then corresponds to the result of calculating the cost function between the reference template and the average template.
An average template may be determined based on at least one piece of motion information.
For example, the average template may be a reference template of the block indicated by one piece of motion information.
Alternatively, the average template may be determined based on N reference templates of N blocks indicated by N pieces of motion information. For example, the average template may be determined as statistics of the N reference templates. For example, the average template may be one of an average, median, or weighted average of N reference templates of N blocks indicated by N pieces of motion information.
N may be 2, 3, 4, 5, 6, 8, 12, or another positive integer.
N may be a fixed value determined independently of the target block.
Alternatively, N may be determined based on at least one of a prediction mode of the target block, motion information of the target block, encoding parameters of the target block, size of the target block, a range of values allowed for a luminance component of the target block, a range of values allowed for a chrominance component of the target block, availability of neighboring blocks of the target block, encoding parameters of neighboring blocks of the target block, neighboring samples of the target block, and motion information.
At least one of the motion information used to determine the average template may be one of motion information around the target block and/or motion information in a particular motion information list.
For example, at least one of the pieces of motion information used to determine the average template may be motion information at a specific location around the target block. The specific location may be one of the locations shaded in fig. 29.
The positions shaded in fig. 29 may be checked sequentially in a predetermined order for the presence of motion information, and for the first position containing motion information, the reference template of the block indicated by the motion information of that position may be used as the average template.
Alternatively, the positions shaded in fig. 29 may be checked sequentially in a predetermined order for the presence of motion information, and for the first NN positions containing motion information, the average of the reference templates of the blocks indicated by the motion information of those NN positions may be used as the average template.
For example, at least one of the motion information for determining the average template may be one of the motion information in the AMVP candidate list or the merge candidate list.
For example, for NN pieces of motion information having the lowest index in the merge candidate list, the average value of the reference templates of the blocks indicated by each individual motion information may be used as the average template.
NN may be 2, 3, 4, 5, 6, 8, 12 or a positive integer.
NN may be a fixed value that is determined independently of the target block.
Alternatively, the NN may be determined based on at least one of a prediction mode of the target block, motion information of the target block, encoding parameters of the target block, size of the target block, a range of values allowed for a luminance component of the target block, a range of values allowed for a chrominance component of the target block, availability of neighboring blocks of the target block, encoding parameters of neighboring blocks of the target block, neighboring samples of the target block, and motion information.
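Averaging the reference templates of the NN lowest-index candidates might be sketched as follows; the mean is one of the statistics mentioned above (a median or weighted average would also fit), and the reference-template callback is an illustrative assumption:

```python
import numpy as np

def average_template(candidates, get_reference_template, nn=2):
    """Average the reference templates of the NN lowest-index motion
    information candidates to form the average template."""
    templates = [get_reference_template(mv) for mv in candidates[:nn]]
    return np.mean(np.stack(templates), axis=0)
```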
In addition, the target templates may be configured to be hardware friendly.
From the hardware point of view of the decoder, the decoding process may be divided into several pipeline stages. Partitioning the decoding process in this way allows the several pipeline stages to process data simultaneously.
The target picture (or target slice) may be partitioned into particular units, and the partitioned units may be input to or output from each pipeline stage.
For example, the particular unit may be a rectangular unit of M×N size. Here, M and N may each be 4, 8, 16, 32, 64, 128, 256, or another positive integer, and M and N may be equal to each other.
The size of a particular unit may be equal to or smaller than the maximum CU size (or CTU size) in the target picture (or target slice).
A partitioned unit may completely include at least one CU in the target picture (or target slice), or may be completely included in one CU in the target picture (or target slice).
The data units (specific units) input to or output from each pipeline stage are referred to as Virtual Pipeline Data Units (VPDUs).
The VPDU is a data unit used in the decoding process without accessing the entire CTU.
If an image is partitioned into VPDUs and input as a VPDU unit to multiple pipeline stages, consecutive VPDUs may be processed simultaneously across different pipeline stages.
The target templates of the target blocks may be configured using only the samples that meet certain criteria.
The specific criterion may be at least one of a prediction mode of predicting the corresponding sample, a position of the corresponding sample, a size value of the corresponding sample, a size of the target block, a position of the target block, a prediction mode of the target block, or a coding parameter of the target block.
For example, when the target template is configured in inter template matching, the target template may be configured using only samples predicted by inter prediction.
In the P-slice and/or the B-slice, prediction and reconstruction for an intra-prediction block may be performed after prediction and reconstruction for an inter-prediction block are performed. Therefore, when a template is constructed using only samples predicted by inter prediction, template matching for a target block can be performed even if prediction and reconstruction of intra-prediction blocks around the target block are not completed.
In another example, the target template may be configured using only samples in VPDUs other than the VPDU to which the target block belongs. Since the same pipeline stage as that of the target block may still be in progress for other blocks in the target VPDU, prediction samples or reconstruction samples in other blocks within the same VPDU may not be available at the prediction stage of the target block.
Whether block A and block B belong to the same VPDU or different VPDUs may be determined based on at least one of the coordinates of at least one position in block A, the coordinates of at least one position in block B, the CTU to which block A belongs, the CTU to which block B belongs, and the size of the VPDU.
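A minimal sketch of such a check, assuming square VPDUs aligned to the picture origin (an assumption; the text lists several possible criteria):

```python
def same_vpdu(pos_a, pos_b, vpdu_size):
    """Two sample positions belong to the same VPDU iff they fall in the
    same vpdu_size x vpdu_size grid cell of the picture."""
    (xa, ya), (xb, yb) = pos_a, pos_b
    return (xa // vpdu_size == xb // vpdu_size
            and ya // vpdu_size == yb // vpdu_size)
```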
In another example, the target template may be configured using only samples in CTUs different from the CTU to which the target block belongs.
2.2 Template matching in sub-Block units
Template matching may be performed in sub-block units. The target block may be partitioned into sub-blocks of N×M size, and template matching may be performed in sub-block units to determine motion information.
N and M may be 2, 4, 8, 16, 32, 64, or another positive integer. N and M may be predefined values. Alternatively, information about at least one of N and M may be signaled/encoded/decoded, and at least one of N and M may be determined based on the information. Alternatively, at least one of N and M may be determined based on at least one of the size of the target block, motion information of the target block, intra coding method information of the target block, and coding parameters of the target block.
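Partitioning the target block into N×M sub-blocks for sub-block-unit template matching can be sketched as follows (the function name and raster-scan ordering are illustrative):

```python
def subblock_origins(block_w, block_h, n, m):
    """Top-left offsets of the N x M sub-blocks that partition a
    block_w x block_h target block, in raster-scan order."""
    return [(x, y) for y in range(0, block_h, m)
                   for x in range(0, block_w, n)]
```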
If the target block is predicted in the template matching mode in sub-block units, each sub-block may be regarded as one block, and then template matching may be performed.
In some embodiments, the target templates for each sub-block may be constructed using neighboring samples of the target block and may not include samples within the target block. In this case, template matching and prediction for each sub-block may be performed in parallel.
Fig. 10 is an exemplary embodiment showing a target template for each sub-block in the sub-block-unit template matching mode.
As shown in fig. 10, reconstructed samples adjacent to the left and/or upper side of each sub-block may be used as a target template.
If reconstructed samples immediately adjacent to a sub-block do not exist within the target block, reconstructed samples adjacent to the target block, rather than to the sub-block, may be used instead to form the target template of the sub-block. Alternatively, reconstructed reference samples around previously predicted or reconstructed sub-blocks may be used.
In some other embodiments, the target template for each sub-block may be constructed by including the sample points within the target block, in which case template matching and prediction for each sub-block may be performed in order.
Fig. 11 is another exemplary embodiment showing a target template for each sub-block in the sub-block-unit template matching mode.
As shown in fig. 11, the target template for each sub-block may be composed of samples adjacent to the corresponding sub-block. In this embodiment, each sub-block may be predicted or reconstructed in sub-block units. Thus, when performing template matching for a particular sub-block, there may be prediction samples or reconstruction samples in the sub-block that has been previously performed with template matching. Thus, the target template for a particular sub-block may consist of predicted or reconstructed samples within previously predicted or reconstructed sub-blocks.
In sub-block based template matching, template matching may be performed in sub-block units using each of two or more (N, M) combinations.
A final (N, M) combination may be determined based on the matching cost calculated for each (N, M) combination, and the target block may be predicted based on the result of performing template matching using the determined (N, M) combination.
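Selecting the final combination by matching cost can be sketched as follows; the cost callback, which would run sub-block template matching for a given combination, is an illustrative assumption:

```python
def select_subblock_size(combinations, cost_of):
    """Keep the (N, M) combination whose sub-block-unit template
    matching produced the lowest matching cost."""
    return min(combinations, key=cost_of)
```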
Information about the template matching mode in sub-block units may be signaled/encoded/decoded at the level of the target block. For example, an indicator indicating whether to perform the template matching mode on the target block in sub-block units may be signaled/encoded/decoded.
Alternatively, whether to perform the template matching mode on the target block in sub-block units may be determined based on at least one of motion information of the target block, intra coding method information of the target block, coding parameters of the target block, intra coding method information of neighboring blocks, coding parameters of neighboring blocks, neighboring samples of the target block, and at least one neighboring sample within a reference block of the target block, without involving signaling/encoding/decoding of information. If the target block is a chrominance component block, luminance component samples of the target block (at least one sample within a luminance component block or region corresponding to the target block) may be additionally considered.
For example, when the prediction mode of the target block is an inter prediction mode and an affine mode and/or a template matching mode is applied to the target block, template matching in sub-block units may be performed. Alternatively, template matching in sub-block units may be performed based on motion information of the target block generated by template matching in whole-block units or by signaling. For example, when the merge index is greater than or equal to a specific value, template matching in sub-block units may be performed to refine the motion information of the target block.
In another example, when intra coding method information of a target block indicates an IBC mode or an intra template matching mode and indicates that prediction is performed in a sub-block unit, template matching may be performed in a sub-block unit. Alternatively, when the merge index of the target block to which the IBC merge mode is applied is greater than or equal to a specific value, template matching in sub-block units may be performed to refine the block vector of the target block.
When the target block is predicted in the template matching mode, information on the template matching mode in the sub-block unit may be signaled/encoded/decoded.
Alternatively, information about the template matching mode in sub-block units may be signaled/encoded/decoded regardless of whether the target block is predicted in the template matching mode.
Whether to perform the template matching mode in sub-block units and/or whether to signal/encode/decode information on the template matching mode in sub-block units on the target block may be determined based on at least one of the encoding mode of the target block, the motion information of the target block, the intra-coding method information of the target block, the encoding parameters of the target block, the intra-coding method information of the neighboring block, the encoding parameters of the neighboring block, the neighboring sample points of the target block, and the at least one neighboring sample point in the reference block of the target block. If the target block is a chrominance component block, luminance component samples of the target block (at least one sample within a luminance component block or region corresponding to the target block) may be additionally considered.
For example, only when the size of the target block is greater than or equal to a threshold may the template matching mode in sub-block units be performed on the target block and information about the template matching mode in sub-block units be signaled/encoded/decoded. The threshold may be 4, 8, 16, 32, 64, 128, or another positive integer.
2.3 Template matching in geometric partition mode
If the prediction mode of the target block is a geometric partition mode, a template to be used for generating a prediction block for each partition region may be determined according to partition information of the geometric partition mode.
Partition information of the geometric partition mode refers to information for distinguishing partition areas in the geometric partition mode, and may be, but is not limited to, at least one of a partition angle, a partition offset, and an index for specifying partition information from a partition information candidate list.
The partition angle may refer to, but is not limited to, an angle between a partition line and an X-axis or a Y-axis.
The partition offset may refer to the distance from a particular sample position of the target block to the partition line. For example, the particular sample position of the target block may be, but is not limited to, one of the center, the upper left, the lower left, the upper right, and the lower right of the target block.
Fig. 12 is an exemplary diagram illustrating a method for configuring a target template for each sub-block of a target block to which a geometric partition mode is applied.
For example, if the prediction mode of the target block is a geometric partition mode and the partition angle is less than or equal to a specific angle, the prediction block for the X_PARTITION-th partition region may configure the template using only one of the left sample set or the upper sample set with reference to the target block.
For example, if the prediction mode of the target block is a geometric partition mode and the partition angle is greater than or equal to a specific angle, the prediction block for the X_PARTITION-th partition region may configure the template using only one of the left sample set or the upper sample set with reference to the target block.
X_PARTITION may be 1, 2, or another positive integer.
As shown in fig. 12, the upper region of the partition line may be referred to as the first partition region, and the lower region may be referred to as the second partition region. The partition line is the straight line specified by the partition information, i.e., the straight line serving as the boundary between the partition regions of the geometric partition mode.
The template for the prediction block of the first partition region may be composed of only samples belonging to an upper template region with respect to the target block, and the template for the prediction block of the second partition region may be composed of only samples belonging to a left template region with respect to the target block.
Fig. 13 is another exemplary diagram illustrating a method for configuring a target template for each sub-block of a target block to which a geometric partition mode is applied.
For example, if the prediction mode of the target block is a geometric partition mode and the partition angle is less than (or greater than) or equal to a particular angle, the prediction block for the X_PARTITION-th partition region may configure the template using both the left sample set and the upper sample set with reference to the target block.
When the target block is partitioned as shown in fig. 13, the template of the prediction block for the first partition region may be composed of samples belonging to the left template region and the upper template region with respect to the target block, and the template of the prediction block for the second partition region may be composed of only samples belonging to the left template region with respect to the target block.
Fig. 14 is a further exemplary diagram illustrating a method for configuring a target template for each sub-block of a target block to which a geometric partition mode is applied.
For example, if the prediction mode of the target block is a geometric partition mode and partition information is specified, the partition information may be extended to a template region, and the template of the prediction block for the corresponding region may be configured using only the samples of the region corresponding to each extended partition region.
When the target block is partitioned as shown in fig. 14, the template of the prediction block for the first partition region may be composed of the samples belonging to the first extended partition region, and the template of the prediction block for the second partition region may be composed of only the samples belonging to the second extended partition region.
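As an illustrative sketch (not part of any codec specification), the extended-partition configuration of fig. 14 can be modeled by extending the partition line into the template region and classifying each template sample by the side of the line it falls on. The function name, angle convention, and offset handling below are all assumptions:

```python
import math

def split_template_by_partition_line(block_w, block_h, angle_deg, offset,
                                     template_size=4):
    """Assign each sample of the left/upper template region to the first
    or second partition region by extending the partition line into the
    template region. The line's normal direction is given by angle_deg
    and the line is shifted by `offset` along that normal (an assumed
    convention, not the codec's actual partition syntax)."""
    nx = math.cos(math.radians(angle_deg))
    ny = math.sin(math.radians(angle_deg))
    cx, cy = block_w / 2.0, block_h / 2.0
    first, second = [], []
    # Upper template rows (y < 0) and left template columns (x < 0),
    # in coordinates relative to the target block's top-left sample.
    positions = [(x, y) for y in range(-template_size, 0) for x in range(block_w)]
    positions += [(x, y) for y in range(block_h) for x in range(-template_size, 0)]
    for (x, y) in positions:
        # Signed distance of the sample from the extended partition line.
        d = (x - cx) * nx + (y - cy) * ny - offset
        (first if d < 0 else second).append((x, y))
    return first, second
```

For a vertical split through the block center (angle 0, offset 0), the left template falls entirely into the first extended partition region and the right half of the upper template into the second, mirroring the per-region template assignment described above.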
2.4 Subsampling in template matching
In template matching, sub-sampling may be performed to configure templates, calculate costs, etc. Furthermore, sub-sampling may be performed on the search area for template matching.
■ Subsampling for configuring templates
Sub-sampling may be used to configure templates for template matching. In other words, all of the samples in the reference area may be used in configuring the template, or only a portion of the samples in the reference area may be used. Here, the reference region may mean at least one of a region referred to by the target block and a region referred to by the reference block for configuring the template.
When templates for template matching are configured using only a portion of the samples located in the reference region, sub-sampling may be performed on the entire reference region or a portion of the reference region.
When templates for template matching are configured using only a portion of the samples located in the reference region, the reference region may be partitioned into two or more regions.
In some embodiments, the partitioned area may be one of 1) an area where sub-sampling is performed, 2) an area for configuring the template without involving sub-sampling, and 3) an area not for configuring the template. Templates for template matching may be configured using the samples selected from region 1) by sub-sampling and the samples within region 2).
In one example, the region corresponding to case 1) may be a region belonging to the left side and/or upper left of the block among the regions within the reference region.
In another example, the region corresponding to case 1) may be a region belonging to the upper and/or upper left of the block among the regions within the reference region.
In some other embodiments, the partitioned area may be one of 1) an area where sub-sampling is performed and 2) an area not used for configuring the template. Templates for template matching may be configured using samples selected by sub-sampling within region 1).
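A minimal sketch of the first partitioning described above (an area used as-is and an area that is sub-sampled), with hypothetical names and a simple keep-every-N-th-row sub-sampling:

```python
def configure_template(left_region, upper_region, subsample_step=2):
    """Configure a template in which the upper reference region is used
    without sub-sampling (case 2) and the left reference region is
    sub-sampled by keeping every `subsample_step`-th row (case 1).
    Regions are lists of sample rows; this is a sketch of one possible
    partitioning, not a normative rule."""
    template = [row[:] for row in upper_region]   # case 2: all samples
    template += left_region[::subsample_step]     # case 1: sub-sampled rows
    return template
```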
■ Subsampling for cost calculation
In the cost calculation between the target template and the reference template in the template matching, all the samples in each template may be used, or only a part of the samples in each template may be used. In other words, the cost between templates may be calculated using only a portion of the samples in each template.
To calculate costs between templates using only partial samples in the templates, sub-sampling may be performed on whole or partial template areas.
When cost calculation between templates is performed using only a part of the samples in the templates, a template region for template matching may be partitioned into two or more regions.
In some embodiments, the partitioned area may be one of 1) an area where subsampling is performed, 2) an area where the cost function is calculated without involving subsampling, and 3) an area where the cost function is not calculated. The calculation of the cost function between templates for template matching may be performed using the samples selected from region 1) by sub-sampling and the samples within region 2).
In another embodiment, the partitioned area may be one of 1) an area where sub-sampling is performed and 2) an area not used for calculating the cost function. The calculation of the cost function between templates for template matching may be performed using the samples selected by sub-sampling within region 1).
■ Subsampling of search areas
The search process of template matching may be performed using reference templates included in a search area within a predetermined range from a sample position indicated by an initial motion vector of the target block. In this case, the search process may use all of the samples/positions within the search area, or may select only a portion of the samples/positions within the search area. The search and/or matching costs may be calculated for only the selected samples/positions or motion information indicating the selected samples/positions.
When performing a template-matched search process using only partial samples/locations within a search area, sub-sampling may be performed on the entire or partial search area.
When the search process of template matching is performed using only a part of the sample points/positions within the search area, the search area may be partitioned into two or more areas.
In some embodiments, the partitioned area may be one of 1) an area in which sub-sampling is performed, 2) an area in which the search process is performed without involving sub-sampling, and 3) an area in which the search process is not performed. The search process of template matching may be performed on the samples/positions selected by sub-sampling within region 1) and the samples/positions within region 2). Alternatively, the search process of template matching may be performed on the motion information indicating the samples/positions selected by sub-sampling within region 1) and the samples/positions within region 2).
In some other embodiments, the partitioned area may be one of 1) an area in which sub-sampling is performed and 2) an area in which search processing is not performed. The search process of template matching may be performed using the sample/position selected by sub-sampling within region 1). Alternatively, a search process of template matching may be performed on motion information indicating the sample/position selected by sub-sampling within the region 1).
Fig. 15 and 16 illustrate various embodiments of a sub-sampling method for template matching.
The shaded samples (or positions) in fig. 15 and 16 represent the samples (or positions) selected by sub-sampling.
The template may be configured by performing sub-sampling on all or a part of the reference region as shown in fig. 16 and using only the samples (or positions) selected by the sub-sampling.
The cost calculation may be performed by performing sub-sampling on all or a part of the template region for template matching as shown in fig. 16 and using only the samples (or positions) selected by the sub-sampling.
Further, sub-sampling may be performed on all or a part of the search region for template matching as shown in fig. 15, and the calculation of search and/or matching costs may be performed only on the selected samples (or positions) or only on the motion information indicating the selected samples (or positions).
2.5 Searching method for template matching
By employing the first motion information encoded in or decoded from the bitstream as the initial motion information, a first search step may be performed to derive second motion information that is the result of refining the first motion information. The second search step performed after the first search step may use the second motion information as the initial motion information.
If the initial motion information (e.g., the initial motion vector or the initial block vector) is not motion information in integer pixel units (i.e., in fractional pixel units), a search may be performed using a result obtained by performing rounding (or truncation or rounding up) on the initial motion information.
In order to generate a reference template at the position indicated by motion information obtained by adding a specific offset to initial motion information in fractional pixel units during the search process, an interpolation filter must be applied to the samples at integer pixel positions to generate the samples at the fractional pixel positions. However, if the initial motion information is limited to integer pixel units, the computational complexity may be reduced because interpolation at fractional pixel positions is not required during the search process.
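A small sketch of the rounding step described above, assuming motion vectors stored in 1/16-pel units (precision_shift = 4); the function and mode names are illustrative, not syntax elements of the codec:

```python
def round_mv_to_integer_pel(mv, precision_shift=4, mode="round"):
    """Convert a motion vector stored in fractional-pel units (assumed
    1/16-pel here) to integer-pel units, keeping the same storage
    scale, so the subsequent search needs no interpolation.
    `mode` selects rounding, truncation toward zero, or rounding up
    (toward positive infinity)."""
    unit = 1 << precision_shift
    def conv(c):
        sign = -1 if c < 0 else 1
        a = abs(c)
        if mode == "round":
            q = sign * ((a + (unit >> 1)) >> precision_shift)
        elif mode == "trunc":
            q = sign * (a >> precision_shift)
        else:  # "ceil": round up toward positive infinity
            q = -((-c) // unit)
        return q * unit
    return (conv(mv[0]), conv(mv[1]))
```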
■ Definition of search
A search may be performed using the calculation of a cost function to determine the similarity between NUM_TEMPLATE_COMPARE templates.
The search may include a process of determining at least one piece of motion information satisfying a specific condition within a specific search range. The motion information of the target block may be determined and/or changed based on at least one motion information determined by the search.
The motion information satisfying the specific condition may represent, but is not limited to, motion information having the lowest matching cost among motion information within a search range.
■ Cost function
The cost function for calculating the cost may refer to a function that determines a degree of similarity between at least one sample in the target template and at least one sample in the reference template.
The similarity between the first value and the second value may be determined using at least one of 1) a difference between the two values, 2) a ratio between the two values, and 3) an operation of comparing the difference between the two values with a specific value.
In one embodiment of the operation of comparing the difference between two values with a particular value, a method may be used that assigns a plurality of similarity values depending on which segment the difference between two values belongs to, where a segment is defined by a plurality of thresholds. For example, when a threshold is used, a method may be used that assigns a first similarity value (e.g., 1) if the difference between the two values is less than or equal to the threshold, and assigns a second similarity value (e.g., 0) otherwise.
The cost function may be one or more of the Sum of Absolute Differences (SAD), the Sum of Absolute Transformed Differences (SATD), the Mean-Removed Sum of Absolute Differences (MR-SAD), the Mean Squared Error (MSE), and the Sum of Squared Errors (SSE). However, the cost function is not limited to the items listed above.
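Three of these cost functions can be sketched as follows; note how the MR-SAD removes a constant offset between the templates, which is why it is preferred in the illumination-change cases discussed below:

```python
def sad(a, b):
    """Sum of Absolute Differences between two equally sized templates
    (given here as flat sample lists)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mr_sad(a, b):
    """Mean-Removed SAD: subtract each template's mean before taking
    the SAD, which discards any constant illumination offset."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum(abs((x - ma) - (y - mb)) for x, y in zip(a, b))

def sse(a, b):
    """Sum of Squared Errors between two templates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

For templates that differ only by a constant offset, the SAD is nonzero while the MR-SAD is exactly zero.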
The cost function used in template matching may be predefined or may be determined based on signaled/encoded/decoded information.
The cost function used in template matching may be determined based on at least one of whether bilateral matching is performed, a condition related to bilateral matching, whether bi-prediction with CU weights is performed, and a size of a target block.
In some embodiments, the MR-SAD may be used as a cost function for template matching if the target block satisfies the enabling conditions, or a portion of the enabling conditions, for bilateral matching described below, or if bilateral matching is performed on the target block.
In other embodiments, the SAD may be used as a cost function for template matching if the target block does not satisfy the enabling conditions, or a portion of the enabling conditions, for bilateral matching, or if bilateral matching is not performed on the target block.
In another embodiment, the type of cost function in the template match may be determined based on specific conditions for determining the type of cost function in the bilateral match. The type of cost function in the bilateral match may be determined based on whether a particular condition is satisfied in the bilateral match. In this case, the type of cost function in the template matching may be determined based on the enabling condition of the bilateral matching and whether or not a specific condition for determining the type of cost function in the bilateral matching is satisfied.
For example, if the target block satisfies both the enabling condition of bilateral matching and the specific condition for determining the type of cost function of bilateral matching, the MR-SAD may be used as the cost function of template matching, otherwise the SAD may be used as the cost function of template matching.
In another embodiment, if the target block satisfies the enabling condition of bilateral matching and bi-prediction with CU weights is performed, or if the number of samples within the target block is greater than a specific value, the MR-SAD may be used as a cost function in template matching, otherwise the SAD may be used as a cost function in template matching.
■ Search scope (search area)
The search range may be a specific range centered on the position indicated by the initial motion information. In other words, the center of the search range may be the position indicated by the initial motion information.
Alternatively, the search range may be a specific range in which the position indicated by the initial motion information is the upper left point of the search range. In other words, the upper left point of the search range may be the position indicated by the initial motion information.
Alternatively, the search range may be a region including the adjacent sample point position in at least one direction of the lower left direction, the upper right direction, and the upper left direction with respect to the target block.
The search range may have a rectangular shape with a horizontal length of SR_X and a vertical length of SR_Y. Alternatively, the search range may have a diamond shape whose horizontal length is SR_X and whose vertical length is SR_Y. Alternatively, the search range may have a hexagonal shape obtained by excluding a rectangular region at the lower right of the rectangle. However, the shape and size of the search range are not limited to a particular embodiment.
Each of SR_X and SR_Y may be a positive integer. Each of SR_X and SR_Y may have a predefined value or a value determined based on signaled/encoded/decoded information.
The initial motion information may be determined based on at least one of motion information of the target block, encoding parameters of the target block, a motion vector of the target block, a reference image of the target block, a block vector of the target block, a motion vector predictor of the target block, a block vector predictor of the target block, motion information of at least one neighboring block of the target block, a merge candidate of the target block, a motion vector difference of the target block, and a block vector difference of the target block.
■ Searching method
The search method may be defined based on at least one of a search pattern, a search resolution, a search range, initial motion information, and a unit of derived motion information.
The search pattern may be one of a diamond pattern, a cross pattern, and a full search pattern. However, the search pattern is not limited to the patterns listed above.
When (0, 0) indicates the position indicated by the initial motion information, a search using the diamond pattern may correspond to searching one or more of the (0, 2×RR), (RR, RR), (2×RR, 0), (RR, -RR), (0, -2×RR), (-RR, -RR), (-2×RR, 0), (-RR, RR) and (0, 0) positions.
When (0, 0) indicates the position indicated by the initial motion information, a search using the cross pattern may correspond to searching one or more of the (0, RR), (RR, 0), (0, -RR), (-RR, 0) and (0, 0) positions.
RR may be the search resolution or a value determined based on the search resolution, and RR may be a predefined positive integer.
A search using a full search pattern may correspond to searching all locations within a predefined search range.
For example, when fs_i takes values from -FS_X to FS_X and fs_j takes values from -FS_Y to FS_Y, a search using the full search pattern may correspond to searching the (fs_i×RR, fs_j×RR) positions. Here, (0, 0) may be the position indicated by the initial motion information. However, the search range is not limited to specific positions. Each of FS_X and FS_Y may be a predefined positive integer.
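The three search patterns can be sketched as offset generators around the initial position; the diamond point set below is one symmetric variant and may differ from a particular implementation's exact point set:

```python
def pattern_positions(pattern, rr=1, fs_x=2, fs_y=2):
    """Candidate offsets around the position indicated by the initial
    motion information (taken as (0, 0)). `rr` is the search
    resolution; `fs_x`/`fs_y` bound the full search pattern."""
    if pattern == "cross":
        return [(0, rr), (rr, 0), (0, -rr), (-rr, 0), (0, 0)]
    if pattern == "diamond":
        return [(0, 2 * rr), (rr, rr), (2 * rr, 0), (rr, -rr),
                (0, -2 * rr), (-rr, -rr), (-2 * rr, 0), (-rr, rr), (0, 0)]
    if pattern == "full":
        return [(i * rr, j * rr)
                for j in range(-fs_y, fs_y + 1)
                for i in range(-fs_x, fs_x + 1)]
    raise ValueError(pattern)
```

A search step would evaluate the matching cost at each returned offset and keep the offset with the lowest cost as the refinement of the initial motion information.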
The search resolution may be one of 4-pixel, full-pixel, half-pixel, and quarter-pixel resolution. However, the search resolution is not limited to these resolutions.
The search resolution may be predefined, determined based on information about the adaptive motion vector resolution, or determined based on signaled/encoded/decoded values.
The unit in which motion information is derived may be either a whole-block unit or a sub-block unit.
In order to determine a search method defined by a search pattern, a search resolution, or the like, at least one of motion information of a target block, coding parameters of the target block, a size of the target block, a prediction mode of the target block, a reference image of the target block, at least one sample value in the target block, a target template, at least one sample value in the target template, and a region of the target template may be considered.
Fig. 17 to 22 each illustrate a search method for template matching according to one embodiment.
Based on the motion information of the target block, the encoding parameters, the prediction mode, and the adaptive motion vector resolution, a specific column in the tables shown in figs. 17 to 22 may be determined. The searches may be performed in order from the top to the bottom of the selected column, using the search pattern and search resolution corresponding to each row marked with "v".
For example, in fig. 17, if the target block is in AMVP mode and the resolution determined by the adaptive motion vector resolution is 4 pixels, a search using a diamond pattern having a 4-pixel search resolution may be performed, and then a search using a cross pattern having a 4-pixel search resolution may be performed.
ALT IF may represent an index of the adaptive interpolation filter. Interpolation filters may be applied to calculate pixel values at sample points with a particular resolution. The adaptive interpolation filter may be an interpolation filter selected by index from a plurality of interpolation filters. In other words, when an adaptive interpolation filter is applied, pixel values at sample points having a specific resolution may be calculated using different interpolation filters according to indexes.
For example, the particular resolution may be half-pixels. However, the specific resolution is not limited to half pixels.
For example, the interpolation filter determined by the index may be one of a 6-tap interpolation filter and an 8-tap interpolation filter. The method for determining the interpolation filter is not limited to the above-described determination method.
Fig. 23 shows a method for configuring a first template (target template) in an affine mode. Fig. 24a and 24b illustrate a method for configuring a second template (reference template) in an affine mode.
CPMV may mean a Control Point Motion Vector used in the affine mode. An MV may be derived for each sub-block within the target block using the CPMVs.
Referring to fig. 23, the target templates (A0 to A3 and L0 to L3) may be composed of sub-blocks adjacent to the top and left sides of the target block (target CU).
Referring to fig. 24a, in one embodiment, a motion vector (or block vector) determined for each reference template position based on a CPMV of the target block may be used to determine the reference templates (A0 to A3 and L0 to L3) corresponding to the target templates. For example, the sub-block at the position indicated by the motion vector of the reference template from the target template position may be configured as the reference template.
Referring to fig. 24b, in another embodiment, reference templates (A0 to A3 and L0 to L3) corresponding to target templates may be determined using a motion vector (or a block vector) of a sub-block closest to each target template within a target block. For example, the CPMV of the target block may be used to determine the motion vector (or block vector) of the sub-block closest to the target template. The sub-block at the position indicated by the determined motion vector from the target template position may be configured as a reference template.
When affine transformation is performed on a target block, template matching for the target block may be performed in sub-block units.
When affine transformation is performed on a target block, the target block may be partitioned into sub-block units of width N and height M. The motion information for each sub-block may be determined based on at least one of motion information, coding parameters, and size of the target block. The template matching cost for the target block may be determined based on at least one of the template matching costs for the sub-blocks of the partition. For example, the template matching cost for the target block may be the sum or average of the template matching costs for the sub-blocks of the partition.
N and M may be 2, 4, 8 or a positive integer, respectively.
N and M may be predefined values, respectively, or values determined based on signaled/encoded/decoded information.
After affine transformation is performed on the target block, template matching may be performed to improve motion information of the target block. The motion information of the target block may include at least one of CPMV of the target block, affine transformation model of the target block, or motion information of at least one sub-block of the target block. Here, the motion information may be a motion vector or a block vector.
Referring to figs. 23, 24a and 24b, the motion information of the target block may be improved by configuring a target template and a reference template for at least one sub-block located at the left or upper side of the target block and performing template matching.
For example, assume that a set composed of at least one of the sub-blocks located at the left or upper side of the target block is represented as set A. As shown in figs. 23, 24a and 24b, the motion information of these sub-blocks may be improved by configuring a target template and a reference template for the sub-blocks belonging to set A and performing template matching. The improved motion information of the sub-blocks belonging to set A may then be used to improve the motion information of at least one sub-block within the target block.
In some embodiments, the CPMVs of the target block may be improved using the improved motion information of the sub-blocks belonging to set A. For example, a mapping model may be derived by performing a regression operation that takes as input the positions (x- and/or y-coordinates) of the sub-blocks belonging to set A and takes as output the improved motion vectors at the respective positions. Motion information at each CPMV position (control point) may be derived using the derived mapping model, and each CPMV may be refined using the derived motion information. The affine transformation model of the target block may then be recalculated from the refined CPMVs to re-derive the motion information of the sub-blocks in the target block.
For example, when a 4-parameter affine transformation model is used (i.e., when the two CPMVs of the upper-left and upper-right corners are used as shown in fig. 26), at least one of the two CPMVs may be improved by affine template matching from the improved motion information of the sub-blocks of set A.
Alternatively, the CPMV to be improved may correspond to at least one of two CPMV used in the 4-parameter affine transformation model and CPMV at a position not used in the affine transformation model. For example, CPMVs for the lower left corner can be newly derived.
In some other embodiments, the motion information of the sub-blocks in the target block may be improved from the improved motion information of the sub-blocks belonging to set A. For example, a mapping model may be derived by performing a regression operation that takes as input the positions (x- and/or y-coordinates) of the sub-blocks belonging to set A and takes as output the improved motion vectors at the respective positions. For each sub-block in the target block, motion information at at least one position in the corresponding sub-block (e.g., the center position of the corresponding sub-block) may be derived, and the motion information of the corresponding sub-block may be improved using the derived motion vector (or block vector).
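A floating-point sketch of the regression step described in the two embodiments above: fit a linear mapping from sub-block position to refined MV, then evaluate it at a control point or sub-block center. A real codec would use fixed-point arithmetic; all names here are illustrative:

```python
def fit_linear_mv_model(positions, mvs):
    """Least-squares fit of mv = (a*x + b*y + c, d*x + e*y + f) from the
    refined sub-block motion vectors of set A. Evaluating the model at
    a control point (or a sub-block centre) gives its refined MV.
    Solved via plain 3x3 normal equations with Gauss-Jordan elimination."""
    def solve3(rows, rhs):
        m = [rows[i][:] + [rhs[i]] for i in range(3)]
        for col in range(3):
            piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
            m[col], m[piv] = m[piv], m[col]
            for r in range(3):
                if r != col and m[col][col] != 0:
                    f = m[r][col] / m[col][col]
                    m[r] = [a - f * b for a, b in zip(m[r], m[col])]
        return [m[i][3] / m[i][i] for i in range(3)]

    feats = [(x, y, 1.0) for x, y in positions]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    model = []
    for comp in range(2):  # x and y components fitted independently
        atb = [sum(f[i] * mv[comp] for f, mv in zip(feats, mvs)) for i in range(3)]
        model.append(solve3(ata, atb))
    return model  # [[a, b, c], [d, e, f]]

def eval_mv_model(model, x, y):
    """Evaluate the fitted mapping model at position (x, y)."""
    return tuple(m[0] * x + m[1] * y + m[2] for m in model)
```

With at least three non-collinear sub-block positions, an exactly affine motion field is recovered exactly, so evaluating the model at a CPMV location yields that control point's refined motion vector.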
In consideration of complexity, after affine template matching is performed on the target block, only the translation coefficients of the affine transformation model of the target block may be improved, while the non-translation coefficients remain unchanged. In this case, all sub-blocks in the target block are improved equally; in other words, the difference (improved motion information - initial motion information) may be the same for every sub-block. The translation and non-translation coefficients will be described later with equations 5 and 6.
2.6 Template matching in bilateral prediction blocks
The motion information of the target block may be determined based on the motion information of the neighboring blocks. This attribute may be expressed as "the target block inherits motion information from neighboring blocks".
For example, if the target block is in the merge mode, one merge candidate may be specified from the merge candidate list based on the merge index, and motion information of the specified merge candidate may be used as motion information of the target block.
If the target block is in AMVP mode, one MV candidate may be designated from the MV candidate list based on the MV candidate index, and motion information of the designated MV candidate may be used as motion information of the target block.
If bi-prediction is performed on the target block, for example, if motion information inherited from neighboring blocks indicates bi-prediction, an embodiment of performing template matching in the target block may include the following steps.
■ Step 1:
Template matching is performed for each of the L0 and L1 directions, and template matching costs C0 and C1 of the motion information determined for the L0 and L1 directions are calculated.
At this time, when template matching is performed for each direction, motion information in other directions is not considered, and template matching may be performed in the same manner as template matching in unidirectional prediction along the corresponding direction.
If the target block meets the predefined condition, the MR-SAD may be used as a cost function, otherwise the SAD may be used as a cost function.
Here, the predefined condition may be a condition based on at least one of whether a local illumination compensation mode is performed in the target block, whether bi-prediction with CU weights is performed in the target block, an indicator indicating whether a model-based prediction method is performed in the target block, whether bilateral matching is performed in the target block, an indicator indicating whether bilateral matching is performed in the target block, motion information of the target block, a size of the target block, coding parameters of the target block, motion information of neighboring blocks of the target block, coding parameters of neighboring blocks of the target block, and a type of a cost function in template matching in neighboring blocks of the target block.
In some embodiments, the MR-SAD may be used as a cost function if the condition of whether the model-based prediction method is performed in the target block is true or an indicator indicating whether the model-based prediction method is performed in the target block is true, otherwise the SAD may be used as a cost function.
In another embodiment, if the condition of whether bilateral matching is performed in the target block is true, or an indicator indicating whether bilateral matching is performed in the target block is true, or the number of samples in the target block is greater than or equal to a particular value, the MR-SAD may be used as a cost function; otherwise, the SAD may be used as a cost function.
The cost function may mean a cost function used in searching for template matches, and/or a cost function for calculating at least one of C0, C1, and C'. C' will be described in the following step 3.
The cost function used in the template matching search and the cost function used to calculate at least one of C0, C1, and C' may be the same or different. For example, MR-SAD may be used as a cost function in a template matching search, and SAD may be used as a cost function for C0, C1, and C'.
In another embodiment, the SAD may be used as a cost function in template matching if the size of the target block is less than a particular value, otherwise the MR-SAD may be used as a cost function in template matching. The size of the block may include at least one of a width of the block, a height of the block, (a sum of the width of the block and the height of the block), and (a product of the width of the block and the height of the block). The specific value may be 16, 32, 64, 128 or a positive integer.
In another embodiment, if bi-prediction with CU weights (BCW) is not performed on the target block, or if the same weight is used in BCW for the reference blocks in the L0 and L1 directions, the SAD may be used as a cost function in template matching; otherwise, the MR-SAD may be used as a cost function.
In another embodiment, the SAD may be used as a cost function in template matching if a Local Illumination Compensation (LIC) mode is not performed on the target block, otherwise the MR-SAD may be used as a cost function. The local illuminance compensation mode may be a mode in which at least one of a weight and an offset is derived by calculating a correlation between a template of the target block and a template of the reference block, and the derived weight or offset is applied to all or part of the target block (or the reference block of the target block). Here, the weight means a parameter for multiplication with a reference block of the target block, and the offset means a parameter for addition with a reference block of the target block. In the local illumination compensation mode, the sample value P may be modified as follows.
P' = w × P + o (where w represents the weight and o represents the offset).
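A sketch of deriving w and o by a least-squares fit between the target-block template and the reference-block template, then applying P' = w×P + o. The least-squares derivation is one plausible realization of "calculating a correlation between templates", not the normative one, and actual codecs use integer approximations:

```python
def derive_lic_params(target_template, reference_template):
    """Fit y ~= w*x + o between reference-template samples x and
    target-template samples y by ordinary least squares, yielding the
    local illumination compensation weight w and offset o."""
    n = len(target_template)
    sx = sum(reference_template)
    sy = sum(target_template)
    sxx = sum(x * x for x in reference_template)
    sxy = sum(x * y for x, y in zip(reference_template, target_template))
    denom = n * sxx - sx * sx
    w = (n * sxy - sx * sy) / denom if denom else 1.0
    o = (sy - w * sx) / n
    return w, o

def apply_lic(samples, w, o):
    """Modify each reference sample P as P' = w*P + o."""
    return [w * p + o for p in samples]
```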
■ Step 2:
If C0 < C1, a new target template T' is generated using the target template and the reference template in the L0 direction.
For example, when the target template is T, the reference template in the L0 direction is T0, and the reference template in the L1 direction is T1, a new target template T' may be determined as follows:
T' = w_T × T + w_T0 × T0
w_T and w_T0 may each be predefined values.
w_T may be a positive number and w_T0 may be a negative number. w_T and w_T0 may be values determined based on whether bi-prediction with CU weights is performed in the target block, and/or the weights used in bi-prediction with CU weights.
For example, w_T may be 2 and w_T0 may be -1. If bi-prediction with CU weights is not performed in the target block, or if the weights in bi-prediction with CU weights for the L0 and L1 directions are the same, w_T may be 2 and w_T0 may be -1.
For example, if bi-prediction with CU weights is performed in the target block, w_T may be a value determined based on the weight for the L1 direction, and w_T0 may be a value determined based on the weight for the L0 direction divided by the weight for the L1 direction.
If C0 > C1, a new target template T' may be generated using the target template and the reference template in the L1 direction.
If the values of C0 and C1 are the same, this may be regarded as either the case of C0 < C1 or the case of C0 > C1, and the process of step 2 may be performed.
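Step 2 amounts to a sample-wise weighted combination of the target template and the better direction's reference template; a minimal sketch with the default weights w_T = 2 and w_T0 = -1 from the equal-weight case above:

```python
def new_target_template(T, T0, w_T=2, w_T0=-1):
    """Compute T' = w_T*T + w_T0*T0 sample-wise. The default weights
    (2, -1) correspond to the equal-weight bi-prediction case; other
    BCW weight combinations would change w_T and w_T0."""
    return [w_T * t + w_T0 * t0 for t, t0 in zip(T, T0)]
```

Intuitively, with weights (2, -1), T' = 2T - T0 is the residual target that the L1 prediction should match so that the average of both directions reproduces T.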
■ Step 3:
If C0 < C1, template matching is performed for the L1 direction with T' as the target template, and the template matching cost C' of the motion information determined for the L1 direction is calculated.
If C0 > C1, template matching is performed for the L0 direction with T' as the target template, and the template matching cost C' of the motion information determined for the L0 direction is calculated.
If the values of C0 and C1 are the same, the tie may be treated as either the case C0 < C1 or the case C0 > C1, and the process of step 3 may be performed accordingly.
■ Step 4:
The motion information of the target block may be changed to unidirectional motion information based on at least one of the C', C0, and C1 values.
For example, if the value obtained by multiplying C' by a predetermined value wC' is larger than the smaller of C0 and C1 (or the smaller value multiplied by a predetermined value wCX), the motion information of the target block is changed to motion information indicating unidirectional prediction in the L0 direction or the L1 direction. Here, the result of multiplying two values a and b may represent one of the rounded, rounded-up, or truncated value of a×b.
For example, if C0 < C1, the motion information of the target block may be changed to motion information indicating unidirectional prediction in the L0 direction. Alternatively, the motion information for the L1 direction may be considered unavailable in the target block.
If C0 > C1, the motion information of the target block may be changed to motion information indicating unidirectional prediction in the L1 direction. Alternatively, the motion information for the L0 direction may be considered unavailable in the target block.
If the values of C0 and C1 are the same, the tie may be treated as either the case C0 < C1 or the case C0 > C1, and the process of step 4 may be performed accordingly.
wC' and wCX may be predefined values, respectively.
wC' and wCX may be values determined based on whether bi-prediction with CU weights is performed in the target block, and/or on the weights used in bi-prediction with CU weights.
For example, wC' may be 1/2 and wCX may be 9/8.
For example, wC' may be a value determined based on the weight for the L0 direction or the L1 direction in bi-prediction with CU weights. If C0 < C1, wC' may be the weight for the L1 direction in bi-prediction with CU weights, or a value determined based on that weight. If C1 < C0, wC' may be the weight for the L0 direction in bi-prediction with CU weights, or a value determined based on that weight.
Steps 2 to 4 may be performed only when the target block satisfies a predefined condition.
For example, steps 2 through 4 may be performed only when the target block uses bi-prediction and bilateral matching is not performed in the target block, or the target block does not satisfy the enabling condition of bilateral matching.
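Steps 2 to 4 above can be summarized in a short sketch (illustrative only; the helper names and tie handling are assumptions, and the weights wT = 2, wT0 = -1, wC' = 1/2, wCX = 9/8 are taken from the example values above):

```python
def maybe_convert_to_uni(T, T0, T1, match_cost, refine_l0, refine_l1,
                         w_T=2, w_T0=-1, w_Cp=0.5, w_CX=9/8):
    """Sketch of steps 2-4; not the normative decoder procedure.

    T, T0, T1  : target template and L0/L1 reference templates (arrays)
    match_cost : cost function, e.g. SAD
    refine_l0/refine_l1 : callables that run template matching against a
        given target template and return the best cost C'.
    Returns 'L0' or 'L1' (unidirectional prediction) or 'BI' (keep both).
    """
    C0 = match_cost(T, T0)
    C1 = match_cost(T, T1)
    if C0 <= C1:                      # a tie is treated here as C0 < C1
        # Step 2: remove the L0 contribution from the target template.
        T_new = w_T * T + w_T0 * T0
        # Step 3: re-run template matching for the remaining (L1) direction.
        C_prime = refine_l1(T_new)
        # Step 4: keep only L0 if the refined L1 cost is comparatively poor.
        return 'L0' if w_Cp * C_prime > min(C0, C1) * w_CX else 'BI'
    else:
        T_new = w_T * T + w_T0 * T1
        C_prime = refine_l0(T_new)
        return 'L1' if w_Cp * C_prime > min(C0, C1) * w_CX else 'BI'
```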
In addition, when refinement using template matching is performed on the first motion information using the first picture as a reference picture, second motion information using the second picture as a reference picture may be derived from the first motion information. Thereafter, refinement may be performed using template matching for the second motion information.
The first reference block indicated by the first refinement motion information may be used to generate a prediction block of the target block, wherein the first refinement motion information is a result of refining the first motion information using template matching.
Further, a second reference block indicated by second refinement motion information may be used to generate a prediction block of the target block, wherein the second refinement motion information is a result of refining the second motion information using template matching.
In generating the prediction block of the target block, weighted summation of the first reference block and the second reference block may be performed.
Here, the first picture and the second picture may be one of the pictures in the reference picture list of the target block in the L0 direction and/or one of the pictures in the reference picture list of the target block in the L1 direction, respectively.
When deriving the second motion information from the first motion information, a result obtained by scaling the first motion vector based on the POC of the first reference picture, the POC of the second reference picture, and the POC of the target picture may be used as the motion vector of the second motion information.
For example, the motion vector of the second motion information may have the same magnitude as the motion vector of the first motion information, but in the opposite direction.
The first motion information and the second motion information may differ in at least one of the reference picture, the reference picture index, and the motion vector.
Alternatively, the first motion information and the second motion information may be identical except for the reference picture, the reference picture index, and the motion vector.
So far, a method has been described that derives second motion information from first motion information and generates a prediction block of the target block based on the first motion information and the second motion information. However, in the same manner as the method for deriving the second motion information and the method for generating the prediction block of the target block, N-th motion information may be derived, and when generating the prediction block of the target block, the reference block indicated by the N-th motion information may be used. N may be 2, 3, or another positive integer.
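A minimal sketch of the POC-based motion-vector scaling described above (assuming simple division and rounding; an actual codec would use fixed-point scaling):

```python
def derive_second_mv(mv1, poc_target, poc_ref1, poc_ref2):
    """Scale the first motion vector by the ratio of POC distances.

    mv1 is an (x, y) tuple. When the two reference pictures lie on
    opposite sides of the target picture at equal distance, the result
    is the mirrored vector described in the text.
    """
    d1 = poc_target - poc_ref1   # distance to the first reference picture
    d2 = poc_target - poc_ref2   # distance to the second reference picture
    scale = d2 / d1
    return (round(mv1[0] * scale), round(mv1[1] * scale))
```

For example, with the target at POC 8 and references at POC 4 and POC 12, the derived vector is the mirror of the first vector.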
3. Bilateral matching
In the bilateral matching, a reference block in the L0 direction and a reference block in the L1 direction are used as templates, and motion information of a target block can be determined and/or changed based on calculation of a cost function between the two templates.
The reference block may include at least one of 1) a reference block to which initial motion information is directed, 2) a reference block to which motion information derived during a search process of bilateral matching is directed, and 3) a reference block to which motion information finally improved by bilateral matching is directed.
For example, in configuring templates for bilateral matching, a reference block in the L0 direction and a reference block in the L1 direction may be used as templates.
The bilateral matching cost may correspond to a result value obtained by calculating a cost function of templates of the reference block in the L0 direction and the reference block in the L1 direction used in bilateral matching.
If the target block is in IBC mode and two or more reference blocks are used to predict the target block, bilateral matching may be performed using two different ones of the reference blocks of the target block as templates.
Hereinafter, bi-prediction in inter prediction is described instead of the IBC mode; however, the technical features of bi-prediction in inter prediction may be applied equally to the IBC mode. In this case, the reference blocks in the L0 and L1 directions may be replaced by the two reference blocks generated in the IBC mode.
3.1 Sub-sampling in bilateral matching
■ Sub-sampling for configuring templates
In configuring templates for bilateral matching, only a portion of pixels and/or locations within the reference block in the L0 and L1 directions may be selected. The template may be configured using only selected pixels and/or locations.
The template for bilateral matching may correspond to at least one of the template in the L0 direction and the template in the L1 direction.
In some embodiments, sub-sampling may be used for reference blocks in the L0 direction and reference blocks in the L1 direction when configuring templates for bilateral matching.
In another embodiment, sub-sampling may be used for a portion of the reference block in the L0 direction and a portion of the reference block in the L1 direction when configuring templates for bilateral matching.
For example, in configuring templates for bilateral matching, a reference block in the L0 direction and a reference block in the L1 direction may be partitioned into two or more regions, respectively. The partitioned area may be one of 1) an area in which sub-sampling is performed, 2) an area in which the template is configured without involving sub-sampling, and 3) an area in which the template is not configured. Templates for bilateral matching may be configured using pixels and/or locations selected from region 1) by sub-sampling and pixels and/or locations within region 2).
Alternatively, the partitioned area may be one of 1) an area in which sub-sampling is performed and 2) an area not used for configuring the template. Templates for bilateral matching may be configured using pixels and/or locations selected by sub-sampling in region 1).
In another embodiment, when configuring templates for bilateral matching, the area for configuring templates may be a part of the reference block in the L0 direction and a part of the reference block in the L1 direction.
In another embodiment, in configuring a template for bilateral matching, pixels (or positions) for configuring the template may be selected from only a part of the reference block in the L0 direction and a part of the reference block in the L1 direction.
The size of a portion of the reference block in the L0 direction may be smaller than the size of a portion of the reference block in the L1 direction.
For example, the height (vertical dimension) of a portion of the reference block in the L0 direction may be smaller than the height (vertical dimension) of the reference block in the L1 direction.
For example, a width (horizontal size) of a portion of the reference block in the L0 direction may be smaller than a width (horizontal size) of the reference block in the L1 direction.
The size of a portion of the reference block in the L1 direction may be smaller than the size of the reference block in the L0 direction.
For example, the height (vertical dimension) of a portion of the reference block in the L1 direction may be smaller than the height (vertical dimension) of the reference block in the L0 direction.
For example, a width (horizontal size) of a portion of the reference block in the L1 direction may be smaller than a width (horizontal size) of the reference block in the L0 direction.
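The template configuration with sub-sampling can be sketched as follows (a 2:1 sub-sampling with an optional restricted region; the step value and region choice are illustrative assumptions, not values from the text):

```python
import numpy as np

def subsample_template(block, step=2, region_rows=None):
    """Select every `step`-th sample of a reference block as the template.

    If region_rows is given, only that top portion of the block
    contributes to the template, matching the 'portion of the reference
    block' variants described above.
    """
    area = block if region_rows is None else block[:region_rows]
    return area[::step, ::step]
```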
■ Sub-sampling for cost calculation
In performing cost calculations between templates in bilateral matching, only a portion of pixels and/or locations within the template region may be selected. The cost calculation may be performed for only selected pixels and/or locations.
In some embodiments, sub-sampling may be performed on the template region in the L0 direction and the template region in the L1 direction when performing cost calculations between templates in bilateral matching.
In another embodiment, sub-sampling may be performed on a portion of the template region in the L0 direction and a portion of the template region in the L1 direction when performing cost calculations between templates in bilateral matching.
Alternatively, in performing cost calculation between templates in bilateral matching, a template region in the L0 direction and a template region in the L1 direction may be partitioned into two or more regions, respectively. The partitioned area may be one of 1) an area in which sub-sampling is performed, 2) an area for cost calculation that does not involve sub-sampling, and 3) an area for no cost calculation. Cost calculations between templates in bilateral matching may be performed using pixels and/or locations selected from region 1) by sub-sampling and pixels and/or locations within region 2).
Alternatively, the partitioned area may be one of 1) an area in which sub-sampling is performed and 2) an area not used for cost calculation. The computation of the cost function between templates in bilateral matching may be performed using pixels and/or locations selected by sub-sampling from region 1).
In another embodiment, in performing cost calculations between templates in bilateral matching, the area for cost calculations may be a portion of the template in the L0 direction and a portion of the template in the L1 direction.
The size of a portion of the template in the L0 direction may be different from the size of a portion of the template in the L1 direction.
In one example, a size of a portion of the template in the L0 direction may be smaller than a size of a portion of the template in the L1 direction.
For example, the height (vertical dimension) of a portion of the template in the L0 direction may be smaller than the height (vertical dimension) of the template in the L1 direction.
For example, a width (horizontal dimension) of a portion of the template in the L0 direction may be smaller than a width (horizontal dimension) of the template in the L1 direction.
In another example, a size of a portion of the template in the L1 direction may be smaller than a size of the template in the L0 direction.
For example, the height (vertical dimension) of a portion of the template in the L1 direction may be smaller than the height (vertical dimension) of the template in the L0 direction.
For example, a width (horizontal dimension) of a portion of the template in the L1 direction may be smaller than a width (horizontal dimension) of the template in the L0 direction.
■ Sub-sampling of search areas
For example, in performing a bilateral matching search process, only a portion of pixels and/or locations within the search area may be selected. The search and/or matching cost calculation may be performed only for selected pixels and/or selected locations. Alternatively, the search and/or matching cost calculation may be performed only for motion information indicating the selected pixel and/or location.
For example, in performing a search process of bilateral matching, sub-sampling may be performed on all or part of the search area.
Alternatively, for example, in performing the search processing of bilateral matching, the search area may be partitioned into two or more areas. The partitioned area may be one of 1) an area in which sub-sampling is performed, 2) an area in which search processing is performed without involving sub-sampling, and 3) an area in which search processing is not performed. A search process for bilateral matching may be performed on pixels and/or locations within region 1) selected by sub-sampling and pixels and/or locations within region 2). Alternatively, a search process of bilateral matching may be performed on the pixels and/or positions selected by sub-sampling within the region 1) and the motion information indicating the pixels and/or positions within the region 2).
Alternatively, for example, in performing the search processing of bilateral matching, the search area may be partitioned into two or more areas. The partitioned area may be one of 1) an area in which sub-sampling is performed and 2) an area in which search processing is not performed. A search process for bilateral matching may be performed on pixels and/or locations selected by sub-sampling within region 1). Alternatively, a search process of bilateral matching may be performed on motion information indicating the pixels and/or positions selected by sub-sampling within region 1).
Fig. 15 shows various examples of a sub-sampling method in bilateral matching.
The shaded samples (or positions) in fig. 15 represent the samples (or positions) selected by sub-sampling.
As shown in fig. 15, sub-sampling may be performed on the reference block region in the L0 direction and the reference block region in the L1 direction, and the template may be configured using only selected samples (or positions).
As shown in fig. 16, sub-sampling may be performed on a portion of the reference block region in the L0 direction and a portion of the reference block region in the L1 direction, and the template may be configured using only selected samples (or positions).
For all or part of the bilateral matched template regions, sub-sampling may be performed as shown in fig. 15, and cost calculation may be performed for only selected samples (or locations).
For all or part of the search area of bilateral matching, sub-sampling may be performed as shown in fig. 15, and cost calculation may be performed for only selected pixels and/or locations.
For all or part of the search area of bilateral matching, sub-sampling may be performed as shown in fig. 15, and cost calculation may be performed only for motion information indicating selected pixels and/or locations.
3.2 Conditions for performing bilateral matching
Bilateral matching can always be performed. Alternatively, bilateral matching may be performed only when a predefined enabling condition is met.
For example, bilateral matching may be performed when an inter prediction mode is used for a target block and two or more reference blocks are employed.
In some embodiments, bilateral matching may be performed when the first direction is different from the second direction and the first POC interval (or difference) is the same as the second POC interval. The first direction may be a direction from the target image to the reference image in the L0 direction. The second direction may be a direction from the target image to the reference image in the L1 direction. The first POC interval may be a difference between the POC of the target image and the POC of the reference image in the L0 direction. The second POC interval may be a difference between the POC of the target image and the POC of the reference image in the L1 direction.
In another embodiment, bilateral matching may be performed when the first direction is different from the second direction. For example, bilateral matching may be performed when the first direction is different from the second direction even though the first POC interval is different from the second POC interval.
Here, the fact that the first direction and the second direction are different from each other may indicate that the following equation 1 is satisfied.
[Equation 1] (POCt - POC0) × (POCt - POC1) < 0
Here, the fact that the first direction is the same as the second direction may indicate that the following equation 2 is satisfied.
[Equation 2] (POCt - POC0) × (POCt - POC1) > 0
In equations 1 and 2, POCt denotes the POC of the target image, POC0 denotes the POC of the reference image in the L0 direction, and POC1 denotes the POC of the reference image in the L1 direction.
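Equations 1 and 2 can be checked directly; a small sketch with POC values as plain integers:

```python
def directions_opposite(poc_t, poc0, poc1):
    # Equation 1: the two references lie on opposite temporal sides
    # of the target picture.
    return (poc_t - poc0) * (poc_t - poc1) < 0

def directions_same(poc_t, poc0, poc1):
    # Equation 2: both references lie on the same temporal side.
    return (poc_t - poc0) * (poc_t - poc1) > 0
```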
In another embodiment, bilateral matching may be performed even when the first direction and the second direction are the same.
For example, bilateral matching may be performed when the prediction mode of the target block is an inter-picture prediction mode, two or more reference blocks are used, the first direction and the second direction are the same, and at least one of the first POC interval and the second POC interval is employed.
For example, bilateral matching may be performed when the first direction and the second direction are the same, and the smaller of the first POC interval and the second POC interval is less than a predetermined value. In another example, bilateral matching may be performed when two or more reference blocks are used, the first direction and the second direction are the same, and a greater one of the first POC interval and the second POC interval is less than a predetermined value.
In another example, bilateral matching may be performed based on the ratio of the first POC interval to the second POC interval. For example, bilateral matching may be performed when the ratio of the larger of the first and second POC intervals to the smaller of the two is less than a predetermined value.
The predetermined value may be 1, 2, 3, 4, 8, or another positive integer.
In bilateral matching, a relationship between the magnitude of the motion information improvement value in the L0 direction and the magnitude of the motion information improvement value in the L1 direction may be determined based on the first POC interval and the second POC interval.
For example, (magnitude of the motion information improvement value in the L0 direction) × (second POC interval) = (magnitude of the motion information improvement value in the L1 direction) × (first POC interval).
In the bilateral matching, a relationship between the direction of the motion information improvement value in the L0 direction and the direction of the motion information improvement value in the L1 direction may be determined based on the first direction and the second direction.
For example, when the first direction and the second direction are the same, the direction of the motion information improvement value in the L0 direction and the direction of the motion information improvement value in the L1 direction are the same; otherwise, the two directions may be opposite to each other.
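Under the two relations above, the L1 improvement value can be derived from the L0 improvement value as sketched below (a sketch under stated assumptions: the helper name is not from the text, and a real decoder would use fixed-point arithmetic):

```python
def scale_refinement(delta_l0, poc_t, poc0, poc1):
    """Derive the L1 refinement offset from the L0 refinement offset.

    Magnitudes follow the POC-interval ratio, and the sign flips when
    the two references lie on opposite sides of the target picture.
    """
    d0 = abs(poc_t - poc0)                         # first POC interval
    d1 = abs(poc_t - poc1)                         # second POC interval
    same_side = (poc_t - poc0) * (poc_t - poc1) > 0
    sign = 1 if same_side else -1
    return tuple(sign * c * d1 / d0 for c in delta_l0)
```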
3.3 Search method for bilateral matching
■ Definition of search
A search may be performed using a calculation of a cost function to determine similarity between two templates.
The search may include a process of determining at least one piece of motion information satisfying a specific condition within a specific search range. The motion information of the target block may be determined and/or changed based on at least one motion information determined by the search.
Here, the motion information satisfying the specific condition may mean, but is not limited to, the motion information having the lowest matching cost among the motion information within the search range.
The search may include a process of determining at least one piece of motion information satisfying a specific condition within a specific search range. The motion information indicating the block determined by the search may be used as the motion information of the target block.
Here, a block satisfying the specific condition may mean, but is not limited to, a reference block having the lowest matching cost among the reference blocks within the search range.
■ Cost function
Given two templates that are bilaterally matched, a cost function may refer to a function that determines the similarity between at least one sample in a first template and at least one sample in a second template.
The similarity between the first value and the second value may be determined using at least one of 1) a difference between the two values, 2) a ratio between the two values, and 3) an operation of comparing the difference between the two values with a specific value.
The cost function may be one or more of the sum of absolute differences (SAD), the sum of absolute transformed differences (SATD), the mean-removed sum of absolute differences (MR-SAD), the mean squared error (MSE), and the sum of squared errors (SSE). However, the cost function is not limited to the items listed above.
The cost function used in bilateral matching may be predefined or may be determined based on signaled/encoded/decoded information.
In one embodiment, if the size of the target block is less than a specific value, the SAD may be used as a cost function in bilateral matching; otherwise, the MR-SAD may be used as a cost function in bilateral matching. The size of the block may include at least one of the width of the block, the height of the block, (the sum of the width and the height of the block), and (the product of the width and the height of the block).
In another embodiment, if BCW is not performed on the target block, or the same weight is used for the reference blocks in the L0 and L1 directions in BCW, the SAD or SATD may be used as a cost function in bilateral matching; otherwise, the MR-SAD or MR-SATD may be used as a cost function.
In yet another embodiment, if the local illumination compensation mode is not performed on the target block, the SAD may be used as a cost function in a bilateral match, otherwise the MR-SAD may be used as a cost function.
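For reference, the listed cost functions (except SATD, which additionally applies a Hadamard transform before summing) can be written compactly; this is a plain floating-point sketch, not the codec's fixed-point implementation:

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences.
    return float(np.abs(a - b).sum())

def mr_sad(a, b):
    # Mean-removed SAD: remove each template's mean first, making the
    # cost robust to a constant illumination/weighting offset.
    return float(np.abs((a - a.mean()) - (b - b.mean())).sum())

def sse(a, b):
    # Sum of squared errors.
    return float(((a - b) ** 2).sum())

def mse(a, b):
    # Mean squared error.
    return sse(a, b) / a.size
```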
■ Searching method
The same search method as that described in the template matching can be applied.
■ Search scope (search area)
In bilateral matching, the size of the search area in the L0 direction may be the same as the size of the search area in the L1 direction.
Alternatively, in bilateral matching, the size of the search area in the L0 direction and the size of the search area in the L1 direction may be determined based on the first POC interval and the second POC interval, and the size of the search area in the L0 direction and the size of the search area in the L1 direction may be the same or different.
In the bilateral matching, the search area in the LX direction may be a rectangle centered on a position (or block) indicated by the motion information in the LX direction, the rectangle having a height of a first value and a width of a second value. The first value and the second value may be the same or different.
The first value and the second value may be predefined values. Alternatively, each of the first value and the second value may be determined based on at least one of a prediction mode of the target block, motion information of the target block, encoding parameters of the target block, a size of the target block, a range of values allowed for a luminance component of the target block, a range of values allowed for a chrominance component of the target block, availability of neighboring blocks of the target block, encoding parameters of neighboring blocks of the target block, neighboring samples of the target block, and motion information.
X may be 0 or 1. X may always be 0. Alternatively, X may always be 1. Alternatively, X may be 1 when the second POC interval is greater than the first POC interval, and 0 otherwise. Alternatively, X may be 0 when the second POC interval is greater than the first POC interval, and 1 otherwise.
The search area in the L (1-X) direction may be a rectangle centered on the position (or block) indicated by the motion information in the L (1-X) direction, the rectangle having a height of the third value and a width of the fourth value.
The third value and the fourth value may be values determined based on the first value and the second value, respectively.
The third value may be a value determined based on the first value and the first and second POC intervals. The fourth value may be a value determined based on the second value and the first and second POC intervals.
For example, the third value may be a value obtained by multiplying or dividing the first value by (first POC interval/second POC interval). For example, the fourth value may be a value obtained by multiplying or dividing the second value by (the first POC interval/the second POC interval).
In another example, the third value may be the larger of a value obtained by multiplying or dividing the first value by (first POC interval / second POC interval) and a predetermined value. In yet another example, the fourth value may be the larger of a value obtained by multiplying or dividing the second value by (first POC interval / second POC interval) and a predetermined value.
The predetermined value may be 4, 8, 16, 32, or another positive integer. The predetermined value may be a value determined based on at least one of the prediction mode of the target block, the motion information of the target block, the coding parameters of the target block, the size of the target block, the range of values allowed for the luminance component of the target block, the range of values allowed for the chrominance component of the target block, the availability of neighboring blocks of the target block, the coding parameters of neighboring blocks of the target block, neighboring samples of the target block, and motion information.
In another embodiment, in bilateral matching, if the first direction and the second direction are opposite and the first POC interval and the second POC interval are the same in size, the size of the search area in the L0 direction and the size of the search area in the L1 direction may be the same; otherwise, they may be different.
For example, if the first direction and the second direction are the same, or if the sizes of the first POC interval and the second POC interval are different, at least one of the size of the search area in the L0 direction and the size of the search area in the L1 direction may be determined based on the first POC interval and the second POC interval.
For example, the size of the search area in the LX direction may be the same as the size of the search area in the L0 direction or the size of the search area in the L1 direction, where the LX direction is whichever of the L0 and L1 directions provides the larger POC interval from the target image. Also, the size of the search area in the L(1-X) direction may be equal to or smaller than the size of the search area in the LX direction. For example, the size of the search area in the LX direction may correspond to the size of the search area in the L(1-X) direction multiplied by ((1+X)-th POC interval / (2-X)-th POC interval), or to a value determined based on the multiplied result.
Alternatively, the size of the search area in the LX direction may be the same as the size of the search area in the L0 direction or the size of the search area in the L1 direction, where the LX direction is whichever of the L0 and L1 directions provides the smaller POC interval from the target image. Also, the size of the search area in the L(1-X) direction may be equal to or larger than the size of the search area in the LX direction. For example, the size of the search area in the L(1-X) direction may correspond to the size of the search area in the LX direction multiplied by ((2-X)-th POC interval / (1+X)-th POC interval), or to a value determined based on the multiplied result.
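The derivation of the third and fourth values can be sketched as follows (the lower bound 8 is one of the example predetermined values; the rounding and the multiply-variant are assumptions):

```python
def l1mx_search_size(first_value, second_value, poc_int_0, poc_int_1, floor=8):
    """Scale the L(1-X) search rectangle from the LX rectangle.

    The height (first value) and width (second value) are multiplied by
    the POC-interval ratio, with a predetermined lower bound `floor`.
    """
    ratio = poc_int_0 / poc_int_1          # first POC interval / second POC interval
    height = max(round(first_value * ratio), floor)   # third value
    width = max(round(second_value * ratio), floor)   # fourth value
    return height, width
```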
In bilateral matching, a method of performing a search may be determined based on at least one of the first direction, the second direction, the first POC interval, and the second POC interval.
The correspondence between an LX-direction search position (determined relative to the position indicated by the initial motion vector in the LX direction) and the L(1-X)-direction search position (determined relative to the position indicated by the initial motion vector in the L(1-X) direction) may be determined based on at least one of the first direction, the second direction, the first POC interval, and the second POC interval.
For example, when the search is sequentially performed for the position 1 pixel above and the position 2 pixels above the position indicated by the initial motion vector in the L0 direction, bilateral matching in the target block may be performed as follows. A block corresponding to a specific position refers to a block whose top-left corner is at that position (i.e., the top-left sample of the block is at that position).
Case 1: when the first and second directions are opposite in the target block and the first and second POC intervals are the same
The matching cost is calculated between the block corresponding to the position 1 pixel above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position 1 pixel below the position indicated by the initial motion vector in the L1 direction.
Then, the matching cost is calculated between the block corresponding to the position 2 pixels above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position 2 pixels below the position indicated by the initial motion vector in the L1 direction.
Case 2: when the first and second directions are opposite in the target block and the first and second POC intervals are different
The matching cost between the block corresponding to the position 1 pixel above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position toMult × 1 pixels below the position indicated by the initial motion vector in the L1 direction is calculated.
Then, the matching cost between the block corresponding to the position 2 pixels above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position toMult × 2 pixels below the position indicated by the initial motion vector in the L1 direction is calculated.
Case 3: when the first and second directions are the same in the target block and the first and second POC intervals are the same
The matching cost between the block corresponding to the position 1 pixel above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position 1 pixel above the position indicated by the initial motion vector in the L1 direction is calculated.
Then, the matching cost between the block corresponding to the position 2 pixels above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position 2 pixels above the position indicated by the initial motion vector in the L1 direction is calculated.
Case 4: when the first and second directions are the same in the target block and the first and second POC intervals are different
The matching cost between the block corresponding to the position 1 pixel above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position toMult × 1 pixels above the position indicated by the initial motion vector in the L1 direction is calculated.
Then, the matching cost between the block corresponding to the position 2 pixels above the position indicated by the initial motion vector in the L0 direction and the block corresponding to the position toMult × 2 pixels above the position indicated by the initial motion vector in the L1 direction is calculated.
For example, if the second POC interval is greater than the first POC interval, bilateral matching may be performed as described in Cases 1 to 4; if the first POC interval is greater than the second POC interval, bilateral matching may be performed with the L0 direction and the L1 direction exchanged in Cases 1 to 4.
ToMult may be 1 or a positive number.
For example, toMult may be (second POC interval / first POC interval), or a value determined based on that quotient.
Alternatively, toMult may always be 1 regardless of the first POC interval and the second POC interval.
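As an illustrative sketch (not taken from the specification), the pairing of L0/L1 search offsets in Cases 1 to 4 above could be expressed as follows; the names `to_mult` and `paired_offset` are assumptions for illustration.

```python
def to_mult(first_poc_interval, second_poc_interval, scale=True):
    """Scaling factor toMult applied to the L1-side offset.

    When `scale` is False, toMult is always 1 regardless of the POC
    intervals (one of the alternatives described above).
    """
    if not scale or first_poc_interval == 0:
        return 1.0
    return second_poc_interval / first_poc_interval

def paired_offset(l0_offset, same_direction, mult):
    """Vertical offset applied on the L1 side for a given L0 offset
    (positive values mean 'above').

    Opposite temporal directions (Cases 1 and 2) mirror the offset;
    same directions (Cases 3 and 4) keep its sign. `mult` is toMult.
    """
    sign = 1 if same_direction else -1
    return sign * mult * l0_offset
```

For example, with opposite directions and toMult = 2 (Case 2), an L0 offset of 2 pixels above pairs with an L1 offset of 4 pixels below.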
3.4 Search step for bilateral matching
If the initial motion information (e.g., an initial motion vector or an initial block vector) is not motion information in integer pixel units (i.e., it is in fractional pixel units), the initial motion information may be modified by rounding (or truncating or rounding up) the initial motion information, and a search may be performed using the modified initial motion information.
For example, in order to generate a reference template at a position indicated by motion information obtained by adding a specific offset to initial motion information of a fractional pixel unit in search processing, an interpolation filter has to be applied to a sample at the integer pixel position to generate a sample at the fractional pixel position. However, if the initial motion information is limited to integer pixel units, the computational complexity may be reduced because interpolation of fractional pixel positions is not required during the search process.
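A minimal sketch of limiting the initial motion information to integer-pel precision, assuming the motion vector is stored in quarter-pel units (a common convention, not stated in the text):

```python
def round_to_integer_pel(mv_qpel):
    """Round each quarter-pel component to the nearest integer-pel
    position; the result is again in quarter-pel units (multiples of 4)."""
    def round_component(v):
        # Round half away from zero, then rescale to quarter-pel units.
        return ((abs(v) + 2) // 4) * 4 * (1 if v >= 0 else -1)
    return (round_component(mv_qpel[0]), round_component(mv_qpel[1]))
```

With the rounded vector, every search position falls on an integer sample, so no interpolation filter is needed during the search.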
Bilateral matching may include one or more search steps.
For example, bilateral matching may be configured to include, in order, 1) a step of deriving motion information for the whole block and 2) a step of deriving motion information for the sub-blocks of the block. However, the methods of deriving motion information and the order in which the steps are performed are not limited to the above configuration.
In each search step of bilateral matching, the motion information for BM_NUM directions, among the motion information in the L0 direction and the motion information in the L1 direction, may be improved. BM_NUM may be 0, 1, 2, or a positive integer. The BM_NUM values used in the search steps of bilateral matching may be the same as or different from each other.
For example, when BM_NUM is 1 in a specific search step, motion information improvement may be performed only for the motion information in the LX_BM direction in that search step. Here, X_BM may have a value of 0 or 1. In other words, if X_BM is 0, the L0 direction is set as the LX_BM direction, and if X_BM is 1, the L1 direction is set as the LX_BM direction.
For example, if BM_NUM is 1 and X_BM is 0 in a specific search step, only the search in the L0 direction may be performed in that search step, with the template in the L1 direction and the motion information in the L1 direction fixed.
Information about X_BM may be signaled/encoded/decoded.
Alternatively, X_BM may be predefined.
For example, X_BM may be set to the value corresponding to whichever of the L0 direction and the L1 direction yields the larger POC difference from the target picture. The POC difference may be the difference between the POC of the reference picture (in the particular direction) and the POC of the target picture.
Alternatively or additionally, X_BM may be set to the value corresponding to whichever of the L0 direction and the L1 direction yields the higher template matching cost. Here, the template matching cost in a specific direction may be the template matching cost of the motion information in that direction.
Hereinafter, various embodiments for determining X_BM will be described.
■ Methods for determining X_BM
In one embodiment, X_BM may be 0 if the first POC difference is greater than the second POC difference; otherwise, X_BM may be 1. Alternatively, X_BM may be 1 if the first POC difference is greater than the second POC difference; otherwise, X_BM may be 0. The first POC difference may be the difference between the POC of the target image and the POC of the reference image in the L0 direction. The second POC difference may be the difference between the POC of the target image and the POC of the reference image in the L1 direction.
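As a sketch of the first alternative above (X_BM is 0 when the first POC difference is larger, 1 otherwise); the function name is illustrative, not from the specification:

```python
def derive_x_bm(target_poc, l0_ref_poc, l1_ref_poc):
    first_poc_diff = abs(target_poc - l0_ref_poc)    # L0-side POC difference
    second_poc_diff = abs(target_poc - l1_ref_poc)   # L1-side POC difference
    return 0 if first_poc_diff > second_poc_diff else 1
```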
In another embodiment, X_BM may be determined based on a context model and/or a probability model used for entropy encoding and entropy decoding of the motion information and the encoding parameters of the target block.
For example, X_BM may be determined based on at least one of a context model and/or a probability model used for entropy encoding and entropy decoding of an inter prediction indicator that indicates the inter prediction direction of the target block.
From the context model and/or the probability model used in the entropy encoding and entropy decoding of the inter prediction indicator, the direction that is more probable in the target block, between uni-directional prediction in the L0 direction and uni-directional prediction in the L1 direction, may be selected as the LX_BM direction, i.e., the direction in which motion information improvement is performed.
Alternatively, the direction that is more probable in the target block, between uni-directional prediction in the L0 direction and uni-directional prediction in the L1 direction, may be selected as the L(1-X_BM) direction, i.e., the direction in which motion information improvement is not performed, based on the context model and/or the probability model used in the entropy encoding and entropy decoding of the inter prediction indicator.
The more probable direction may mean the direction that requires fewer bits when entropy encoding is performed using the context model and/or the probability model. Alternatively, the more probable direction may mean the direction for which the probability derived from the context model and/or the probability model is higher.
In another embodiment, the LX_BM direction may be determined based on the weights in bi-prediction with CU weights for the target block. For example, the LX_BM direction may be the direction having the higher weight among the L0 direction and the L1 direction. Alternatively, the LX_BM direction may be the direction having the lower weight among the L0 direction and the L1 direction.
In another embodiment, X_BM may be determined based on at least one of the motion information and the coding parameters of a neighboring block. For example, X_BM of the target block may be determined based on at least one of the motion information and the encoding parameters of the neighboring block corresponding to at least one of A0, A1, B0, B1, and B2 of FIG. 5.
For example, X_BM of the target block may be determined based on at least one of an inter prediction indicator and an inter bi-prediction weight of the neighboring block.
For example, one or more context models and/or probability models may be used for entropy encoding and entropy decoding of X_BM. Among the plurality of context models and/or probability models, the context model and/or probability model used for entropy encoding and entropy decoding of X_BM in the target block may be determined based on at least one of the motion information and the encoding information of the neighboring blocks.
The context model and/or the probability model used for entropy encoding and entropy decoding of X_BM may be the same for all blocks. Optionally, the context model and/or the probability model used for entropy encoding and entropy decoding of X_BM in a block may vary according to at least one of the inter prediction direction and the inter bi-prediction weight of the neighboring blocks.
The same X_BM may be used in all search steps of bilateral matching. Alternatively, a different X_BM may be used in each search step of the bilateral matching.
Hereinafter, BM_NUM, which indicates the number of directions among the L0 direction and the L1 direction in which motion information is improved, will be described.
■ BM_NUM
BM_NUM may be 0, 1, 2, or a positive integer.
BM_NUM may be predefined.
BM_NUM in each search step of bilateral matching may be determined based on the encoding parameters. Alternatively, BM_NUM may be determined based on at least one of the motion information, the search step of bilateral matching, the matching cost in a previous search step, the matching cost of the initial motion information for the current search step, and BM_NUM in a previous search step.
In one embodiment, BM_NUM in the first search step of bilateral matching may be 1 or 2.
In one embodiment, BM_NUM for the current search step may be determined based on the matching costs in the previous search step.
For example, if the difference between the matching cost of the initial motion information for the previous search step and the matching cost of the improved motion information for the previous search step is less than COSTDIFF_FORBMNUM, BM_NUM of the current search step may be 0.
COSTDIFF_FORBMNUM may be 0, 1, 2, 4, 8, 16, or a positive integer.
COSTDIFF_FORBMNUM may be determined based on the size of the target block. COSTDIFF_FORBMNUM may be the product of the number of pixels in the target block and a particular value. The particular value may be 0, 1, 2, 4, 8, or a positive integer.
In one embodiment, if BM_NUM in the previous search step is 0, BM_NUM in the current search step may be 0.
In one embodiment, if the matching cost of the initial motion information of the current search step is less than COSTDIFF_FORBMNUM_INIT, BM_NUM of the target block may be 0.
COSTDIFF_FORBMNUM_INIT may be 0, 1, 2, 4, 8, or a positive integer.
For example, COSTDIFF_FORBMNUM_INIT may be determined based on the size of the target block. COSTDIFF_FORBMNUM_INIT may be the product of the number of pixels in the target block and a particular value. The particular value may be 0, 1, 2, 4, 8, or a positive integer.
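A hedged sketch combining the BM_NUM rules above. The threshold names follow the text, but the ordering of the conditions and the default value of 2 are assumptions for illustration:

```python
def derive_bm_num(width, height, cur_init_cost,
                  prev_bm_num=None, prev_init_cost=None, prev_refined_cost=None,
                  per_pixel_value=1, default_bm_num=2):
    num_pixels = width * height
    costdiff_forbmnum = num_pixels * per_pixel_value        # COSTDIFF_FORBMNUM
    costdiff_forbmnum_init = num_pixels * per_pixel_value   # COSTDIFF_FORBMNUM_INIT

    # If the previous search step already decided not to refine, keep 0.
    if prev_bm_num == 0:
        return 0
    # The initial cost of the current step is already small: skip refinement.
    if cur_init_cost < costdiff_forbmnum_init:
        return 0
    # The previous step barely improved the cost: skip refinement now.
    if (prev_init_cost is not None and prev_refined_cost is not None
            and prev_init_cost - prev_refined_cost < costdiff_forbmnum):
        return 0
    return default_bm_num
```

For an 8×8 block with per-pixel value 1, both thresholds are 64; a current initial cost below 64, or a previous improvement below 64, disables refinement.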
In one embodiment, when motion refinement is performed in a search step of bilateral matching, refinement of the motion information in the L0 direction may be performed only when the matching cost of the initial motion information in the L0 direction for the current search step is greater than COSTDIFF_FORBMNUM_INIT.
In some embodiments, when motion refinement is performed in a search step of bilateral matching, refinement of the motion information in the L1 direction may be performed only when the matching cost of the initial motion information in the L1 direction for the current search step is greater than COSTDIFF_FORBMNUM_INIT.
A BM_NUM of 0 for a particular search step of bilateral matching may indicate that no improvement of motion information is performed in that search step. Alternatively, a BM_NUM of 0 for a particular search step of bilateral matching may indicate that the search step is not performed.
For example, when bilateral matching is performed, BM_NUM in the motion information derivation step for the whole block may be 1, and BM_NUM in the motion information derivation step for a sub-block may be 2. In this case, only the motion information in the LX_BM direction may be improved in the motion information derivation step for the whole block, while both the motion information in the L0 direction and the motion information in the L1 direction may be improved in the motion information derivation step for the sub-block.
FIG. 25 illustrates bilateral matching according to one embodiment.
Fig. 25 shows a case where BM_NUM is 2 in the motion information derivation step of bilateral matching for a block.
MV0 represents the initial motion information in the L0 direction, and MV1 represents the initial motion information in the L1 direction.
MVdiff may mean the motion information improvement value derived by bilateral matching.
MV0' and MV1' may be the motion information derived by bilateral matching.
In bilateral matching, the magnitude of the motion information improvement value in the L0 direction may be the same as the magnitude of the motion information improvement value in the L1 direction. The direction of the motion information improvement value in the L0 direction may be opposite to the direction of the motion information improvement value in the L1 direction. In other words, the following equations 3 and 4 can be established.
[Equation 3] MV0' = MV0 + MVdiff
[Equation 4] MV1' = MV1 - MVdiff
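Equations 3 and 4 can be transcribed directly: the improvement value MVdiff is added on the L0 side and subtracted on the L1 side, so the two refinements have equal magnitude and opposite direction.

```python
def refine_bilateral(mv0, mv1, mv_diff):
    """Apply the symmetric bilateral-matching refinement to (x, y) vectors."""
    mv0_refined = (mv0[0] + mv_diff[0], mv0[1] + mv_diff[1])  # Equation 3
    mv1_refined = (mv1[0] - mv_diff[0], mv1[1] - mv_diff[1])  # Equation 4
    return mv0_refined, mv1_refined
```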
For example, given that the target block is in IBC mode and two or more reference blocks are used, when bilateral matching is performed on the target block, the value and direction of the motion information improvement for the first reference block may be the same as those of the motion information improvement in the L1 direction.
The sub-sampling in the above template matching or bilateral matching may be performed based on at least one of whether template matching or bilateral matching is performed, an indicator indicating whether template matching or bilateral matching is performed, the motion information of the target block, the encoding parameters of the target block, and the search step in the template matching.
For example, the sub-sampling method may be determined based on at least one of whether template matching is performed, an indicator indicating whether template matching is performed, the motion information of the target block, the encoding parameters of the target block, and the search step in the template matching.
The sub-sampling method used in template matching or bilateral matching may be the same for all search steps. Alternatively, the sub-sampling method used in performing template matching and/or bilateral matching may be different for each search step.
In another example, whether to perform sub-sampling may be determined based on at least one of whether to perform template matching, an indicator indicating whether to perform template matching, motion information of a target block, encoding parameters of the target block, and a search step in template matching.
In performing template matching, whether to perform sub-sampling may be the same for each search step. Alternatively, whether or not to perform sub-sampling may be different for each search step.
Whether or not sub-sampling is performed in the horizontal direction and whether or not sub-sampling is performed in the vertical direction may be the same. Alternatively, whether or not sub-sampling is performed in the horizontal direction and whether or not sub-sampling is performed in the vertical direction may be different from each other.
3.5 Sub-block based bilateral matching
When performing bilateral matching, motion information may always be derived in sub-block units only. In other words, the derivation of motion information may not be performed for the whole block.
In performing the bilateral matching, the indicator information regarding whether the derivation of the motion information is performed in sub-block units may be signaled/encoded/decoded.
In performing the bilateral matching, whether to refine the motion information in sub-block units may be determined based on at least one of a size of the target block, a coding parameter of the target block, motion information of the target block, a coding parameter of a neighboring block, and motion information of the neighboring block.
In some embodiments, when performing bilateral matching, when the width of the target block is W and the height of the target block is H, deriving motion information in sub-block units may be performed only when the larger of W and H is greater than MIN_SIZE_THRES_FOR_SUB. For example, only when the larger of W and H is greater than this first threshold (MIN_SIZE_THRES_FOR_SUB) may an indicator indicating whether the derivation of motion information is performed in sub-block units be encoded/decoded.
MIN_SIZE_THRES_FOR_SUB may be a predefined value. For example, MIN_SIZE_THRES_FOR_SUB may be 8, 16, 32, 64, 128, or a positive integer.
Alternatively or additionally, deriving motion information in sub-block units may be performed only when the larger of W and H is less than a second threshold (MAX_SIZE_THRES_FOR_SUB). For example, only when the larger of W and H is smaller than MAX_SIZE_THRES_FOR_SUB may an indicator indicating whether the derivation of motion information is performed in sub-block units be encoded/decoded.
MAX_SIZE_THRES_FOR_SUB may be a predefined value, e.g., 8, 16, 32, 64, 128, or a positive integer.
In some embodiments, deriving motion information in sub-block units may be performed only when the smaller of W and H is greater than MIN_SIZE_THRES_FOR_SUB. For example, only when the smaller of W and H is greater than MIN_SIZE_THRES_FOR_SUB may an indicator indicating whether the derivation of motion information is performed in sub-block units be encoded/decoded.
Alternatively or additionally, deriving motion information in sub-block units may be performed only when the smaller of W and H is less than MAX_SIZE_THRES_FOR_SUB. For example, only when the smaller of W and H is smaller than MAX_SIZE_THRES_FOR_SUB may an indicator indicating whether the derivation of motion information is performed in sub-block units be encoded/decoded.
In some embodiments, when the width of the target block is W and the height of the target block is H, deriving motion information in sub-block units may be performed only when the value of W×H (i.e., the area of the target block, or the total number of samples in the target block) is greater than a third threshold (MIN_SIZE_AREA_THRES_FOR_SUB). For example, only when the value of W×H is greater than MIN_SIZE_AREA_THRES_FOR_SUB may an indicator indicating whether the derivation of motion information is performed in sub-block units be encoded/decoded.
MIN_SIZE_AREA_THRES_FOR_SUB may be a predefined value. For example, MIN_SIZE_AREA_THRES_FOR_SUB may be 8, 16, 32, 64, 128, or a positive integer.
Alternatively or additionally, deriving motion information in sub-block units may be performed only if the value of W×H is less than a fourth threshold (MAX_SIZE_AREA_THRES_FOR_SUB). For example, only when the value of W×H is smaller than MAX_SIZE_AREA_THRES_FOR_SUB may an indicator indicating whether the derivation of motion information is performed in sub-block units be encoded/decoded.
MAX_SIZE_AREA_THRES_FOR_SUB may be a predefined value, e.g., 8, 16, 32, 64, 128, or a positive integer.
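A sketch of the size gating above; whether the check uses the larger side, the smaller side, or the area (W×H) is embodiment-dependent, so the `mode` parameter and the default thresholds here are assumptions.

```python
def subblock_derivation_allowed(width, height, mode="max_side",
                                min_thres=8, max_thres=128):
    """Decide whether sub-block motion derivation is performed, by
    comparing a size measure of the block against the two thresholds."""
    if mode == "max_side":
        measure = max(width, height)
    elif mode == "min_side":
        measure = min(width, height)
    else:  # "area": W x H, the total number of samples in the block
        measure = width * height
    return min_thres < measure < max_thres
```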
Further, when performing bilateral matching in sub-block units, size information of at least one sub-block for which bilateral matching (i.e., derivation of motion information) is performed may be signaled/encoded/decoded.
Alternatively, at least one of the width and the height of the sub-block may be determined based on at least one of a size of the target block, an encoding parameter of the target block, motion information of the target block, an encoding parameter of a neighboring block, and motion information of the neighboring block.
In some embodiments, at least one of the width and the height of the sub-block may be one of the sizes belonging to the predetermined list.
For example, the predetermined list may be a list of size information.
For example, the predetermined list may be predefined.
In some embodiments, deriving motion information in sub-block units (i.e., bilateral matching in which at least one of the width and the height of the sub-block has the size SUBBLOCK_SIZE_ALWAYS) may be performed at least once.
SUBBLOCK_SIZE_ALWAYS may be a predefined value. For example, SUBBLOCK_SIZE_ALWAYS may be 1, 2, 4, 8, 16, 32, 64, 128, or a positive integer.
Information about SUBBLOCK_SIZE_ALWAYS may be signaled/encoded/decoded.
In some embodiments, if the width of the target block is greater than a particular threshold width (e.g., SUBBLOCK_SIZE_ALWAYS), motion information may be derived in sub-block units at least once, where the sub-blocks have the particular threshold width.
In some embodiments, if the height of the target block is greater than a particular threshold height (e.g., SUBBLOCK_SIZE_ALWAYS), deriving motion information in sub-block units may be performed at least once, where the sub-blocks have the particular threshold height.
In some embodiments, if the width of the target block is less than a particular threshold width (e.g., SUBBLOCK_SIZE_ALWAYS), motion information may be derived in sub-block units at least once, where the sub-blocks have the same width as the target block.
In some embodiments, if the height of the target block is less than a particular threshold height (e.g., SUBBLOCK_SIZE_ALWAYS), deriving motion information in sub-block units may be performed at least once, where the sub-blocks have the same height as the target block.
In some embodiments, given that the width of the target block is W and the height of the target block is H, if the larger of W and H is greater than MIN_SIZE_THRES_FOR_SUBBLOCK_SIZE, the process of deriving motion information in sub-block units, in which the size of the sub-block is SUBBLOCK_SIZE1, may be performed at least once. For example, information about SUBBLOCK_SIZE1 may be encoded/decoded. For example, SUBBLOCK_SIZE1 may be 1, 2, 4, 8, 16, 32, 64, 128, or a positive integer. MIN_SIZE_THRES_FOR_SUBBLOCK_SIZE may be a predefined value. For example, MIN_SIZE_THRES_FOR_SUBBLOCK_SIZE may be 8, 16, 32, 64, 128, or a positive integer.
Additionally or alternatively, if the larger of W and H is less than MAX_SIZE_THRES_FOR_SUBBLOCK_SIZE, the process of deriving motion information in sub-block units, in which the size of the sub-block is SUBBLOCK_SIZE2, may be performed at least once. For example, information about SUBBLOCK_SIZE2 may be encoded/decoded. For example, SUBBLOCK_SIZE2 may be 1, 2, 4, 8, 16, 32, 64, 128, or a positive integer. MAX_SIZE_THRES_FOR_SUBBLOCK_SIZE may be a predefined value. For example, MAX_SIZE_THRES_FOR_SUBBLOCK_SIZE may be 8, 16, 32, 64, 128, or a positive integer.
In some other embodiments, if the smaller of W and H is greater than MIN_SIZE_THRES_FOR_SUBBLOCK_SIZE, the process of deriving motion information in sub-block units, in which the size of the sub-block is SUBBLOCK_SIZE1, may be performed at least once.
In an additional or alternative embodiment, if the smaller of W and H is less than MAX_SIZE_THRES_FOR_SUBBLOCK_SIZE, the process of deriving motion information in sub-block units, in which the size of the sub-block is SUBBLOCK_SIZE2, may be performed at least once.
In some other embodiments, if the value of W×H (i.e., the area of the target block, or the total number of samples in the target block) is greater than MIN_SIZE_AREA_THRES_FOR_SUBBLOCK_SIZE, the process of deriving motion information in sub-block units, in which the size of the sub-block is SUBBLOCK_SIZE3, may be performed at least once. For example, information about SUBBLOCK_SIZE3 may be encoded/decoded. For example, SUBBLOCK_SIZE3 may be 1, 2, 4, 8, 16, 32, 64, 128, or a positive integer. MIN_SIZE_AREA_THRES_FOR_SUBBLOCK_SIZE may be a predefined value. For example, MIN_SIZE_AREA_THRES_FOR_SUBBLOCK_SIZE may be 8, 16, 32, 64, 128, or a positive integer.
Additionally or alternatively, if the value of W×H is less than MAX_SIZE_AREA_THRES_FOR_SUBBLOCK_SIZE, the step of deriving motion information in sub-block units, in which the size of the sub-block is SUBBLOCK_SIZE4, may be performed at least once. For example, information about SUBBLOCK_SIZE4 may be encoded/decoded. For example, SUBBLOCK_SIZE4 may be 1, 2, 4, 8, 16, 32, 64, 128, or a positive integer. MAX_SIZE_AREA_THRES_FOR_SUBBLOCK_SIZE may be a predefined value. For example, MAX_SIZE_AREA_THRES_FOR_SUBBLOCK_SIZE may be 8, 16, 32, 64, 128, or a positive integer.
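An illustrative sketch of choosing a sub-block size from the block dimensions, following the pattern above (one size for blocks above the minimum threshold, another for blocks below the maximum threshold). All default values are examples, not normative.

```python
def pick_subblock_size(width, height,
                       min_size_thres=32,  # MIN_SIZE_THRES_FOR_SUBBLOCK_SIZE
                       max_size_thres=16,  # MAX_SIZE_THRES_FOR_SUBBLOCK_SIZE
                       size1=16, size2=8):
    """Return the sub-block size mandated by the two threshold rules,
    or None when neither rule applies to this block."""
    larger = max(width, height)
    if larger > min_size_thres:
        return size1  # SUBBLOCK_SIZE1
    if larger < max_size_thres:
        return size2  # SUBBLOCK_SIZE2
    return None
```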
3.6 Bilateral matching in whole-block units
In performing bilateral matching, whether to perform the derivation of motion information in whole-block units may be predefined.
For example, when performing bilateral matching, the derivation of motion information may always be performed only in whole-block units.
For example, when performing bilateral matching, indicator information regarding whether the derivation of motion information in whole-block units is performed may be encoded/decoded.
In some embodiments, when the width of the target block is W and the height of the target block is H, deriving motion information in whole-block units may be performed only if the larger of W and H is greater than MIN_SIZE_THRES_FOR_WHOLE. Alternatively, an indicator indicating whether to perform deriving motion information in whole-block units may be encoded/decoded only if the larger of W and H is greater than MIN_SIZE_THRES_FOR_WHOLE. MIN_SIZE_THRES_FOR_WHOLE may be a predefined value. For example, MIN_SIZE_THRES_FOR_WHOLE may be 8, 16, 32, 64, 128, or a positive integer.
Additionally or alternatively, deriving motion information in whole-block units may be performed only if the larger of W and H is less than MAX_SIZE_THRES_FOR_WHOLE. Alternatively, the indicator indicating whether to perform deriving motion information in whole-block units may be encoded/decoded only if the larger of W and H is less than MAX_SIZE_THRES_FOR_WHOLE. MAX_SIZE_THRES_FOR_WHOLE may be a predefined value. For example, MAX_SIZE_THRES_FOR_WHOLE may be 8, 16, 32, 64, 128, or a positive integer.
In some embodiments, deriving motion information in whole-block units may be performed only if the smaller of W and H is greater than MIN_SIZE_THRES_FOR_WHOLE. Alternatively, an indicator indicating whether to perform deriving motion information in whole-block units may be encoded/decoded only if the smaller of W and H is greater than MIN_SIZE_THRES_FOR_WHOLE. MIN_SIZE_THRES_FOR_WHOLE may be a predefined value. For example, MIN_SIZE_THRES_FOR_WHOLE may be 8, 16, 32, 64, 128, or a positive integer.
Additionally or alternatively, deriving motion information in whole-block units may be performed only if the smaller of W and H is less than MAX_SIZE_THRES_FOR_WHOLE. Alternatively, an indicator indicating whether to perform deriving motion information in whole-block units may be encoded/decoded only if the smaller of W and H is less than MAX_SIZE_THRES_FOR_WHOLE. MAX_SIZE_THRES_FOR_WHOLE may be a predefined value. For example, MAX_SIZE_THRES_FOR_WHOLE may be 8, 16, 32, 64, 128, or a positive integer.
In some embodiments, deriving motion information in whole-block units may be performed only if the value of W×H (i.e., the area of the target block, or the total number of samples in the target block) is greater than MIN_SIZE_AREA_THRES_FOR_WHOLE. Alternatively, the indicator indicating whether to perform the derivation of motion information in whole-block units may be encoded/decoded only if the value of W×H is greater than MIN_SIZE_AREA_THRES_FOR_WHOLE. MIN_SIZE_AREA_THRES_FOR_WHOLE may be a predefined value. For example, MIN_SIZE_AREA_THRES_FOR_WHOLE may be 8, 16, 32, 64, 128, or a positive integer.
Additionally or alternatively, deriving motion information in whole-block units may be performed only if the value of W×H is less than MAX_SIZE_AREA_THRES_FOR_WHOLE. Alternatively, the indicator indicating whether to perform the derivation of motion information in whole-block units may be encoded/decoded only if the value of W×H is less than MAX_SIZE_AREA_THRES_FOR_WHOLE. MAX_SIZE_AREA_THRES_FOR_WHOLE may be a predefined value. For example, MAX_SIZE_AREA_THRES_FOR_WHOLE may be 8, 16, 32, 64, 128, or a positive integer.
The template matching or bilateral matching described above may be enabled or performed under certain conditions.
In some embodiments, template matching or bilateral matching may be enabled (or performed) based on the type of the target picture and/or the type of the target slice.
For example, template matching and/or bilateral matching may be enabled (or performed) only if the target picture is a B picture or B slice.
In another example, template matching and/or bilateral matching may be enabled (or performed) in the intra merge mode described below only if the target picture is a B picture or B slice.
In yet another example, template matching and/or bilateral matching may be enabled (or performed) in intra merge mode only if the target picture is an I-picture or I-slice.
In some embodiments, whether to enable (or perform) template matching and/or bilateral matching may be determined based on at least one of the POC of the reference picture candidate and the POC of the target picture in the reference picture list.
The reference picture list may include at least one of a reference picture list in an L0 direction and a reference picture list in an L1 direction. For example, template matching and/or bilateral matching may be enabled (or performed) in the target block only if the POC of all reference picture candidates in the reference picture list is less than the POC of the target picture.
The above-described information about the cost function to be used for template matching or bilateral matching may be encoded by the image encoding apparatus and transmitted to the image decoding apparatus.
An indicator indicating whether template matching or bilateral matching is performed in the target block may be signaled/encoded/decoded.
If an indicator is signaled/encoded/decoded in the target block indicating whether to perform template matching or bilateral matching in the target block, information for determining a cost function for calculating a matching cost when performing template matching or bilateral matching in the target block may be signaled/encoded/decoded. The cost function to be used when performing template matching or bilateral matching in the target block may be determined based on information for determining the cost function.
For example, the information used to determine the cost function may be an indicator. For example, when the information for determining the cost function has a first value, the SAD may be used as the cost function in the target block, and when the information has a second value, the MR-SAD may be used as the cost function in the target block.
The first and second values may be 0 and 1, respectively. Alternatively, the first and second values may be 1 and 0, respectively.
In some embodiments, information for determining the cost function may be signaled/encoded/decoded according to the size of the target block.
For example, if the size of the target block is less than a predetermined value, information for determining the cost function may be signaled/encoded/decoded.
If the size of the target block is greater than a predetermined value, information for determining a cost function in the target block may not be signaled/encoded/decoded, and a predefined cost function may be used instead. For example, the predefined cost function may be MR-SAD.
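The size-dependent cost-function selection described above can be sketched as follows; this is a minimal illustration over 1-D sample lists, and the threshold value and the handling of blocks exactly at the threshold are assumptions:

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mr_sad(a, b):
    """Mean-removed SAD: the mean sample difference is subtracted before
    accumulating, which compensates for a uniform illumination offset."""
    delta = (sum(a) - sum(b)) / len(a)
    return sum(abs((x - y) - delta) for x, y in zip(a, b))

def matching_cost(cur, ref, cost_flag, block_size, threshold=64):
    """Below the threshold size, the signaled flag selects SAD (first value)
    or MR-SAD (second value); at or above it, the flag is not signaled and a
    predefined cost function (MR-SAD here) is used."""
    if block_size >= threshold:
        return mr_sad(cur, ref)
    return mr_sad(cur, ref) if cost_flag else sad(cur, ref)
```

A uniform brightness offset between the two blocks yields a nonzero SAD but a zero MR-SAD, which is why MR-SAD is attractive when illumination changes between pictures.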
In another embodiment, when a matching cost is calculated using template matching or bilateral matching for candidates included in a motion information candidate list (e.g., a merge candidate list), a cost function to be used among a plurality of cost functions may be adaptively determined through a candidate index. In other words, different cost functions may be selected depending on whether the candidate index is a particular value.
For example, when the candidate index of the candidate selected from the candidate list is an even number, the first cost function may be used to calculate a matching cost to be used to refine the motion information of the candidate. On the other hand, when the candidate index is an odd number, a second cost function may be used.
Furthermore, the cost function used in template matching and the cost function used in bilateral matching may be different from each other.
For example, in the case of template matching, SATD may be used as a first cost function and SAD may be used as a second cost function. In the case of bilateral matching, the weighted SAD may be used as a first cost function and the SAD may be used as a second cost function.
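The index-parity selection above can be sketched as follows. SATD is omitted for brevity, and a toy weighted SAD stands in for the first cost function of bilateral matching; the weighting pattern is an assumption, not the codec's actual weights:

```python
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def weighted_sad(a, b):
    # Toy weighting: double weight on the second half of the samples (assumption).
    mid = len(a) // 2
    return sum((2 if i >= mid else 1) * abs(x - y)
               for i, (x, y) in enumerate(zip(a, b)))

def refine_cost(candidate_index, cur, ref):
    """Even candidate index -> first cost function (weighted SAD here),
    odd candidate index -> second cost function (SAD)."""
    return weighted_sad(cur, ref) if candidate_index % 2 == 0 else sad(cur, ref)
```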
4. Signaling based on neighborhood availability
It may be determined whether to perform at least one of signaling, encoding, and decoding of information about a specific mode based on availability of neighboring blocks of the target block and/or availability of neighboring samples of the target block.
The specific mode used in encoding or decoding the target block may be a mode that references neighboring blocks of the target block and/or neighboring samples of the target block.
For example, the specific mode may be a mode that configures a template from neighboring blocks of the target block and/or neighboring samples of the target block, such as the template matching mode.
At least one of signaling, encoding, and decoding of information for the specific mode may be performed only when a neighboring block of the target block and/or a neighboring sample of the target block is available.
For example, the availability of neighboring blocks of the target block and/or the availability of neighboring samples of the target block may be checked, and signaling/encoding/decoding of information regarding the specific mode may be performed only when the neighboring blocks and/or neighboring samples are available.
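The availability-gated signaling above can be sketched as follows; the exact availability rule (picture, slice, or tile boundaries, decoding order, etc.) is codec specific and is abstracted into a list of flags here:

```python
def should_signal_mode_info(neighbor_availability):
    """Signal/parse the information for the specific mode only when at least
    one neighboring block or sample of the target block is available."""
    return any(neighbor_availability)
```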
5. Prediction in sub-block units
The following disclosure relates to prediction for a target block in sub-block units. The prediction modes in sub-block units according to the present disclosure may include affine transformation modes and prediction modes in sub-block units using motion shifting (or block shifting).
Affine transformations can be used to predict various motions in real world applications. For example, affine transformation may be used to predict zoom-in/out, rotation, and irregular motion.
Affine transformation may be performed using a plurality of CPMV. Multiple transformation models may be defined depending on the number of CPMV to be used.
FIG. 26 illustrates an affine transformation model according to one embodiment.
Fig. 26 (a) shows a 4-parameter affine transformation model, and fig. 26 (b) shows a 6-parameter affine transformation model.
In FIG. 26, v0, v1, and v2 represent the CPMVs corresponding to the upper left corner, the upper right corner, and the lower left corner of the target block, respectively.
Affine transformation modes may include affine merge mode and affine Advanced Motion Vector Prediction (AMVP) mode, depending on the method of deriving control point motion information.
When prediction in sub-block units is performed on a target block, information indicating whether the target block is in affine merge mode or affine AMVP mode may be signaled/encoded/decoded or predefined. Based on this information, the affine merge mode or the affine AMVP mode may be performed on the target block.
When the affine transformation pattern is applied to the target block, information indicating whether the type of affine transformation model to be applied to the target block is a 4-parameter affine transformation model or a 6-parameter affine transformation model may be signaled/encoded/decoded or predefined. Based on the above information, affine transformation using a 4-parameter affine transformation model or affine transformation using a 6-parameter affine transformation model may be performed on the target block.
In some embodiments, when the mode of the target block corresponds to the affine transformation AMVP mode, information indicating whether the affine transformation model is a 4-parameter affine transformation model or a 6-parameter affine transformation model may be signaled/encoded/decoded.
In some embodiments, when the mode of the target block corresponds to the affine merge mode, an affine mode using a motion information offset, or a prediction mode in sub-block units using motion shifting, whether the affine transformation model used in the target block is a 4-parameter or a 6-parameter affine transformation model may be determined based on at least one piece of information of a candidate specified from a motion information candidate list (e.g., whether the candidate uses a 4-parameter or a 6-parameter affine transformation model), the candidate being specified according to the signaled/encoded/decoded information and/or a predefined rule for the target block.
For example, if the affine transformation model for a specific candidate selected from the motion information candidate list of the target block is a 4-parameter affine transformation model, prediction may be performed using the 4-parameter affine transformation model for the target block, and if the affine transformation model is a 6-parameter affine transformation model, prediction may also be performed using the 6-parameter affine transformation model for the target block.
Information indicating whether prediction with a sub-block unit is allowed or enabled may be signaled/encoded/decoded or predefined in units of at least one of a video parameter set, a decoding parameter set, a sequence parameter set, an adaptive parameter set, a picture header, a sub-picture header, a slice header, a parallel block group header, a parallel block header, a block, a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), a Transform Unit (TU), a Coding Block (CB), a Prediction Block (PB), or a Transform Block (TB).
If prediction in sub-block units is not enabled at a specific unit, the information indicating whether prediction in sub-block units is enabled may be set to a first value (e.g., 0), and information indicating whether prediction in sub-block units is performed at a lower level of the specific unit may not be signaled/decoded.
If prediction in sub-block units is enabled at a specific unit, the information indicating whether prediction in sub-block units is enabled may be set to a second value (e.g., 1), and information indicating whether prediction in sub-block units is performed at a lower level of the specific unit may be signaled/decoded. Based on this information, prediction in sub-block units may or may not be performed for the target block.
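The hierarchical gating of the enable information can be sketched as follows; `read_flag` stands in for entropy decoding from the bitstream and is a hypothetical callback:

```python
def parse_lower_level_flag(higher_level_enabled, read_flag):
    """If the higher-level flag (e.g., at the sequence parameter set) has the
    first value (0), the lower-level flag is not parsed and is inferred as 0;
    otherwise it is read from the bitstream via read_flag()."""
    if not higher_level_enabled:
        return 0
    return read_flag()
```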
In addition, the prediction mode in sub-block units may include a prediction mode in sub-block units to which a motion information offset is applied. The motion information offset refers to a value added to the base motion vector. When the motion information offset is applied to a prediction mode in sub-block units using a motion shift (or block shift), the base motion vector may correspond to the motion shift. Further, when it is applied to the affine transformation mode, the base motion vector may be a Control Point Motion Vector (CPMV).
5.1 Prediction in sub-block units Using motion Shift (or block Shift)
In the sub-block unit prediction mode using motion shifting, prediction information of a target block may be acquired in units of sub-blocks.
A sub-block unit prediction mode using motion shifting may be applied to at least one of inter prediction, IBC mode, and intra prediction. For example, for inter prediction, IBC mode, and intra prediction, the prediction information acquired in sub-block units may be motion information, a block vector, and an intra prediction mode, respectively.
In the sub-block unit prediction mode using motion shifting, the target block is partitioned into sub-blocks of an N×M size, and prediction information may be determined in units of sub-blocks. Then, a prediction block of the target block may be acquired by predicting each sub-block using the prediction information corresponding to the sub-block.
N and M may each be 2, 4, 8, 16, 32, 64, or another positive integer. N and M may be predefined values.
Alternatively, information about at least one of N and M may be signaled/encoded/decoded, and at least one of N and M may be determined based on the information.
Alternatively, at least one of N and M may be determined based on at least one of a size of the target block, motion information of the target block, intra coding method information of the target block, and coding parameters of the target block.
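The N×M partitioning above can be sketched as follows; N = M = 8 is an arbitrary default from the allowed values, and each sub-block is identified by its top-left coordinate:

```python
def partition_into_subblocks(block_w, block_h, n=8, m=8):
    """Return the top-left (x, y) coordinate of each NxM sub-block of a
    block_w x block_h target block, in raster-scan order."""
    return [(x, y) for y in range(0, block_h, m) for x in range(0, block_w, n)]
```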
The term "intra coding method" used in the present disclosure may include conventional intra prediction modes (directional mode and non-directional mode), matrix-based intra prediction modes, IBC modes, and intra template matching modes. Further, the term "intra coding method information" may include information (indicator) indicating whether to perform each intra coding method and/or information required to perform each intra coding method. For example, in the case of IBC mode, the information may be a block vector, a block vector candidate list configured to derive a block vector, etc.
Motion shifts for prediction in sub-block units may be derived from blocks encoded or decoded prior to the target block. For example, the motion shift may be derived from at least one of a neighboring block of the target block, a non-neighboring block of the target block, a motion shift candidate list derived from the neighboring or non-neighboring block, or a motion information table storing motion information of inter prediction blocks that have been encoded or decoded.
The motion shift may be a motion vector. Alternatively, the motion shift may correspond to information including a motion vector and a reference picture index. For example, the motion-shifted reference picture index may be set as the reference picture index of a block that has been encoded or decoded.
In some embodiments, the motion shift may be derived based on motion information of one or more neighboring or non-neighboring blocks at predefined particular locations.
In some other embodiments, information (e.g., a candidate index) for specifying a motion shift from a motion shift candidate list or motion information table may be signaled/encoded/decoded. This information may be binarized by a Rice code having a specific parameter (e.g., 1), like a base motion information index.
In some other embodiments, the motion shift may be determined based on at least one of motion information of the target block (e.g., a motion shift candidate list of the target block or an index of candidates in the motion shift candidate list), template matching costs for each candidate in the motion shift candidate list, a location indicated by the motion shift (or a sample or block at the location), a neighboring block (or sample) of the location indicated by the motion shift (or a sample or block at the location), intra-coding method information of the target block, coding parameters of the target block, intra-coding method information of the neighboring block, coding parameters of the neighboring block, neighboring samples of the target block, and at least one neighboring sample of a reference block of the target block. If the target block is a chrominance component block, luminance component samples of the target block (at least one sample in a luminance component block or region corresponding to the target block) may be additionally considered.
For example, motion information corresponding to at least one candidate in the motion shift candidate list having a predefined specific index may be determined as the motion shift. The specific index may refer to an index whose value is odd or even, or an index less than or equal to a predetermined threshold value.
In another example, the motion shift may be derived by performing template matching on the target block. In other words, the motion shift may be set by searching for a reference block having a reference template most similar to the target template of the target block (i.e., providing the lowest matching cost) and using the displacement between the target block and the searched reference block. By template matching of the target block, one or more motion shifts may be determined in order of lowest matching cost.
In another example, the motion shift may be selected from the candidate list based on a template matching cost for each candidate in the motion shift candidate list derived from neighboring blocks to the target block, non-neighboring blocks to the target block, etc. In other words, a matching cost between the reference template at the position indicated by the motion information of each candidate and the target template of the target block may be calculated, and the motion shift may be set based on at least one candidate providing the lowest matching cost. For example, one or more motion shifts may be determined in the order of lowest matching cost.
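The cost-based selection of motion shifts can be sketched as follows; `ref_template_at` is a hypothetical accessor returning the reference template at the position a candidate points to, and SAD is used as the matching cost:

```python
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def select_motion_shifts(candidates, target_template, ref_template_at, k=1):
    """Rank motion-shift candidates by the matching cost between the target
    template and the reference template indicated by each candidate, and keep
    the k candidates with the lowest cost (k = 1 selects a single shift)."""
    ranked = sorted(candidates,
                    key=lambda mv: sad(target_template, ref_template_at(mv)))
    return ranked[:k]
```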
In the embodiments of the present disclosure described above and/or the embodiments described below, a col picture to be described later may be used to calculate a matching cost of a motion shift candidate or a motion information candidate. For example, the reference block and the reference template may be included in the col picture.
When determining the at least one motion shift, prediction information of each sub-block within the target block may be determined based on prediction information of the reference block indicated by the motion shift. The size of the reference block indicated by the motion shift for the target block may be the same as the size of the target block.
The prediction information (motion information, block vector, or intra prediction mode) of a specific sub-block within the target block may be determined based on the prediction information of the reference sub-block at the position corresponding to the specific sub-block in the reference block indicated by the motion shift. The prediction information of an invalid reference sub-block may be replaced with the prediction information of the centrally located reference sub-block. Here, an invalid reference sub-block corresponds to a case where the required prediction information does not exist. For example, when inter prediction is applied to the target block, a reference sub-block having no motion information is considered invalid.
For example, when inter prediction is applied to a target block, motion information of a specific sub-block within the target block may be derived from motion information of a corresponding reference sub-block within a reference block at a position indicated by a motion shift within a reference picture (or col picture) other than the target picture.
For example, the motion vector of a particular sub-block within the target block may be derived based on the motion vector of the corresponding reference sub-block.
For example, the reference pictures of a particular sub-block within a target block may always be fixed, regardless of the reference pictures of the corresponding reference sub-block. For example, a first reference picture (reference picture with reference index 0) within the reference picture list may be set as the reference picture.
The col picture may refer to a picture containing the reference block that is determined based on the motion shift in order to determine the motion information of the sub-blocks within the target block. In other words, the motion information of a sub-block within the target block may be determined by using, as the reference block, the block in the col picture at the position indicated by the motion shift (or the motion vector of the motion shift) from the col block of the target block.
The col picture may be a picture existing within the reference picture list. For example, the col picture may be selected from at least one of an L0 reference picture list and an L1 reference picture list.
At least one or more col pictures may be selected.
For example, the col picture may be determined as at least one reference picture having the smallest reference picture index among the reference pictures in the reference picture list.
Alternatively, the col picture may be set to at least one reference picture having the smallest POC difference from the target picture among the reference pictures in the reference picture list.
Alternatively, col picture may be set as the picture indicated by the motion-shifted reference picture index.
Alternatively, information indicating col pictures may be signaled/encoded/decoded.
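The POC-based col picture selection can be sketched as follows; resolving ties toward the lower reference index is an assumption (Python's `min` returns the first minimum):

```python
def select_col_picture_index(target_poc, ref_list_pocs):
    """Return the index, within the reference picture list, of the reference
    picture whose POC difference from the target picture is smallest."""
    return min(range(len(ref_list_pocs)),
               key=lambda i: abs(ref_list_pocs[i] - target_poc))
```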
In addition, if a sub-block within the reference block of the col picture is not predicted using inter prediction, the motion information of that sub-block may be replaced with the motion information of the sub-block at the center of the reference block.
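The center fallback for sub-blocks lacking motion information can be sketched as follows; the sub-block motion field is flattened into a list, `None` marks a sub-block without motion information (e.g., intra coded), and the center-index computation assumes the central sub-block itself carries valid motion information:

```python
def fill_subblock_motion(ref_subblock_mvs):
    """Copy each reference sub-block's motion info to the corresponding target
    sub-block, replacing missing entries with the centrally located one."""
    center_mv = ref_subblock_mvs[len(ref_subblock_mvs) // 2]
    return [mv if mv is not None else center_mv for mv in ref_subblock_mvs]
```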
In some embodiments, after determining the motion information of the sub-block in the target block from the reference block in the col picture, the motion information of the sub-block may be further improved.
For example, a specific offset vector may be added to the motion vector of the sub-block. Information about the offset vector may be signaled/encoded/decoded. The information about the offset vector may be an index for specifying one from a list including a plurality of predefined offset vector candidates. In other words, an offset vector is selected from among a plurality of predefined offset vector candidates.
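The offset-vector refinement can be sketched as follows; the candidate list below is illustrative only, since the actual predefined offset vectors are not specified in the text:

```python
# Illustrative predefined offset-vector candidates (an assumption).
OFFSET_CANDIDATES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def refine_subblock_mv(mv, offset_index, candidates=OFFSET_CANDIDATES):
    """Add the offset vector selected by the signaled index to the sub-block
    motion vector."""
    ox, oy = candidates[offset_index]
    return (mv[0] + ox, mv[1] + oy)
```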
Further, information related to a combination of prediction information of sub-blocks in the target block derived by the motion shift may be included in a candidate list related to the prediction information.
For example, in the case of inter prediction, a candidate related to a combination of motion information of sub-blocks within a target block derived by motion shift (hereinafter, this candidate is referred to as a "sub-block-based motion information candidate") may be included as a candidate of a motion information candidate list of the target block. At this time, one or more sub-block-based motion information candidates may be included as candidates of the motion information candidate list.
The sub-block based motion information may be a motion shift used to derive a motion information combination of the sub-blocks or a motion information combination of the sub-blocks derived from the motion shift.
When the candidate from the motion information candidate list is sub-block-based motion information, the target block may be predicted in units of sub-blocks using a motion shift corresponding to the candidate. Alternatively, the target block may be predicted in units of sub-blocks using a combination of motion information of sub-blocks corresponding to candidates.
Here, the motion information candidate list may include at least one of an affine merge candidate list, an affine AMVP list, an affine transformation candidate list, a sub-block-based motion information candidate list, a merge candidate list, or an AMVP list.
If the motion information candidate list of the target block includes sub-block-based motion information candidates, each candidate of the motion information candidate list may include information on whether the candidate corresponds to sub-block-based motion information. Alternatively, based on signaled/encoded/decoded information and/or implicit rules, it may be determined whether a candidate selected from the motion information candidate list is used for sub-block based prediction using motion shifting.
Information about sub-block unit-based prediction using motion shifting in a target block may be signaled/encoded/decoded.
For example, an indicator indicating whether to perform sub-block unit prediction using motion shifting in a target block may be signaled/encoded/decoded.
Alternatively, whether to perform sub-block unit prediction using motion shift in the target block may be determined based on at least one of motion information of the target block (e.g., a motion information candidate list, a merge index in a merge mode, or an MV candidate index in an AMVP mode), intra coding method information of the target block (e.g., an indicator indicating whether an IBC mode or an intra template matching mode is applied to the target block), coding parameters of the target block, intra coding method information of neighboring blocks, coding parameters of neighboring blocks, neighboring samples of the target block, and at least one neighboring sample of a reference block of the target block, without involving signaling/encoding/decoding of information. If the target block is a chrominance component block, luminance component samples of the target block (at least one sample within a luminance component block or region corresponding to the target block) may be additionally considered.
Alternatively, whether to signal/encode/decode information regarding whether to perform sub-block unit prediction using motion shift in the target block may be determined based on at least one of the encoding mode of the target block, the motion information of the target block, the intra-coding method information of the target block, the coding parameters of the target block, the intra-coding method information of the neighboring block, the coding parameters of the neighboring block, the neighboring sample point of the target block, and the at least one neighboring sample point of the reference block of the target block, without involving the signaling/encoding/decoding of the information. If the target block is a chrominance component block, luminance component samples of the target block (at least one sample within a luminance component block or region corresponding to the target block) may be additionally considered.
For example, if the size of the target block is greater than or equal to a threshold, sub-block unit prediction using motion shifting may be performed for the target block, and information regarding sub-block unit prediction using motion shifting may be signaled/encoded/decoded. The threshold may be 4, 8, 16, 32, 64, 128, or another positive integer.
5.2 Affine transformation modes
Affine transformation modes can be applied to IBC modes as well as inter prediction.
Hereinafter, for convenience of description, an affine transformation mode for inter prediction is described, but the same description may be applied to an IBC mode. When the affine transformation mode is applied to the IBC mode, the terms "motion information" and "motion vector" used below may be replaced with "block vector information" and "block vector".
As described above, the affine transformation mode is a mode of predicting a sample point within a target block based on a transformation model using a plurality of Control Point Motion Vectors (CPMV). The plurality of CPMV may be motion vectors corresponding to respective corners of the target block.
The affine transformation may be divided into an affine merge mode and an affine AMVP mode according to the derivation method of CPMV.
In the affine merge mode, the CPMVs of the target block are inherited from a motion information candidate belonging to the motion information candidate list. In other words, a plurality of motion vectors of the candidate are set as the CPMVs of the target block.
In the affine AMVP mode, the image encoding apparatus generates a motion vector difference between each CPMV of the target block and a corresponding CPMV of a motion information candidate selected from the motion information candidate list, and encodes the motion vector difference. The image decoding apparatus derives each CPMV of the target block using a motion vector difference decoded from the bitstream and a motion information candidate selected from the motion information candidate list.
When a plurality of CPMVs for a target block are determined, an image encoding apparatus or an image decoding apparatus derives motion information in units of sub-blocks or in units of samples within the target block using the CPMVs and predicts samples within the target block in units of sub-blocks or in units of samples using the derived motion information.
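The derivation of a motion vector at a given position from the CPMVs can be sketched with the standard 4-parameter and 6-parameter affine equations; sub-pixel precision and rounding are ignored here, so this is an illustration rather than the exact apparatus behavior:

```python
def affine_mv(x, y, w, h, cpmv1, cpmv2, cpmv3=None):
    """Motion vector at position (x, y) of a w x h block. Two CPMVs (top-left,
    top-right) give the 4-parameter model; a third CPMV (bottom-left) gives
    the 6-parameter model."""
    v0x, v0y = cpmv1
    v1x, v1y = cpmv2
    if cpmv3 is None:  # 4-parameter: rotation/zoom implied by v0 and v1
        mvx = v0x + (v1x - v0x) * x / w - (v1y - v0y) * y / w
        mvy = v0y + (v1y - v0y) * x / w + (v1x - v0x) * y / w
    else:  # 6-parameter: independent horizontal and vertical gradients
        v2x, v2y = cpmv3
        mvx = v0x + (v1x - v0x) * x / w + (v2x - v0x) * y / h
        mvy = v0y + (v1y - v0y) * x / w + (v2y - v0y) * y / h
    return (mvx, mvy)
```

Evaluating `affine_mv` at the center of each sub-block yields the per-sub-block motion field used for prediction in sub-block units.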
Hereinafter, a process of (1) configuring a motion information candidate list and/or deriving CPMV and (2) predicting a target block using an affine transformation model will be described in detail.
(1) Configuration of motion information candidate list and/or derivation of CPMV
The motion information candidate list may mean at least one of an affine transformation AMVP list and an affine transformation merge candidate list.
The maximum size of the motion information candidate list may be num_max_cand.
Num_max_cand may be 1, 2, 3, 4, 6, 8, 12, 24, or another positive integer.
Num_max_cand may be determined based on motion information or encoding parameters of the target block.
The motion information candidate list of the target block may include up to num_max_cand motion information candidates. Hereinafter, the term "motion information candidates" used in the affine transformation mode may be replaced with the term "affine transformation candidates".
Each affine transformation candidate has a combination of CPMV. For example, the combination of CPMV corresponding to each candidate may be expressed as { CPMV1, CPMV2} or { CPMV1, CPMV2, CPMV3}.
In the above description, CPMV1, CPMV2, and CPMV3 may refer to the CPMV corresponding to the upper left corner of the target block (v0 in FIG. 26), the CPMV corresponding to the upper right corner (v1 in FIG. 26), and the CPMV corresponding to the lower left corner (v2 in FIG. 26), respectively.
The affine transformation candidates may include at least one of inter prediction direction, CPMV combination, or reference picture information (e.g., reference picture index).
For example, the motion information candidate may include at least one of a CPMV combination in the L0 direction or a CPMV combination in the L1 direction.
For example, the motion information candidate may include at least one of reference picture information in the L0 direction (e.g., reference picture index) or reference picture information in the L1 direction (e.g., reference picture index).
The motion information candidate list for the target block may be configured using motion information of at least one of neighboring blocks, non-neighboring blocks, or motion information tables of the target block according to a predefined rule (or order). Alternatively, as described below, the temporal neighboring blocks of the target block may be used to derive a motion information candidate list.
The type of each motion information candidate constituting the motion information candidate list may correspond to at least one of motion information candidate types 1 to 4 described below.
The motion information candidates used in the affine transformation mode may be configured using the motion information candidate type 2 to the motion information candidate type 4.
■ Motion information candidate type 1: sub-block based motion information candidates for use in sub-block unit prediction using motion shifting
As described above, sub-block based motion information candidates related to sub-block unit prediction using motion shifting may be included in the motion information candidate list.
The motion shift for sub-block unit prediction may be a motion vector of a block encoded/decoded before the target block or a motion vector determined based thereon.
For example, when the motion shift candidate of the target block is configured based on the reconstructed block and/or the specific motion information (or block) in the motion information table, the specific motion information may be bi-directional motion information. In this case, the motion shift candidate for the target block may be configured using one of the following methods.
Method 1: configure a motion shift candidate using only the L0-direction motion information of the corresponding motion information.
Method 2: configure a motion shift candidate using only the L1-direction motion information of the corresponding motion information.
Method 3: configure a first motion shift candidate using only the L0-direction motion information of the corresponding motion information and a second motion shift candidate using only its L1-direction motion information.
Method 4: configure a motion shift candidate using the motion information of the direction having the lower template matching cost between the L0-direction motion information and the L1-direction motion information of the corresponding motion information.
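The four methods can be sketched as follows; `ref_template_at` is a hypothetical accessor for the reference template indicated by each direction's motion information, and SAD serves as the template matching cost for Method 4:

```python
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def shift_candidates_from_bi(mi_l0, mi_l1, method,
                             target_template=None, ref_template_at=None):
    """Derive motion-shift candidate(s) from bi-directional motion information
    (mi_l0, mi_l1) according to Method 1..4 described in the text."""
    if method == 1:
        return [mi_l0]
    if method == 2:
        return [mi_l1]
    if method == 3:
        return [mi_l0, mi_l1]
    # Method 4: keep the direction with the lower template matching cost.
    cost_l0 = sad(target_template, ref_template_at(mi_l0))
    cost_l1 = sad(target_template, ref_template_at(mi_l1))
    return [mi_l0 if cost_l0 <= cost_l1 else mi_l1]
```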
At least one sub-block based motion information candidate generated by using motion-shifted sub-block unit prediction may be generated and added as a candidate of a motion information candidate list.
For example, a reference block using at least one motion shift may be obtained from each of a first col picture and a second col picture selected based on a POC difference from a target picture. For example, at least one motion shift may be selected based on the template matching cost for each candidate within the motion shift candidate list as described above.
Here, it is assumed that the first col picture has a smaller POC difference from the target picture than the second col picture.
A sub-block motion information combination for the target block may be generated from each of a plurality of reference blocks indicated by at least one motion shift in each of the first col picture and the second col picture. Then, sub-block based motion information candidates related to the plurality of sub-block motion information combinations may be added to the motion information candidate list.
For example, candidates related to a sub-block motion information combination derived from the first col picture by a motion shift that provides the lowest matching cost may be added as a first candidate to the motion information candidate list.
Candidates related to the sub-block motion information combination derived from the second col picture by the motion shift providing the lowest matching cost may be added as second candidates to the motion information candidate list.
The maximum number of sub-block based motion information candidates that may be added to the motion information candidate list may be limited. For example, the maximum number may be 2, 3, or 4. Except for the first-added sub-block based motion information candidate, the remaining sub-block based motion information candidates may be reordered, together with the other motion information candidates in the motion information candidate list, based on the template matching cost.
■ Motion information candidate type 2: inherited affine transformation candidate
The CPMV of the target block and/or the motion information candidates of the target block may be derived based on the affine transformation models of blocks encoded/decoded before the target block.
For example, the CPMVs of a block encoded/decoded in an affine transformation mode before the target block may be used as an inherited affine transformation candidate for the target block.
The inherited affine transformation candidates may always be candidates for the 6-parameter affine transformation model. In other words, the inherited affine transformation candidates may always have three CPMVs. In this case, an inherited affine transformation candidate can be derived from a previously encoded or decoded block to which a 6-parameter affine transformation model was applied.
■ Motion information candidate type 3: constructed affine transformation candidate
At least one constructed affine transformation candidate may be derived for the target block based on at least two blocks (candidate blocks) encoded/decoded before the target block.
The candidate block may include at least one of a neighboring block adjacent to the target block and a non-neighboring block not adjacent to the target block.
The constructed affine transformation candidates may be constructed by combining the motion information of at least two candidate blocks. In other words, the CPMV combination of a constructed affine transformation candidate of the target block can be constructed using the motion vectors (translational motion vectors) of at least two candidate blocks to which the affine transformation mode is not applied.
For example, when the motion information of the first candidate block and the motion information of the second candidate block are referred to as first motion information and second motion information, respectively, the constructed affine transformation candidates derived from the first candidate block and the second candidate block may correspond to { first motion information, second motion information } and/or { second motion information, first motion information }.
The CPMV combination of constructed affine transformation candidates may be constructed by combining motion vectors of at least two blocks.
The constructed affine transformation candidates may include parallel translation candidates.
Parallel translation candidates may refer to affine transformation candidates in which all CPMVs are identical.
For example, parallel translation candidates may always be candidates for a 6-parameter affine transformation model. In other words, a parallel translation candidate may always have three CPMVs.
The constructed affine transformation candidates may be derived using motion information of at least two candidate blocks determined from at least one of neighboring blocks, non-neighboring blocks or a motion information table according to a predefined rule (or order).
The constructed affine transformation candidates may be derived based on a 4-parameter affine transformation model, using the motion information of the first candidate block as the CPMV at the upper left corner and the motion information of the second candidate block as the CPMV at the upper right corner.
In some embodiments, the constructed affine transformation candidates of the 4-parameter affine transformation model may be derived as follows.
(1) First, it is checked whether the condition that L0 direction motion information exists in both the first candidate block and the second candidate block and that the L0 direction reference picture (or reference picture index) is the same in all the candidate blocks is satisfied.
(2) When the above condition is satisfied, the CPMV combination in the L0 direction of the constructed affine transformation candidate is set to {motion information of the first candidate block, motion information of the second candidate block}. Then, the reference picture (or reference picture index) in the L0 direction of the constructed affine transformation candidate may be determined to be the same as the reference picture (or reference picture index) in the L0 direction of the first candidate block and/or the second candidate block.
(3) If the above condition is not satisfied, it is considered that the constructed affine transformation candidate does not have motion information in the L0 direction.
(4) The processes of (1) to (3) described above are performed in the same manner on the first candidate block and the second candidate block in the L1 direction.
Through the above processing, if the constructed affine transformation candidate lacks both the motion information in the L0 direction and the motion information in the L1 direction, the constructed affine transformation candidate may not be derived from the first candidate block and the second candidate block.
If the constructed affine transformation candidate has motion information only in one of the L0 direction and the L1 direction, namely the LX direction (X=0 or 1), the inter-prediction direction of the constructed affine transformation candidate is determined as unidirectional in the LX direction.
If the constructed affine transformation candidates have motion information in both the L0 and L1 directions, the inter prediction direction of the constructed affine transformation candidates is determined to be bi-directional.
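Steps (1) to (4) above can be sketched in Python as follows. This is a hedged illustration; the dictionary-based motion information structure and all function names are assumptions introduced for exposition, not taken from any standard text:

```python
# Sketch of the first embodiment's derivation: for each direction LX, both
# candidate blocks must have LX motion information with the same reference
# picture; otherwise the candidate has no LX motion information.

def derive_lx_cpmvs(cand1, cand2, lx):
    """cand1/cand2: dicts like {'L0': (mv, ref_idx)} (key absent when no
    motion information exists in that direction).
    Returns ((mv1, mv2), ref_idx) or None."""
    m1, m2 = cand1.get(lx), cand2.get(lx)
    if m1 is None or m2 is None:      # step (1): both must have LX info
        return None
    if m1[1] != m2[1]:                # step (1): same LX reference picture
        return None                   # step (3): no LX motion information
    return ((m1[0], m2[0]), m1[1])    # step (2): {mv1, mv2}, shared ref

def derive_candidate(cand1, cand2):
    l0 = derive_lx_cpmvs(cand1, cand2, 'L0')
    l1 = derive_lx_cpmvs(cand1, cand2, 'L1')   # step (4): repeat for L1
    if l0 is None and l1 is None:
        return None                   # candidate is not derived at all
    direction = 'BI' if (l0 and l1) else ('L0' if l0 else 'L1')
    return {'L0': l0, 'L1': l1, 'dir': direction}
```

For instance, two candidate blocks sharing only an L0 reference picture would yield a unidirectional L0 candidate, while matching L0 and L1 references would yield a bi-directional candidate.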
In some other embodiments, the constructed affine transformation candidates of the 4-parameter affine transformation model may be derived as follows.
(1) It is checked whether a first condition is fulfilled, the first condition indicating that motion information in the L0 direction is present for both the first candidate block and the second candidate block.
(2) If the first condition is satisfied, it is checked whether a second condition is satisfied, the second condition indicating that L0 direction reference pictures (or reference picture indexes) of all candidate blocks are identical.
If the second condition is satisfied, the CPMV combination in the L0 direction of the constructed affine transformation candidate is set to { the motion information of the first candidate block, the motion information of the second candidate block }. And, the reference picture (or reference picture index) in the L0 direction of the constructed affine transformation candidate is set to be the same as the reference picture (or reference picture index) in the L0 direction of the first candidate block and/or the reference picture (or reference picture index) in the L0 direction of the second candidate block.
If the second condition is not satisfied, the CPMV combination in the L0 direction of the constructed affine transformation candidate may be set to {motion information of the first candidate block, scaled motion information of the second candidate block} or {scaled motion information of the first candidate block, motion information of the second candidate block}. Also, a specific picture (or a specific value) may be determined as the reference picture (or reference picture index) in the L0 direction of the constructed affine transformation candidate.
For example, the specific picture (or specific value) may be one of a reference picture (or reference picture index) in the L0 direction of the first candidate block or a reference picture (or reference picture index) in the L0 direction of the second candidate block. Alternatively, for example, the specific picture (or specific value) may be the first reference picture (or index 0) of the L0 direction reference picture list.
The scaled motion information of the first/second candidate block may refer to the motion information of the first/second candidate block scaled to the L0-direction reference picture of the constructed affine transformation candidate. The scaling value may be determined based on the temporal distance (POC difference) between the target picture and the L0 reference picture of the first candidate block and the temporal distance (POC difference) between the target picture and the L0 reference picture of the second candidate block.
(3) If the first condition is not satisfied, the constructed affine transformation candidate is considered to have no motion information in the L0 direction.
(4) The processes of (1) to (3) described above are performed in the same manner on the first candidate block and the second candidate block in the L1 direction.
Through the above processing, if the constructed affine transformation candidate lacks both the motion information in the L0 direction and the motion information in the L1 direction, the constructed affine transformation candidate may not be derived from the first candidate block and the second candidate block.
If the constructed affine transformation candidate has motion information only in one of the L0 direction and the L1 direction, namely the LX direction (X=0 or 1), the inter-prediction direction of the constructed affine transformation candidate is determined as unidirectional in the LX direction.
If the constructed affine transformation candidates have motion information in both the L0 and L1 directions, the inter prediction direction of the constructed affine transformation candidates is determined to be bi-directional.
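The variant above, in which a mismatched reference picture triggers scaling rather than discarding the direction, can be sketched as follows. This is a hedged illustration; the motion information layout, the choice of the first candidate's reference as the target reference, and the injected `scale_mv` helper are all assumptions:

```python
# Sketch of the second embodiment for one direction LX: if both candidate
# blocks have LX motion information but different LX reference pictures, one
# candidate's motion vector is scaled to a chosen LX reference picture.

def derive_lx_with_scaling(m1, m2, poc_cur, scale_mv):
    """m1/m2: (mv, poc_ref) or None; scale_mv(mv, poc_cur, poc_old, poc_new)
    scales mv by the ratio of POC distances.
    Returns ((mv1, mv2), poc_ref) or None."""
    if m1 is None or m2 is None:       # first condition not met
        return None
    if m1[1] == m2[1]:                 # second condition met: shared ref
        return ((m1[0], m2[0]), m1[1])
    # Second condition not met: scale the second candidate's MV to the
    # first candidate's reference picture (one of the allowed choices).
    ref = m1[1]
    return ((m1[0], scale_mv(m2[0], poc_cur, m2[1], ref)), ref)
```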
The above embodiments relate to methods for deriving constructed affine transformation candidates with a 4-parameter affine transformation model. However, constructed affine transformation candidates for the 6-parameter affine transformation model can also be derived by the same process. In other words, the constructed affine transformation candidates of the 6-parameter affine transformation model can be derived through the above processing using the first to third candidate blocks.
In another embodiment, motion information of at least one CPMV may be derived based on motion information of at least two candidate blocks.
For example, when the motion information of the first candidate block is used as the CPMV at the upper left corner of the target block and the motion information of the second candidate block is used as the CPMV at the lower left corner, the CPMV at the upper right corner of the target block may be derived from the upper-left and lower-left CPMVs using a 4-parameter affine transformation model.
In another embodiment, template matching may be used to derive constructed affine transformation candidates.
The L0-direction reference picture (or reference picture index) and/or the L1-direction reference picture (or reference picture index) of the constructed affine transformation candidates derived based on the template matching in the target block may be determined according to a predefined rule.
For example, an L0 direction reference picture (or L1 direction reference picture) may always be the first reference picture of an L0 direction reference picture list (or L1 direction reference picture list). In other words, the L0 direction reference picture index (or L1 direction reference picture index) may always be 0.
In another example, the L0-direction reference picture (or reference picture index) and/or the L1-direction reference picture (or reference picture index) of the constructed affine transformation candidate may be the most frequently used reference picture (or reference picture index) in the neighboring block of the target block.
The constructed affine transformation candidates derived based on template matching may use two CPMVs or three CPMVs.
For example, three CPMVs may always be used for constructed affine transformation candidates derived based on template matching.
In another example, at least one constructed affine transformation candidate using three CPMVs may be derived based on template matching, and at least one constructed affine transformation candidate using two CPMVs may also be derived based on template matching.
In another example, constructed affine transformation candidates using three CPMVs may be derived based on template matching and inserted into the motion information candidate list, and at least one constructed affine transformation candidate using two CPMVs may be derived based on template matching only when the motion information candidate list of the target block is not full.
To generate the constructed affine transformation candidates based on template matching, at least one temporary list may be constructed using at least one piece of motion information derived based on neighboring blocks, non-neighboring blocks, a motion information table, or a default motion information list of the target block.
Here, the motion information table may mean at least one of a general motion information table or a motion information table for affine transformation.
The general motion information table is a table managed similarly to the motion information table for affine transformation, but may mean a motion information table storing blocks (or motion information of corresponding blocks) encoded or decoded by general inter prediction instead of affine transformation.
The default motion information list may mean a motion information list already defined in the encoder/decoder. For example, the default motion information list may include a zero vector.
Up to three temporary lists may be configured. Alternatively, a single temporary list (hereinafter, an "integrated temporary list") may be configured.
Hereinafter, each temporary list configured for the target block is referred to as an nth temporary list (n=1, 2, 3).
When a plurality of temporary lists are configured in a target block, each temporary list in the target block may be associated with each CPMV of the target block. For example, the first temporary list, the second temporary list, and the third temporary list may be associated with an upper left CPMV, an upper right CPMV, and a lower left CPMV, respectively.
When a plurality of temporary lists are configured, the number of temporary lists may be determined based on the number of CPMV of affine transformation candidates to be derived based on template matching.
For example, when the number of CPMV of affine transformation candidates to be derived based on template matching is 2, the number of temporary lists may be 2, and when the number of CPMV of affine transformation candidates to be derived based on template matching is 3, the number of temporary lists may be 3.
The maximum size of the integrated temporary list or of the first to third temporary lists may be MAX_NUM_CAND_TEMP.
MAX_NUM_CAND_TEMP may be 1, 2, 3, 4, 5, 6, 8, 10, 12 or another positive integer.
In configuring at least one temporary list of the target block, if L0-direction motion information exists in a piece of motion information derived with reference to a neighboring block, non-neighboring block, motion information table, or default motion information list, the L0-direction reference picture (or reference picture index) of that L0-direction motion information may be changed to a predefined picture (or predefined value) before the motion information is added to the corresponding temporary list. Similarly, if L1-direction motion information exists in a piece of motion information derived with reference to a neighboring block, non-neighboring block, motion information table, or default motion information list, the L1-direction reference picture (or reference picture index) of that L1-direction motion information may be changed to a predefined picture (or predefined value) before the motion information is added to the corresponding temporary list.
Alternatively, after the temporary list is constructed, if L0-direction motion information exists for each of the components (candidates) of the temporary list, the L0-direction reference picture (or reference picture index) of the corresponding component may be set to a predefined picture (or predefined value). Similarly, if L1-direction motion information exists for each of the components of the temporary list, the L1-direction reference picture (or reference picture index) of the corresponding component may be set to a predefined picture (or predefined value).
For example, the L0-direction reference pictures (or reference picture indexes) of all components of the temporary list may be set to a predefined picture (or predefined value).
When an L0 direction (or L1 direction) reference picture (or reference picture index) of the referenced motion information changes, an L0 direction (or L1 direction) motion vector of the corresponding motion information may be scaled to match the changed reference picture. For example, scaling may be performed based on the temporal distance (or POC difference) between the target picture and the original LX direction reference picture and the temporal distance between the target picture and the changed LX direction reference picture.
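The scaling described above can be sketched as follows, assuming the linear POC-distance model the paragraph describes; the function name, tuple representation, and rounding behavior are assumptions for illustration:

```python
# Illustrative motion vector scaling when the reference picture of referenced
# motion information is changed: scale by the ratio of the POC distance to
# the new reference over the POC distance to the original reference.

def scale_mv(mv, poc_cur, poc_ref_orig, poc_ref_new):
    """mv: (mvx, mvy); poc_cur: POC of the target picture;
    poc_ref_orig/poc_ref_new: POCs of the original/changed LX reference."""
    td_orig = poc_cur - poc_ref_orig   # temporal distance to original ref
    td_new = poc_cur - poc_ref_new     # temporal distance to changed ref
    if td_orig == td_new or td_orig == 0:
        return mv                      # no scaling needed (or undefined)
    s = td_new / td_orig
    return (round(mv[0] * s), round(mv[1] * s))

# e.g. target POC 8, original reference POC 4, changed reference POC 6:
print(scale_mv((8, -4), 8, 4, 6))  # → (4, -2)
```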
For at least one CPMV of the target block, a block of size A×B (hereinafter, a "CPMV block") containing the position (i.e., the control point) of the CPMV may be assumed.
A and B may be 2, 4, 8, 16 or a positive integer, respectively.
A and B may always be the same value. Alternatively, A and B may be different values, and the ratio of A to B may be determined based on the ratio of the width to the height of the target block.
If a plurality of temporary lists are configured for a target block, motion information refinement may be performed, in the CPMV block for each CPMV of the target block, on at least one component of the temporary list for the corresponding CPMV.
The refined motion information may be added as a new component of the temporary list of the CPMV, or may alternatively replace the component on which the refinement was performed.
If a plurality of temporary lists are configured for a target block, a matching cost for each component in the temporary list of each CPMV (i.e., the temporary list composed of candidates of that CPMV) may be calculated for the CPMV block corresponding to that CPMV of the target block. At least a portion of the components in the temporary list of each CPMV may be reordered in ascending order of matching cost.
In one example, for the CPMV block at the lower left corner of the target block, the matching cost for each component in the temporary list of CPMV candidates at the lower left corner may be calculated using the template matching method described above.
The matching cost for each component in the temporary list may be calculated as the matching cost between a target template for the lower-left-corner CPMV block and a reference template obtained using that component of the temporary list.
As shown in fig. 27, a CPMV block of size A×B may be composed of samples including the location (i.e., control point) of the corresponding CPMV in the target block. In this case, the target template for each CPMV block may be composed as shown in fig. 10.
Alternatively, as shown in fig. 28, a CPMV block of size A×B may be configured to include the location of the corresponding CPMV (i.e., control point) and nearby reconstructed samples.
In this case, as shown in fig. 28 (a), the target template may be composed of reconstructed samples in the CPMV block. Alternatively, the target template may be composed of reconstructed samples around the CPMV block, as shown in fig. 28 (b). Alternatively, the target template may consist of a combination of reconstructed samples within the CPMV block and reconstructed samples surrounding the CPMV block.
When the integrated temporary list is configured for the target block, a matching cost of each component of the integrated temporary list may be calculated in the CPMV block for each CPMV of the target block. From the integrated temporary list, a new temporary list may be derived for each CPMV.
The new temporary list for each CPMV may be configured by including at least one component of the integrated temporary list (or motion information derived based thereon).
Since a new temporary list is also configured for each CPMV, the new temporary list may be referred to as a first temporary list to a third temporary list.
The new temporary list for each CPMV may consist of up to MAX_NUM_CAND_TEMP_NEW components with the lowest matching cost for the CPMV block of the corresponding CPMV among the components in the integrated temporary list.
MAX_NUM_CAND_TEMP_NEW may be 1, 2, 3, 4, 5, 6, 8, 10, 12 or another positive integer.
A matching cost may be calculated for each piece of motion information in the temporary list (or a portion of the temporary list). Next, for each CPMV, motion information having the lowest matching cost calculated at the CPMV location may be determined as the CPMV. After each CPMV is determined, the determined CPMV may be used to derive constructed affine transformation candidates.
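The per-CPMV selection described above can be sketched as follows; the list and cost-function representations are assumptions introduced for illustration:

```python
# Sketch: for each control point, pick the temporary-list entry whose
# template matching cost, measured at that CPMV block, is lowest.

def select_cpmvs(temp_lists, cost_fn):
    """temp_lists: one list of candidate motion vectors per control point;
    cost_fn(cp_index, mv) -> template matching cost of mv at that CPMV block.
    Returns the lowest-cost motion vector per control point."""
    return [min(cands, key=lambda mv: cost_fn(cp, mv))
            for cp, cands in enumerate(temp_lists)]
```

The selected motion vectors (one per control point) then form the CPMVs used to derive a constructed affine transformation candidate.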
At least one constructed affine transformation candidate may be derived for the target block based on the temporary lists (first to third temporary lists) configured for each CPMV.
For example, the constructed affine transformation candidates of the target block may be derived by combining the components with the lowest matching cost in the temporary list for each CPMV.
In another example, when combining the components of the temporary list for each CPMV, up to N constructed affine transformation candidates may be derived for the target block based on up to N combinations with the lowest matching cost.
N may be 1, 2, 3, 4, 5, 6 or a positive integer.
The matching cost of a CPMV combination may be determined based on the matching costs of the CPMVs constituting that combination.
For example, the matching cost of a CPMV combination may be the sum or average of the matching costs of the CPMVs constituting the combination. Alternatively, the matching cost of a CPMV combination may be the minimum or maximum of the matching costs of the CPMVs constituting the combination.
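Selecting up to N combinations by combined cost, under the sum/average/min/max options just listed, can be sketched as follows (all names and the exhaustive enumeration over combinations are illustrative assumptions):

```python
import itertools

# Sketch: enumerate CPMV combinations across the per-control-point temporary
# lists and keep up to n combinations with the lowest combined matching cost.

def best_combinations(temp_lists, cost_fn, n=2, mode="sum"):
    """temp_lists: candidate motion vectors per control point;
    cost_fn(cp_index, mv) -> matching cost; mode: sum/avg/min/max."""
    def comb_cost(comb):
        costs = [cost_fn(cp, mv) for cp, mv in enumerate(comb)]
        if mode == "sum":
            return sum(costs)
        if mode == "avg":
            return sum(costs) / len(costs)
        if mode == "min":
            return min(costs)
        return max(costs)  # mode == "max"
    combos = itertools.product(*temp_lists)
    return sorted(combos, key=comb_cost)[:n]
```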
■ Motion information candidate type 4: default affine transformation candidate
The default affine transformation candidates may mean motion information predefined in the encoder/decoder.
In some embodiments, default affine transformation candidates may be constructed from a predefined affine transformation model set (or CPMV set list) shared in advance by the encoder/decoder.
For example, the set of predefined affine transformation models may comprise an identity matrix.
In another example, the predefined CPMV set list may include CPMV sets in which all CPMV are zero vectors.
In another example, the predefined CPMV set list may include CPMV sets in which all CPMVs are motion information offsets used in sub-block unit prediction. For example, using a particular motion information offset (off_x, off_y) from sub-block unit prediction, the predefined CPMV set list may include {(off_x, off_y), (off_x, off_y)} and/or {(off_x, off_y), (off_x, off_y)}.
In some other embodiments, the default affine transformation candidate may be derived by adding or subtracting a predefined offset to or from the motion information candidate pre-inserted into the motion information candidate list.
Here, the previously inserted motion information candidate may be a motion information candidate having the smallest index or a motion information candidate having the largest index among the motion information candidates inserted into the motion information candidate list.
The predefined offset may be a motion information offset for prediction in sub-block units.
When the number of motion information candidates included in the motion information candidate list is smaller than a threshold value, a default affine transformation candidate may be inserted. The threshold may be the maximum number (NUM_MAX_CAND) of motion information candidates that may be included in the motion information candidate list, or a value smaller than that maximum by M. M may be an integer such as 1 or 2.
For example, when the number of motion information candidates is smaller than the threshold even if the motion information candidates are inserted into the motion information candidate list according to a predefined order, a default affine transformation candidate may be inserted into the motion information candidate list.
For example, the predefined order may place spatially neighboring blocks first, followed by spatially non-neighboring blocks, and then the motion information stored in the motion information table described below. Alternatively, the predefined order may place the spatially neighboring blocks first, followed by the spatially non-neighboring blocks, the temporally neighboring blocks described below, and then the motion information stored in the motion information table. Alternatively, the predefined order may place the spatially neighboring blocks first, followed by the spatially non-neighboring blocks, the motion information stored in the motion information table, and then the temporally neighboring blocks.
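The fill procedure above, inserting candidates in a predefined order and appending default candidates while the list is below the threshold, can be sketched as follows. The group ordering, the duplicate check, and all constant names are illustrative assumptions:

```python
# Sketch: fill the motion information candidate list in a predefined order
# (spatially adjacent -> spatially non-adjacent -> motion information table),
# then append default candidates while the list stays below a threshold.

NUM_MAX_CAND = 6
M = 1
THRESHOLD = NUM_MAX_CAND - M  # or NUM_MAX_CAND itself

def fill_candidate_list(spatial_adj, spatial_nonadj, table_cands,
                        default_cands):
    lst = []
    for group in (spatial_adj, spatial_nonadj, table_cands):
        for c in group:
            # Skip duplicates and respect the list's maximum size.
            if c not in lst and len(lst) < NUM_MAX_CAND:
                lst.append(c)
    # Default affine transformation candidates fill the remaining slots.
    while len(lst) < THRESHOLD and default_cands:
        lst.append(default_cands.pop(0))
    return lst

print(fill_candidate_list(["a", "b"], ["b", "c"], ["d"], ["z0", "z1"]))
# → ['a', 'b', 'c', 'd', 'z0']
```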
A motion information candidate list composed of the above-described various types of motion information candidates may be configured. A separate motion information candidate list may be generated for each of the affine AMVP mode and the affine merge mode. Alternatively, one motion information candidate list commonly used in the affine AMVP mode and the affine merge mode may be generated.
In order to configure motion information candidates to be included in the motion information candidate list of the target block, at least one block (candidate block) encoded or decoded before the target block may be used.
The candidate block may include at least one of a neighboring block immediately adjacent to the target block, a non-neighboring block not neighboring the target block, and a temporal neighboring block.
Motion information candidates to be included in the motion information candidate list may be derived based on the at least one candidate block.
For example, at least one candidate block may be included in the motion information candidate list as a motion information candidate. In this case, if a specific motion information candidate (i.e., candidate block) is selected from the motion information candidate list, motion information of the specific candidate block or motion information derived from the motion information may be used.
In another example, motion information of at least one candidate block, or motion information derived therefrom, may be included as motion information candidates in the motion information candidate list.
In the case of the affine transformation mode, a combination of CPMV derived based on at least one candidate block may be included in the motion information candidate list as a motion information candidate.
As shown in fig. 29 (a), the adjacent blocks may include at least one of blocks adjacent to upper, upper left, upper right, left side, and lower left portions of the target block.
As shown in fig. 30, the non-neighboring blocks may include blocks having a distance N from at least one of an upper boundary and a left boundary of the target block. N may be an integer such as 4, 8 or 16.
Here, a non-adjacent block may refer to a block whose horizontal or vertical distance from a specific position of the target block (or a specific position around the target block) is the unit distance or N times the unit distance.
For example, as shown in fig. 30, if the upper left position of the target block is (x, y), the motion information candidate may be derived from a spatially non-adjacent block whose upper left position is (x-(N×horizontal), y), (x, y-(N×vertical)) or (x-(N×horizontal), y-(N×vertical)).
Here, "horizontal" represents the unit distance in the horizontal direction, and "vertical" represents the unit distance in the vertical direction.
The unit distance may be determined according to the size of the target block. For example, the horizontal unit distance and the vertical unit distance may be set to be equal to the width and the height of the target block, respectively.
Alternatively, the horizontal unit distance and the vertical unit distance may always be set to predefined values.
For example, the horizontal unit distance and the vertical unit distance may be 1/16 pixel, 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 8 pixels, 16 pixels, or another positive value.
N may be a natural number greater than or equal to 1. If the spatially non-adjacent block is not available when a first value is applied to N, N may be increased by 1 to search for another spatially non-adjacent block.
Alternatively, N may be 1/4, 1/2, or a positive integer.
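The non-adjacent position derivation above, with the unit distances set equal to the target block's width and height, can be sketched as follows (function and parameter names are illustrative assumptions):

```python
# Sketch: enumerate spatially non-adjacent positions at N times the unit
# distance from the upper-left position (x, y) of the target block, with the
# horizontal/vertical unit distances equal to the block width/height.

def non_adjacent_positions(x, y, width, height, n):
    h, v = width, height               # unit distances from block size
    return [(x - n * h, y),            # left of the target block
            (x, y - n * v),            # above the target block
            (x - n * h, y - n * v)]    # above-left of the target block

print(non_adjacent_positions(64, 64, 16, 8, 1))
# → [(48, 64), (64, 56), (48, 56)]
```

The corresponding temporal positions (described later) mirror this with positive offsets from the col block's upper-left position.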
After deriving the motion information candidates from neighboring blocks, non-neighboring blocks may be used to derive the motion information candidates.
In an additional or alternative embodiment, the non-adjacent block may be a block indicated by at least one of a motion vector of the target block, a motion vector of a luminance component block corresponding to the target block when the target block is a chrominance component block, and a motion vector of at least one block located around the target block.
Here, the neighboring block may be, for example, the aforementioned neighboring block. The motion vector of the target block may be an initial motion vector of the target block. For example, the motion vector may be an initial motion vector determined or signaled by inter-template matching for the target block. Alternatively, the motion shift may be set to the initial motion vector in the above-described sub-block unit prediction.
Alternatively, in the foregoing embodiment, the matching cost may be calculated for at least two blocks among the plurality of blocks indicated by the motion vector of the target block, and the motion information candidate may be derived from at least one block having the lowest template matching cost.
Motion information candidates to be included in the motion information candidate list may be derived from the temporal neighboring blocks.
As shown in fig. 29 (b), the temporal neighboring block may be a block (col block) corresponding to the position of the target block within a reference picture (col picture) other than the target picture, or may be a neighboring block of the col block.
For example, the adjacent block of the col block may be a block adjacent to the lower right portion of the col block. However, the present disclosure is not limited to a specific example, and the neighboring blocks of the col block may be blocks neighboring to the right side, lower right, upper left, or lower left portion of the col block.
The maximum number of temporal neighboring blocks searched for deriving motion information candidates of the target block may be N. N may be 0, 1, 2, 3, 6, 12 or another positive integer. Further, up to M motion information candidates derived from the temporal neighboring blocks of the target block may be included in the motion information candidate list. M may be 0, 1, 2, 3, 6, 12 or another positive integer.
The number of temporally adjacent blocks may be adaptively determined according to the number of available spatially adjacent/non-adjacent blocks. For example, if the number of motion information candidates based on spatially neighboring/non-neighboring blocks is smaller than the maximum size of the motion information candidate list, the motion information candidates based on temporally neighboring blocks may be added to the motion information candidate list until the motion information candidate list is filled.
Further, not only the neighboring blocks of the col block, but also non-neighboring blocks may be used as the temporal neighboring blocks, and motion information candidates may be derived from the non-neighboring blocks of the col block.
In some embodiments, the motion information candidates may be derived based on a temporal block, wherein the temporal block is separated from at least one of a horizontal distance or a vertical distance of an upper left portion of the col block by a unit distance or N times the unit distance. When the upper left position of the col block is (x, y), the motion information candidate may be derived from a time block whose position is (x+ (n×vertical), y), (x, y+ (n×vertical), or (x+ (n×vertical), y+ (n×vertical)).
Here, "horizontal" represents the unit distance in the horizontal direction, and "vertical" represents the unit distance in the vertical direction. The unit distance may be determined according to the size of the target block. For example, the horizontal and vertical unit distances may be set equal to the width and the height of the target block, respectively.
N may be a natural number greater than or equal to 1. When the temporal block derived with the first value of N is not available, N may be increased by 1 and the search may be repeated.
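The shifted positions described above can be sketched as follows; the helper name is hypothetical, and the horizontal/vertical unit distances are assumed equal to the target block's width and height, as in the example above:

```python
def temporal_block_positions(x, y, width, height, n):
    """Candidate temporal-block positions shifted from the col block's
    upper-left position (x, y) by n times the unit distances."""
    horizontal = width   # horizontal unit distance
    vertical = height    # vertical unit distance
    return [
        (x + n * horizontal, y),                 # horizontal shift only
        (x, y + n * vertical),                   # vertical shift only
        (x + n * horizontal, y + n * vertical),  # diagonal shift
    ]
```

If the blocks at these positions for n = 1 are unavailable, the same positions may be probed again with n incremented.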
Fig. 31 is an exemplary diagram illustrating the locations of temporally adjacent blocks used to derive affine transformation candidates according to some embodiments.
As shown in fig. 31, adjacent or non-adjacent blocks at positions shifted by N times the horizontal unit distance and/or the vertical unit distance from the col block, which is located at the same position as the target block within the col picture, are scanned. The numerals shown in fig. 31 indicate the scanning order.
In an additional or alternative embodiment, the motion information candidates may be derived from a block located at a position shifted from a temporal neighboring block (e.g., the col block or a neighboring block of the col block); hereinafter, this block is referred to as a shifted temporal block. The shifted temporal block may be located at a position spaced apart from the position of the temporal block by a motion vector. Here, the motion vector may be a motion vector of a spatially neighboring block of the target block.
For example, the shifted temporal block may be determined based on a motion vector of a left or upper neighboring block of the target block. Alternatively, when the spatially neighboring blocks of the target block are searched in a predefined order, the shifted temporal block may be determined based on the motion vector of the first available block found. If no available block is found among the spatially neighboring blocks, the temporal block at its original position may be used without performing a shift.
Whether a temporal neighboring block is available may be determined based on at least one of a picture/slice type (e.g., whether the block is an I-picture/I-slice), a prediction mode of the target block, or an encoding parameter of the target block.
The prediction mode of the target block may include at least one of inter prediction, intra prediction, or IBC mode.
In the affine transformation mode, at least one inherited affine transformation candidate for the target block may be determined from at least one temporal neighboring block.
In other words, the CPMVs of the target block and/or the motion information candidates of the target block may be derived based on the affine transformation models of the temporal neighboring blocks. For example, the CPMV at each position (control point) of the target block may be derived from the CPMV at the corresponding position of a temporal neighboring block to which the affine transformation mode is applied.
The motion information candidates to be included in the motion information candidate list may be derived based on a motion information table storing motion information of blocks that have been encoded or decoded.
Motion information (e.g., information about CPMV or information about affine transformation model) of a block predicted in a sub-block unit and/or a block predicted based on affine transformation may be stored in a motion information table for affine transformation.
For example, for a block predicted in a sub-block unit or a block predicted based on affine transformation, motion information of a specific block may be used to update a motion information table.
The number of candidates included in the motion information table for affine transformation may be predefined in the encoder and decoder. The motion information table for affine transformation may be initialized in accordance with a predefined unit. Here, the predefined units may be CTUs, CTU rows, tiles, slices, or subpictures.
Alternatively, the information stored in the motion information table for affine transformation may not be motion information but a block having motion information. In this case, when a specific block is selected from among blocks stored in the motion information table, motion information corresponding to the specific block or motion information derived from the motion information may be used.
The motion information stored in the motion information table may be inserted into the motion information candidate list. For example, if the number of motion information candidates added to the motion information candidate list is smaller than the threshold value, the motion information stored in the motion information table may be inserted into the motion information candidate list as a new motion information candidate.
The threshold may be 1, 2, max_num_cand or a positive integer.
At least one piece of motion information stored in the motion information table for affine transformation may be added to the motion information candidate list, but the motion information table for affine transformation itself may also be used as the motion information candidate list. For example, information indicating whether affine transformation is performed using the motion information table itself as a motion information candidate list may be signaled/encoded/decoded.
The motion information candidates may include at least one default affine transformation candidate. For example, when the number of motion information candidates added according to the above description is smaller than a threshold (the maximum number of candidates that may be included in the motion information candidate list), a default affine transformation candidate may be added to the motion information candidate list.
Further, in some embodiments, information regarding whether the motion information candidate list is used for the target block may be signaled. The information may be a 1-bit flag.
When the motion information candidate list is used, motion information of the target block may be determined from motion information candidates included in the motion information candidate list. For example, information specifying one of a plurality of motion information candidates may be encoded and signaled. The image decoding apparatus selects a motion information candidate from the motion information candidate list based on the received information, and determines motion information of the target block based on the selected motion information candidate.
When the motion information candidate list is not used, the motion information of the target block may be derived from the motion information of the neighboring blocks. For example, the neighboring block may be a neighboring block at a predetermined position within a target picture or col picture (reference picture).
In some other embodiments, information indicating whether a motion information table for affine transformation is available may be signaled/encoded/decoded.
If a motion information table for affine transformation is available, motion information of the target block may be determined based on motion information candidates included (stored) in the motion information table.
Alternatively, the motion information table may be used to configure a motion information candidate list with at least one of neighboring block-based motion information candidates, non-neighboring block-based motion information candidates, temporal neighboring block-based motion information candidates, and default motion information candidates. The motion information of the target block may be determined based on the motion information candidates included (stored) in the motion information table.
If the motion information table for affine transformation is not available, the motion information table is not used to configure the motion information candidate list. In other words, the motion information candidate list may be configured using at least one of a motion information candidate based on a neighboring block, a motion information candidate based on a non-neighboring block, a motion information candidate based on a temporal neighboring block, and a default motion information candidate. Motion information for the target block may be derived based on the motion information candidate list.
(2) Target block prediction using affine transformation model
When a plurality of CPMVs are determined for a target block, the image encoding apparatus or the image decoding apparatus may derive motion information in sub-block units or sample units using an affine transformation model based on the CPMVs. The derived motion information in sub-block units or sample units may then be used to generate prediction samples.
As shown in fig. 26 (a), when two CPMV are used, a 4-parameter affine transformation model may be used. In the 4-parameter affine transformation model, the motion vector of a sample point at the (x, y) position of the target block or the motion vector of a sub-block including the (x, y) position can be obtained by the following equation.
[ Equation 5]
mv_x = (mv_1x - mv_0x) * x / W - (mv_1y - mv_0y) * y / W + mv_0x
mv_y = (mv_1y - mv_0y) * x / W + (mv_1x - mv_0x) * y / W + mv_0y
As shown in fig. 26 (b), when three CPMV are used, a 6-parameter affine transformation model may be used. In the 6-parameter affine transformation model, the motion vector of a sample point at the (x, y) position of the target block or the motion vector of a sub-block including the (x, y) position can be obtained by the following equation.
[ Equation 6]
mv_x = (mv_1x - mv_0x) * x / W + (mv_2x - mv_0x) * y / H + mv_0x
mv_y = (mv_1y - mv_0y) * x / W + (mv_2y - mv_0y) * y / H + mv_0y
In equations 5 and 6, the final terms mv_0x and mv_0y may represent the translation coefficients of the affine transformation model, and the values multiplied by x and/or y may represent the non-translation coefficients of the affine transformation model.
W and H may represent the width and height of the target block.
x and y may represent pixel positions. At this time, the upper left corner of the block may be denoted as (0, 0), the upper right corner as (W-1, 0), the lower left corner as (0, H-1), and the lower right corner as (W-1, H-1).
Referring to FIG. 26, (mv_0x, mv_0y) represents the CPMV of the upper left corner, (mv_1x, mv_1y) represents the CPMV of the upper right corner, and (mv_2x, mv_2y) represents the CPMV of the lower left corner.
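Equations 5 and 6 can be sketched as a single helper (the function name is hypothetical; floating-point arithmetic is used here, whereas a real codec would use fixed-point shifts, and the sign convention of the standard 4-parameter affine model is assumed):

```python
def affine_mv(cpmvs, x, y, W, H):
    """Motion vector at position (x, y) derived from 2 CPMVs
    (4-parameter model) or 3 CPMVs (6-parameter model)."""
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    if len(cpmvs) == 2:
        # 4-parameter model (equation 5)
        mv_x = (mv1x - mv0x) * x / W - (mv1y - mv0y) * y / W + mv0x
        mv_y = (mv1y - mv0y) * x / W + (mv1x - mv0x) * y / W + mv0y
    else:
        # 6-parameter model (equation 6)
        mv2x, mv2y = cpmvs[2]
        mv_x = (mv1x - mv0x) * x / W + (mv2x - mv0x) * y / H + mv0x
        mv_y = (mv1y - mv0y) * x / W + (mv2y - mv0y) * y / H + mv0y
    return mv_x, mv_y
```

For example, with the upper-left CPMV (0, 0) and the upper-right CPMV (8, 0) in a 16×16 block, the derived motion vector varies linearly from (0, 0) at the left edge to (8, 0) at x = W.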
5.3 Prediction in sub-block units Using motion information offset
The motion information offset may be applied to the sub-block unit prediction mode. In this case, the motion information offset is added to the basic motion information.
When the motion information offset is applied to the sub-block unit prediction mode using the motion shift, the base motion vector may correspond to the motion shift.
When the motion information offset is applied to the affine transformation mode, the basic motion information may be at least one CPMV.
(1) Prediction in sub-block units using motion information offset applied motion shifting
In this mode (sub-block unit prediction mode using motion shift to which motion information offset is applied), the motion information offset may be added to the motion shift (base motion vector) of the target block.
Information indicating whether the current mode is performed may be signaled/encoded/decoded.
Alternatively, whether to signal/encode/decode information indicating the performance of the current mode may be predefined.
For example, when the prediction mode of the target block is a sub-block unit prediction mode or a sub-block unit mode using motion shifting, signaling/encoding/decoding of information indicating whether to perform the current mode may be performed.
In another example, the sub-block unit mode using motion shifting or the current mode may be a mode performed according to whether the affine merge mode is performed. In this case, signaling/encoding/decoding of information indicating whether to perform the current mode may be performed only when the target block is predicted in the affine merge mode or only when an indicator indicating whether to perform the affine merge mode is true.
The number of motion information offsets may be a predefined fixed value.
Alternatively, the number of motion information offsets may be adaptively determined based on a ratio of areas occupied by blocks to which the current mode is applied within a reference picture or col picture that is encoded or decoded before the target picture.
Alternatively, the number of motion information offsets may be adaptively determined based on POC differences between a reference picture or col picture, which is encoded or decoded before the target picture, and the target picture.
Specific details regarding the motion information offset will be described later.
(2) Affine transformation mode using motion information offset
In this mode, at least one of the CPMV may be changed based on the motion information offset.
For example, in affine transformation mode, the basic motion information may be a set of CPMV determined for the target block.
In another example, the basic motion information may be a set of CPMV that constitute affine transformation candidates included in the candidate list.
The motion information offset may be added to all or part of the CPMV constituting the basic motion information.
If the target block is in an affine mode using motion information offset, the affine transformation model of the target block may be changed based on the motion information offset. For example, the motion information offset may be added to at least one of the coefficients of the affine transformation model of the target block.
Information indicating whether to perform affine transformation mode to which motion information offset is applied to the target block may be signaled/encoded/decoded or predefined. Based on this information, affine patterns using motion information offset can be performed on the target block.
At this time, information indicating whether affine mode to which motion information offset is applied is performed on the target block may be predefined or not signaled/encoded/decoded.
For example, only when the target block is in the affine merge mode, signaling/encoding/decoding of information indicating whether affine mode using motion information offset is performed in the target block may be performed.
(3) Motion information offset
The motion information offset may be a motion vector.
For example, the motion information offset may be determined by a combination of angles in a predefined list of angles and distance offsets in a predefined list of distance offsets.
The angle may represent an angle between the motion information offset and the axial direction. Here, the axis may be an X-axis (i.e., an abscissa axis or a horizontal axis) or a Y-axis (i.e., an ordinate axis or a vertical axis).
The angle list may be configured to include angles having a value of i_ANGLE × π / NUM_ANGLE.
i_ANGLE may be an integer value between 0 and (2 × NUM_ANGLE - 1).
NUM_ANGLE may be a predefined value, such as 4, 8, 16, or a positive integer.
NUM_ANGLE and/or the angles constituting the predefined angle list may be determined based on at least one of motion information and encoding parameters of the target block.
The distance offset represents the magnitude of the motion information offset. In other words, the distance offset may represent a separation distance from a position indicated by the motion vector of the target block.
For example, the distance offset list may be configured to include at least one of 0,1, 2, 4, and 8.
The distance offset list may be configured to include values corresponding to MULTI_OFFSET pixels, where MULTI_OFFSET may be 1/8, 1/4, 1/2, 1, 2, 4, 8, 16, 32, or an integer.
The distance offsets constituting the distance offset list and/or the size of the distance offset list may be determined based on at least one of 1) motion information of the target block and 2) encoding parameters.
Combining a specific ANGLE and a specific distance OFFSET may mean that the motion vector is determined using the ANGLE and OFFSET. Here, the ANGLE between the determined motion vector and the axis may be ANGLE. The determined magnitude of the motion vector may be OFFSET. The axis may be an X-axis or a Y-axis.
The magnitude of the motion vector may be determined based on the absolute value of the x-direction component and the absolute value of the y-direction component of the motion vector.
Alternatively, combining a specific ANGLE and a specific distance OFFSET may mean determining the position pointed to by the above determined motion vector.
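A combination of an angle and a distance offset thus determines a vector, which can be sketched as follows (hypothetical helper; the rounding to integer pel units is an illustrative assumption):

```python
import math

def offset_from_angle_distance(i_angle, num_angle, distance):
    """Motion information offset whose angle to the X-axis is
    i_angle * pi / num_angle and whose magnitude is `distance`."""
    angle = i_angle * math.pi / num_angle
    mv_off_x = round(distance * math.cos(angle))
    mv_off_y = round(distance * math.sin(angle))
    return (mv_off_x, mv_off_y)
```

For instance, with num_angle = 4, indices 0 through 7 sweep eight directions spaced by 45 degrees.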
Information for determining at least one of an angle list and a distance offset list in the target block may be signaled/encoded/decoded or predefined. Based on this information, at least one of an angle list and a distance offset list in the target block may be determined.
In some embodiments, multiple angle lists and/or distance offset lists may be provided.
In this case, which of the plurality of angle lists and/or the plurality of distance offset lists is selected may be determined by the list index.
Alternatively, one of the plurality of lists may be specified based on a quotient of the motion information index divided by a predetermined value (e.g., the number of candidates in each list).
For example, when the number of candidates in all the lists is the same, one of the lists may be specified by a quotient of the motion information index divided by the number of candidates. In this case, the remainder of the division may be used to specify candidates in the respective list.
In another example, a first distance offset list may be used if the quotient obtained by dividing the motion information index for the target block by a predetermined value is odd, and a second distance offset list may be used if the quotient is even.
The predetermined value may be 1, 2, 4, 8, 16, 32, 48, 96 or a positive integer.
The component constituting the first distance offset list and the component constituting the second distance offset list may not overlap.
For example, the first distance offset list may be composed of 1/16 pixels and 1/8 pixels, and the second distance offset list may be composed of 1/4 pixels and 1/2 pixels.
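The quotient/remainder selection described above can be sketched as follows, assuming all lists hold the same number of candidates (the helper name is hypothetical):

```python
def select_list_and_candidate(motion_info_index, lists):
    """Pick a distance offset list by the quotient of the index divided
    by the per-list candidate count, and the entry by the remainder."""
    num_candidates = len(lists[0])
    list_idx = motion_info_index // num_candidates  # selects the list
    cand_idx = motion_info_index % num_candidates   # selects the entry
    return lists[list_idx][cand_idx]
```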
(4) Application of motion information offset
In the following disclosure, MV0 and MV1 may be the basic motion information of the target block in the L0 direction and the L1 direction, respectively. MV0' may be the refined motion information in the L0 direction, and MV1' may be the refined motion information in the L1 direction. MV_off may be a motion information offset candidate specified from a motion information offset candidate list, or a motion information offset specified for the target block.
For the target block, a plurality of basic motion information indexes and a plurality of basic motion information may be used.
The information specifying the at least one motion information offset may be signaled/encoded/decoded or predefined. Based on this information, at least one motion information offset for the target block may be specified from the motion information offset candidate list.
In some embodiments, one motion information offset list may be constructed by a combination of angles included in the angle list and distance offsets included in the distance offset list. In this case, one index (motion information offset index) for specifying the motion information offset of the target block in the motion information offset list may be signaled/encoded/decoded. The motion information offset may be determined by a combination of the angle and distance offset indicated by the index.
In another embodiment, a first index (angle index) for specifying an angle of the motion information offset of the target block from the angle list may be signaled/encoded/decoded. Further, a second index (distance offset index) for specifying the size of the motion information offset of the target block from the distance offset list may be signaled/encoded/decoded.
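When a single combined list is used, one signaled index resolves to an (angle, distance) pair; the decomposition below is a sketch assuming an angle-major ordering (the ordering and helper name are illustrative assumptions):

```python
def offset_from_index(offset_index, angles, distances):
    """Resolve one motion information offset index into an
    (angle, distance) pair."""
    angle_idx = offset_index % len(angles)   # angle within the angle list
    dist_idx = offset_index // len(angles)   # entry in the distance list
    return angles[angle_idx], distances[dist_idx]
```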
Alternatively, the at least one motion information offset may be determined based on at least one of the motion information of the target block, the encoding parameters of the target block, the motion information of a neighboring block, the encoding parameters of the neighboring block, the index of the motion information offset candidate, and the matching cost of the motion information offset candidate.
The number or maximum number of motion information offset candidates may be determined based on at least one of whether affine transformation is performed on the neighboring blocks of the target block and information indicating whether affine transformation is performed on the neighboring blocks of the target block.
The motion information offset index may be binarized by a Rice code with a specific parameter, such as 1.
When bilateral inter prediction is used for the target block, the motion information offset MV_off may be added to at least one of the base motion vector in the L0 direction and the base motion vector in the L1 direction corresponding to the basic motion information of the target block.
Method 1 for applying motion information offset
MV_off may be added identically to each of the base motion vector in the L0 direction and the base motion vector in the L1 direction. In other words, the process according to the following equation 7 may be performed.
[Equation 7]
MV0' = MV0 + MV_off
MV1' = MV1 + MV_off
Method 2 for applying motion information offset
MV_off may be added to the base motion vector in the L0 direction, and -MV_off may be added to the base motion vector in the L1 direction. In other words, the process according to the following equation 8 may be performed.
[Equation 8]
MV0' = MV0 + MV_off
MV1' = MV1 - MV_off
Method 3 for applying motion information offset
MV_off may be added to the base motion vector in the L0 direction, and SCALED_MV_off may be added to the base motion vector in the L1 direction. In other words, the process according to the following equation 9 may be performed.
[Equation 9]
MV0' = MV0 + MV_off
MV1' = MV1 + SCALED_MV_off
Here, SCALED_MV_off may be a motion vector derived by scaling MV_off, which is in the L0 direction, to the L1 direction. MV_off and SCALED_MV_off may have opposite directions to each other.
Method 4 for applying motion information offset
MV_off may be added to the base motion vector in the L1 direction, and SCALED_MV_off may be added to the base motion vector in the L0 direction. In other words, the process according to the following equation 10 may be performed.
[Equation 10]
MV1' = MV1 + MV_off
MV0' = MV0 + SCALED_MV_off
Method 5 for applying motion information offset
MV_off may be added only to the base motion vector in the LX direction. At this time, it can be considered that a zero motion vector has been added in the L(1-X) direction. For example, the process according to the following equation 11 may be performed.
[Equation 11]
MVX' = MVX + MV_off
MVY' = MVY
Here, X may be 0 or 1, and Y = 1 - X.
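Methods 1 to 5 can be sketched together as follows; motion vectors are (x, y) tuples, the helper name is hypothetical, and SCALED_MV_off must be supplied by the caller for methods 3 and 4:

```python
def apply_offset(mv0, mv1, mv_off, method, scaled_mv_off=None, x=0):
    """Apply the motion information offset MV_off to the base motion
    vectors (MV0, MV1) according to methods 1-5 described above."""
    add = lambda a, b: (a[0] + b[0], a[1] + b[1])
    neg = lambda a: (-a[0], -a[1])
    if method == 1:   # equation 7: same offset on both directions
        return add(mv0, mv_off), add(mv1, mv_off)
    if method == 2:   # equation 8: mirrored offset on L1
        return add(mv0, mv_off), add(mv1, neg(mv_off))
    if method == 3:   # equation 9: scaled offset on L1
        return add(mv0, mv_off), add(mv1, scaled_mv_off)
    if method == 4:   # equation 10: scaled offset on L0
        return add(mv0, scaled_mv_off), add(mv1, mv_off)
    if method == 5:   # equation 11: offset only on the LX direction
        if x == 0:
            return add(mv0, mv_off), mv1
        return mv0, add(mv1, mv_off)
```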
Information representing X may be signaled/encoded/decoded or may be predefined.
In some embodiments, X may be determined based on at least one of a matching cost of motion information in the L0 direction and a matching cost of motion information in the L1 direction. For example, LX may represent a direction having a lower matching cost among the L0 direction and the L1 direction. Alternatively, LX may also represent a direction having a higher matching cost among the L0 direction and the L1 direction.
In some other embodiments, X may be determined based on a base motion information index and a motion information offset index.
For example, X may be determined based on a remainder obtained by dividing the motion information offset index by a predetermined integer value (e.g., 2 or 3).
In another example, X may be determined based on a quotient generated by dividing the motion information offset index by a predetermined integer value (e.g., 2 or 3).
The method for applying the motion information offset MV_off to the basic motion information of the target block may be at least one of the methods described above.
The selection information for selecting which method is used among the plurality of methods for applying the motion information offset may be signaled/encoded/decoded or may be predefined. The motion information offset may be added to the base motion information using at least one method determined based on the selection information.
In some embodiments, the selection information may be determined based on at least one of the POC of the target picture, the POC of the L0 direction reference picture of the target block, and the POC of the L1 direction reference picture of the target block.
For example, the motion information offset added in the L0 direction and the L1 direction may be determined based on the first direction and the second direction. Here, the first direction is a direction from the target picture to the L0 direction reference picture and is determined by a sign of a POC difference (hereinafter referred to as a first POC difference) between the target picture and the L0 direction reference picture, and the second direction is a direction from the target picture to the L1 direction reference picture and is determined by a sign of a POC difference (hereinafter referred to as a second POC difference) between the target picture and the L1 direction reference picture.
For example, when the first direction and the second direction are the same, the signs of the motion information offsets added to the L0 direction and the L1 direction are the same, and when the first direction and the second direction are opposite to each other, the signs of the motion information offsets added to the L0 direction and the L1 direction may be opposite to each other.
Further, the size of the motion information offset added to the L0 direction and the L1 direction may be determined based on the first POC difference and the second POC difference.
For example, MV_off may be added in the LX direction, and the result of multiplying MV_off by ((2-X)-th POC difference / (1+X)-th POC difference), or a value based thereon, may be added in the L(1-X) direction.
For example, if the second POC difference is greater than the first POC difference, X may be 1; otherwise, X may be 0.
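A sketch of this POC-difference-based behavior, assuming the offset added in the other direction is scaled by the ratio of the POC differences (the helper name and the rounding are illustrative assumptions):

```python
def scaled_offsets(mv_off, poc_target, poc_ref0, poc_ref1):
    """Offsets to add in the L0 and L1 directions, derived from the
    signs and magnitudes of the first and second POC differences."""
    d0 = poc_target - poc_ref0  # first POC difference
    d1 = poc_target - poc_ref1  # second POC difference
    if abs(d1) > abs(d0):       # X = 1: MV_off goes to L1
        scale = d0 / d1
        off1 = mv_off
        off0 = (round(mv_off[0] * scale), round(mv_off[1] * scale))
    else:                       # X = 0: MV_off goes to L0
        scale = d1 / d0
        off0 = mv_off
        off1 = (round(mv_off[0] * scale), round(mv_off[1] * scale))
    return off0, off1
```

Note that when the two reference pictures lie on opposite sides of the target picture, the ratio is negative, so the two offsets point in opposite directions, consistent with the sign rule described above.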
If one-sided inter prediction is used for the target block, the motion information offset MV_off may be added to the base motion vector in the corresponding direction of the target block.
For example, if the target block uses L0 one-sided inter prediction, the first equation of equation 7 may be applied, and if the target block uses L1 one-sided inter prediction, the second equation of equation 7 may be applied.
When a sub-block unit prediction mode using motion shifting with a motion information offset is performed on the target block, the motion information offset MV_off may be added to at least one piece of the basic motion information of the target block.
For example, the first equation of equation 7 may be applied when the motion shift of the target block is motion information in the L0 direction, and the second equation of equation 7 may be applied when the motion shift of the target block is motion information in the L1 direction.
(5) Affine mode using one-sided or two-sided motion information offset
The affine mode using a motion information offset may include at least one of an affine mode using a one-sided motion information offset and an affine mode using a two-sided motion information offset.
When the affine mode using a one-sided motion information offset is performed, the motion information offset may be added to the basic motion information of the target block using method 5 for applying a motion information offset described above.
When the affine mode using a two-sided motion information offset is performed, the motion information offset may be added to the basic motion information of the target block using at least one of methods 1 to 4 for applying a motion information offset described above.
Information indicating whether an affine mode using a one-sided motion information offset or an affine mode using a two-sided motion information offset is to be performed on the target block may be signaled/encoded/decoded or predefined. Based on this information, an affine mode using a one-sided motion information offset or an affine mode using a two-sided motion information offset may be performed on the target block.
In some embodiments, information specifying the direction in which the motion information offset is to be added may be signaled/encoded/decoded.
In some embodiments, based on the motion information offset index for the target block, an affine mode using a one-sided motion information offset or an affine mode using a two-sided motion information offset may be performed.
For example, based on the quotient obtained by dividing the motion information offset index by a predetermined value (e.g., 8), if the quotient is a first value (e.g., 0), an affine mode using a one-sided motion information offset in the L0 direction may be performed; if the quotient is a second value (e.g., 1), an affine mode using a one-sided motion information offset in the L1 direction may be performed; and if the quotient is a third value (e.g., 2), an affine mode using a two-sided motion information offset may be performed.
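The quotient-based selection in this example can be sketched as follows (hypothetical helper; the divisor and mode labels are the example values above):

```python
def select_offset_mode(offset_index, divisor=8):
    """Map the quotient of the motion information offset index to a
    one-sided (L0 or L1) or two-sided affine offset mode."""
    quotient = offset_index // divisor
    return {0: "one-sided-L0", 1: "one-sided-L1", 2: "two-sided"}[quotient]
```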
(6) Motion information offset for control point motion vectors
When an affine mode is used for the target block, adding the motion information offset MV_off in the LX direction may mean adding MV_off, or a motion vector determined based on MV_off, to at least one CPMV among the CPMVs in the LX direction of the affine mode.
Here, X may be 0, 1, or a positive integer.
In some embodiments, a target CPMV, among the CPMVs of the basic motion information, to which MV_off or a motion information offset determined based on MV_off is added may be selected.
In this case, information indicating the target CPMV, among the CPMVs of the basic motion information, to which MV_off or a motion information offset determined based on MV_off is added may be signaled/encoded/decoded.
Alternatively, information indicating the target CPMV may be predefined.
For example, the CPMV, among the CPMVs of the basic motion information, to which MV_off or a motion information offset determined based on MV_off is added may be determined based on at least one of: the basic motion information of the target block, the coding parameters of the target block, the motion information of a neighboring block, the coding parameters of the neighboring block, a target template of the target block, a reference template of a reference block for the target block, a template matching cost for each CPMV of the basic motion information, a template matching cost for a motion vector generated by adding MV_off to a CPMV of the basic motion information, an index for specifying the basic motion information from the motion information candidate list, the motion information offset index, and the motion information offset.
In some other embodiments, the motion information offset may be added to all CPMV of the target block or all CPMV of at least one basic motion information of the target block.
(7) Configuration and ordering of motion information offset candidate list
In the following disclosure, the matching cost for a specific motion information offset may correspond to the matching cost for motion information that is the sum of basic motion information and the specific motion information offset.
In an affine mode using motion information offsets, the motion information offset candidate list may be configured using up to N motion information offsets, where the up to N motion information offsets indicate the up to N positions with the lowest matching costs among all possible positions relative to the basic motion information.
The motion information offset candidates in the motion information offset candidate list may be sorted in ascending order of matching cost.
The possible positions relative to the basic motion information may be defined by two horizontal directions opposite to each other, two vertical directions opposite to each other, and a distance offset within a specific distance offset list.
For example, the particular distance offset list may be {1 pixel, 2 pixels, 4 pixels, 8 pixels, 12 pixels, 16 pixels, 24 pixels, 32 pixels, 40 pixels, 48 pixels, 56 pixels, 64 pixels, 72 pixels, 80 pixels, 88 pixels, 96 pixels, 104 pixels, 112 pixels, 120 pixels, 128 pixels }.
The possible locations may be a subset of the motion vector search area.
The motion information offset candidate list having the largest size N may be configured and/or ordered according to the following steps.
■ Step 1:
The maximum distance offset may be expressed as L pixels. L may be one of the values described in the embodiments, for example 128.
The number of directions may be D. For example, D may be 4 and the directions may be left, right, up and down directions.
The initial search interval may be M pixels. For example, M may be 8.
The maximum number of candidates in the motion information offset candidate list may be N. For example, N may be 8.
L, D, M and N are not limited to the above values, and may each be an integer greater than or equal to 1.
At least one of L, M and N may be determined differently based on at least one of D and search direction.
■ Step 2:
For each direction, a matching cost may be calculated for the motion information offsets determined from the distance offsets at every M-th position (i.e., M, 2×M, 3×M, etc.) that does not exceed L. At this time, each motion information offset may be added to the motion information candidate list based on the matching cost calculated for it. Further, when a specific motion information offset is added to the motion information candidate list, the motion information offset candidates within the list may be kept arranged in ascending order of matching cost. The N candidates with the lowest matching costs may be maintained in the list.
■ Step 3:
For each motion information offset candidate in the list, matching costs for the two motion information offsets separated from the respective motion information offset by +M/2 and -M/2 may be calculated along the direction of the motion information offset.
Based on the calculated matching costs, the corresponding motion information offset may be added to the motion information candidate list. The motion information offset candidates may be added to the motion information candidate list in ascending order of the matching cost. The N candidates with the lowest matching costs may be maintained in the list.
■ Step 4:
Step 3 may be repeated while decreasing the interval M by half until the interval M reaches a first value (e.g., 1).
The first value may be determined based on at least one of the direction D and the motion information offset candidates in the motion information offset candidate list to which step 3 is applied. For example, the first value may be determined based on the quotient of the size of a motion information offset candidate in the motion information offset candidate list for which step 3 is performed divided by a second value (e.g., 8).
The first value may be a positive real number. The second value may be a positive integer.
As the steps proceed, the interval M may decrease. In other words, the resolution (or accuracy) of the search may gradually increase as the steps proceed.
The N candidates with the lowest matching costs in the searched motion information offset may be stored in the list. The final motion information of the target block may be determined by signaling/encoding/decoding indexes of candidates in the list.
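The coarse-to-fine search in steps 1 through 4 can be sketched as follows. This is an illustrative reconstruction, not the normative procedure: `matching_cost` is a placeholder for the template/bilateral matching computation, and the tuple representation of offsets and the restriction to the four axis directions are choices made for the sketch.

```python
# Sketch of steps 1-4: coarse scan at interval M, then halve the interval
# around the surviving candidates until it reaches 1 (the assumed first value).
def build_offset_candidate_list(matching_cost, L=128, M=8, N=8,
                                directions=((1, 0), (-1, 0), (0, 1), (0, -1))):
    # Step 2: evaluate every M-th distance (M, 2*M, ...) up to L per direction.
    cands = []
    for dx, dy in directions:
        for dist in range(M, L + 1, M):
            off = (dx * dist, dy * dist)
            cands.append((matching_cost(off), off))
    cands.sort(key=lambda c: c[0])   # ascending matching cost
    cands = cands[:N]                # keep the N lowest-cost candidates

    # Steps 3-4: probe +step/-step along each candidate's direction,
    # halving the step each pass (so search resolution gradually increases).
    step = M // 2
    while step >= 1:
        for _, (ox, oy) in list(cands):
            dist = abs(ox) + abs(oy)                     # offsets lie on an axis
            dx, dy = (ox > 0) - (ox < 0), (oy > 0) - (oy < 0)
            for d in (dist + step, dist - step):
                off = (dx * d, dy * d)
                if 0 < d <= L and all(off != o for _, o in cands):
                    cands.append((matching_cost(off), off))
        cands.sort(key=lambda c: c[0])
        cands = cands[:N]
        step //= 2
    return [off for _, off in cands]
```

With a cost minimized at offset (37, 0), for instance, the search converges to that position even though 37 is not a multiple of the initial interval.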
The initial search interval may be determined based on the size of the base motion information.
In the process of configuring the motion information offset candidate list, if the motion information offset candidates are not arranged by the matching cost, the motion information offset candidates may be reordered or reconfigured in ascending order of the matching cost.
For example, the motion information offset candidate list may be reconfigured to include only N pieces of motion information having the lowest index.
If the target block is in an affine mode using bilateral motion information offsets, a matching cost for each motion information offset candidate may be calculated to rank or rearrange the order of the motion information offset candidates in the motion information offset candidate list of the target block. The methods for applying a motion information offset described above may be used to calculate the matching cost for each motion information offset candidate.
For example, if the signs of (POC of target picture - POC of L0-direction reference picture of target block) and (POC of target picture - POC of L1-direction reference picture of target block) are the same, method 1 for applying a motion information offset may be used, and if the signs are different, method 2 for applying a motion information offset may be used.
Further, in order to determine final motion information of the target block (e.g., final CPMVs of the target block), a motion information offset may be added to basic motion information of the target block. In this case, if the signs of (POC of target picture - POC of L0-direction reference picture of target block) and (POC of target picture - POC of L1-direction reference picture of target block) are the same, method 1 for applying a motion information offset may be used, and if the signs are different, method 2 for applying a motion information offset may be used.
5.4 Determination of subblock size for prediction in subblock units
When prediction in sub-block units is performed on a target block, the target block is partitioned into sub-blocks of an N×M size, and motion information is provided in sub-block units.
Each of N and M may be a positive integer. Each of N and M may be 1, 2, 4, 8, 16, 32, or another positive integer.
N and M may have the same value. Alternatively, N and M may have different values based on the size of the target block. For example, the ratio of N and M may be determined based on the ratio of the width and the height of the target block.
Each of N and M may always be set to the same value for all blocks.
Alternatively, each of N and M may have an adaptively determined value.
In some embodiments, N and/or M may be determined according to a predetermined rule shared by the image encoding device and the image decoding device.
For example, N and/or M may be determined based on at least one of a size of the target block, motion information of the target block, encoding parameters of the target block, motion information of neighboring blocks of the target block, or encoding parameters of neighboring blocks of the target block.
For example, a plurality of interval boundary values for the length may be defined. N or M of the target block may be determined according to which of a plurality of intervals, distinguished by the plurality of interval boundary values, the height or width of the target block belongs to.
For example, N (or M) may be determined to be a first value if the height (or width) of the block is less than a first interval boundary value, a second value if the height (or width) of the block is not less than the first interval boundary value and less than a second interval boundary value, and a third value otherwise.
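The interval-boundary rule above can be sketched as a small lookup; the boundary values (16, 64) and sub-block dimensions (4, 8, 16) are hypothetical examples, not values taken from the text.

```python
# Illustrative only: hypothetical interval boundaries and sub-block dimensions.
def subblock_dim(length, boundaries=(16, 64), values=(4, 8, 16)):
    """Map a block height (or width) to N (or M) via interval boundaries."""
    for boundary, value in zip(boundaries, values):
        if length < boundary:        # falls in the interval below this boundary
            return value
    return values[-1]                # at or above the last boundary
```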
In another embodiment, information about N and/or M may be signaled/encoded/decoded.
For example, the information for determining N and/or M may be signaled/encoded/decoded in at least one of a video parameter set, a decoding parameter set, a sequence parameter set, an adaptive parameter set, a picture header, a subpicture header, a slice header, a tile group header, a tile header, a brick, a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), a Transform Unit (TU), a Coding Block (CB), a Prediction Block (PB), or a Transform Block (TB).
In some embodiments, information for determining N and/or M of the target block in units of CUs may be signaled/encoded/decoded.
For example, information for specifying the value of N and/or M of the target block from among a plurality of candidates of the value of N and/or M may be signaled/encoded/decoded. The information may be an indicator or index for selecting one of a plurality of candidates.
The number of candidates for the values that N and/or M may have may be determined based on at least one of a size of the target block, motion information of the target block, encoding parameters of the target block, motion information of neighboring blocks of the target block, or encoding parameters of neighboring blocks of the target block.
Further, at least one of the candidates for the values that N and/or M may have may be determined based on at least one of a size of the target block, motion information of the target block, encoding parameters of the target block, motion information of neighboring blocks of the target block, or encoding parameters of neighboring blocks of the target block.
Furthermore, in some embodiments, an indicator may be signaled/encoded/decoded indicating whether the size of the sub-block is adaptively determined in sub-block unit prediction or whether information about the sub-block size is signaled.
For example, if a target block is predicted in a sub-block unit prediction mode (e.g., prediction in sub-block units using motion shifting or affine transformation modes), an indicator may be signaled/encoded/decoded that indicates whether the sub-block size is adaptively determined for the target block.
When prediction is performed in a sub-block unit in a target block, information for determining a sub-block size may be signaled/encoded/decoded.
When the indicator indicates that the sub-block size is not adaptively determined, the target block may always be partitioned into sub-blocks of the same size (e.g., 4×4, 8×8, or 16×16). Alternatively, the partition size of the target block may be determined based on predefined rules as described above.
5.5 Modification of prediction modes in sub-block units
The above-described sub-block unit prediction modes may be used in various modifications. Hereinafter, each of the various modifications will be described as an nth prediction mode (where N is a natural number greater than or equal to 1). Information indicating a prediction mode of the plurality of prediction modes that is applied to the target block may be signaled.
(1) First prediction mode
When bi-prediction is performed on a target block, bi-prediction may be performed in which sub-block unit prediction is performed in the LX direction and whole block unit prediction is performed in the L (1-X) direction.
The whole-block unit prediction may correspond to prediction other than sub-block unit prediction. For example, the whole-block unit prediction may correspond to prediction other than sub-block unit prediction using affine transformation and/or a motion information offset.
For example, the whole-block unit prediction may correspond to general inter prediction, i.e., prediction using a translational motion vector.
Alternatively, the whole-block unit prediction may mean performing a translational affine transformation, and the translational affine transformation may correspond to an affine transformation having the same vector for all CPMVs.
For example, performing the whole-block unit prediction in the L(1-X) direction may be determined to be performing a translational affine transformation, and based on this determination, motion information of the sub-blocks within the target block may be determined and/or stored.
When the first prediction mode is performed for the target block, the motion information candidate list may be configured for the L0 direction and the L1 direction, respectively.
For example, for the LX direction, a motion information candidate list including at least one of the motion information candidate type 1 to the motion information candidate type 5 may be configured.
For the L (1-X) direction, a motion information candidate list including only parallel translation candidates may be configured. Alternatively, the list may be configured in a manner similar to a conventional merge candidate list for the L (1-X) direction. In other words, the motion information candidate list may include a general motion vector (translational motion vector) instead of affine transformation candidates or motion shifts.
The motion information candidate list for each direction may include only motion information in the corresponding direction.
The size of the motion information candidate list for the L0 direction and the size of the motion information candidate list for the L1 direction may be A and B, respectively.
Each of A and B may be 1, 2, 3, 4, 5, 6, 8, 12, or another positive integer.
A and B may always be the same value. Alternatively, A and B may be different from each other.
When the first prediction mode is performed in the target block, information about X may be signaled/encoded/decoded.
For example, the information about X may be an indicator having a value of 0 or 1.
(2) Second prediction mode
When the prediction mode of the target block is the affine transformation AMVP mode, only the motion vector difference information in the LX direction may be signaled/encoded/decoded, and the motion vector difference information in the L (1-X) direction may be derived based thereon. Here, X may be 0 or 1.
The motion vector difference information may correspond to information including at least one of size information of a motion vector difference for the at least one CPMV, a sign of a motion vector difference for the at least one CPMV, or an index (motion vector difference prediction index) for specifying a motion vector difference from a motion vector difference candidate list for the at least one CPMV.
The magnitude of the motion vector difference and/or the sign of the motion vector difference may be related to at least one component (horizontal or vertical) of the motion vector difference.
For example, the magnitude of the motion vector difference for the CPMV at a specific location in the L1 direction of the target block may be a value obtained by scaling the magnitude of the motion vector difference for the CPMV in the L0 direction at the location of the corresponding CPMV based on the first POC difference (i.e., the difference between the POC of the target picture and the POC of the L0 direction reference picture) and the second POC difference (i.e., the difference between the POC of the target picture and the POC of the L1 direction reference picture). The sign of the motion vector difference for CPMV at a specific location in the L1 direction may be opposite to the sign of the motion vector difference for CPMV in the L0 direction at the location of the corresponding CPMV.
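The scaling-and-sign-flip derivation above can be sketched per component as follows; the integer division used for rounding is an assumption, since the text does not specify a rounding rule.

```python
# Hedged sketch: derive one component of the L1-direction MVD for a CPMV
# from the corresponding L0-direction MVD component, scaling the magnitude
# by the POC differences and using the opposite sign.
def derive_l1_mvd_component(mvd_l0_c, poc_cur, poc_ref_l0, poc_ref_l1):
    d0 = poc_cur - poc_ref_l0        # first POC difference
    d1 = poc_cur - poc_ref_l1        # second POC difference
    magnitude = abs(mvd_l0_c) * abs(d1) // abs(d0)   # scaled magnitude
    sign = -1 if mvd_l0_c > 0 else (1 if mvd_l0_c < 0 else 0)
    return sign * magnitude          # sign opposite to the L0-direction MVD
```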
Hereinafter, a description will be given regarding prediction of a motion vector difference.
The motion vector difference candidate list may be configured using combinations of the possible signs of the motion vector difference components and the possible combinations of the first N most significant suffix bins of the magnitudes of the motion vector differences. N may be 1, 2, 3, 4, 5, 6, or another positive integer. The first N most significant suffix bins may refer to the last N bins among the bins representing the magnitude of the motion vector difference.
In signaling/encoding/decoding the magnitude information of the motion vector difference, only the information on bins other than the first N most significant suffix bins, among the information on the magnitude of each component of the motion vector difference, may be signaled/encoded/decoded.
Each candidate in the motion vector difference candidate list may be ranked according to the template matching cost.
From the motion vector difference candidate list, an index (motion vector difference prediction index) corresponding to the sign of the motion vector difference and the magnitude of the motion vector difference to be used in the target block can be derived and signaled/encoded/decoded. The image decoding apparatus may derive the motion vector difference in the following manner:
■ Process 1:
The magnitude of each component of the motion vector difference and the context-coded motion vector difference prediction index are parsed. Here, the parsed magnitude information of each component of the motion vector difference may correspond to the information on bins other than the first N most significant suffix bins among the information on the magnitude of each component.
■ Process 2:
Motion vector difference candidates are generated by forming combinations of the possible signs and the possible motion vector difference values and adding the predictor (i.e., the magnitude of the parsed motion vector difference) to the motion vector difference generated by each combination. Here, the possible motion vector difference values are generated from the possible combinations of the first N most significant suffix bins.
■ Process 3:
A template matching cost is calculated for each of the derived motion vector difference candidates, and the motion vector difference candidates are ranked based on the matching costs.
■ Process 4:
To specify, among the motion vector difference candidates, the motion vector difference to be used in the target block, the motion vector difference prediction index is used. In other words, the motion vector difference candidate indicated by the motion vector difference prediction index among the motion vector difference candidates is set as the motion vector difference of the target block.
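The four processes above can be sketched for a single two-component motion vector difference as follows. This is one plausible reading of the text, not the normative decoding procedure: the way the parsed magnitude and the N suffix bins recombine (a shift-and-add here) is an assumption, and `template_cost` is a placeholder.

```python
from itertools import product

# Hedged sketch of processes 1-4 for one (x, y) motion vector difference.
def derive_mvd(parsed_mags, n_bins, index, template_cost):
    """parsed_mags: per-component parsed magnitude (the non-suffix bins)."""
    suffix_vals = range(1 << n_bins)           # possible suffix-bin patterns
    candidates = []
    for sx, sy in product((1, -1), repeat=2):  # possible sign combinations
        for bx, by in product(suffix_vals, repeat=2):
            # Reattach the suffix bins to each parsed magnitude, apply signs.
            mvd = (sx * ((parsed_mags[0] << n_bins) + bx),
                   sy * ((parsed_mags[1] << n_bins) + by))
            candidates.append(mvd)
    candidates.sort(key=template_cost)         # rank by template matching cost
    return candidates[index]                   # process 4: pick by signaled index
```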
The motion vector difference prediction may be applied to at least one of inter AMVP or sub-block unit prediction. For example, when at least one of inter AMVP or sub-block unit prediction is performed, motion vector difference prediction may be performed in the target block.
In some embodiments, when an affine transformation mode (e.g., affine transformation AMVP mode) is applied to a target block and motion vector difference prediction is performed, the above-described motion vector difference prediction may be performed on each of at least a portion of the CPMV of the target block.
In process 1, the number of context-coded motion vector difference prediction indexes signaled/encoded/decoded for the target block may be equal to the number of CPMVs of the target block.
In another embodiment, motion vector difference prediction for all CPMV of the target block may be performed at once.
For example, the motion vector difference for CPMV for a target block may be derived as follows:
■ Process 1:
The magnitude of each component of the motion vector difference for each CPMV is parsed, and the context-coded motion vector difference prediction index is parsed.
■ Process 2:
Motion vector difference candidates are generated by forming combinations of the possible signs and the possible motion vector difference values and adding the predictor (i.e., the magnitude of the parsed motion vector difference) to the motion vector difference generated by each combination.
Here, a "combination between possible signs and possible motion vector difference values" may be configured in units of CPMV sets. For example, when a 4-parameter affine transformation model is used, combinations of (upper-left CPMV, upper-right CPMV) may be generated.
For example, when a 4-parameter affine transformation model is used, possible combinations are as follows.
- (upper-left CPMV, (+ or -) 1st magnitude candidate, (+ or -) 2nd magnitude candidate);
- (upper-right CPMV, (+ or -) 3rd magnitude candidate, (+ or -) 4th magnitude candidate).
When using a 6-parameter affine transformation model, the possible combinations are as follows:
- (upper-left CPMV, (+ or -) 1st magnitude candidate, (+ or -) 2nd magnitude candidate);
- (upper-right CPMV, (+ or -) 3rd magnitude candidate, (+ or -) 4th magnitude candidate);
- (lower-left CPMV, (+ or -) 5th magnitude candidate, (+ or -) 6th magnitude candidate).
■ Process 3:
A template matching cost is calculated for each derived motion vector difference candidate, and the motion vector difference candidates are ranked based on the template matching costs.
■ Process 4:
To specify the motion vector difference to be used in the target block, the motion vector difference prediction index is used. In other words, the motion vector difference candidate indicated by the motion vector difference prediction index among the motion vector difference candidates is set as the motion vector difference of the target block.
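As a small illustration of the combination space implied by the lists above: each CPMV contributes two signed components, so each CPMV set yields an independent pair of sign choices to combine with the magnitude candidates. The counts below (16 sign patterns for the 4-parameter model, 64 for the 6-parameter model) are an inference from that enumeration, not figures stated in the text.

```python
from itertools import product

# Each CPMV contributes two signed components, so a set of k CPMVs yields
# 2^(2k) sign patterns to pair with the parsed magnitude candidates.
def sign_combinations(num_cpmv):
    return len(list(product((1, -1), repeat=2 * num_cpmv)))
```

For the 4-parameter model (2 CPMVs) this gives 16 patterns; for the 6-parameter model (3 CPMVs), 64.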
The second prediction mode is a mode in which only the motion vector difference information in the LX direction is signaled/encoded/decoded and the motion vector difference information in the L(1-X) direction is derived based thereon. The above-described motion vector difference derivation process may also be performed in modes other than the second prediction mode.
For example, if the target block is a bi-predictive block, motion vector difference prediction may be performed separately for each of the L0 direction and the L1 direction. In other words, the number of context-coded motion vector difference prediction indexes signaled/encoded/decoded in the target block may be two.
In another example, if the target block is a bi-predictive block, motion vector difference prediction may be performed once for the L0 direction and the L1 direction. For example, the motion vector difference in the L0 direction and the L1 direction of the target block can be derived as follows:
■ Process 1:
The magnitude of each component of the motion vector difference in the L0 and L1 directions is parsed, and the context-coded motion vector difference prediction index is parsed.
■ Process 2:
Motion vector difference candidates are generated by forming combinations of the possible signs and the possible motion vector difference values and adding the predictor to the motion vector difference generated by each combination.
Here, a "combination between possible signs and possible motion vector difference values" may be configured as one set for the L0 direction and the L1 direction.
For example, the combination of the generated (L0 direction, L1 direction) may be configured as follows.
- (L0 direction, (+ or -) 1st magnitude candidate, (+ or -) 2nd magnitude candidate);
- (L1 direction, (+ or -) 3rd magnitude candidate, (+ or -) 4th magnitude candidate).
Here, the 1st and 3rd magnitude candidates may be the same, and the 2nd and 4th magnitude candidates may be the same.
■ Process 3:
A matching cost is calculated for each derived motion vector difference candidate, and the motion vector difference candidates are ranked based on the matching costs.
■ Process 4:
To specify the motion vector difference for the target block, the motion vector difference prediction index is used.
(3) Third prediction mode
The third prediction mode is a mode for determining motion information in the LX direction of the target block and deriving motion information in the L (1-X) direction using the determined motion information in the LX direction.
When the third prediction mode is applied to the target block, at least one of a reference picture index for the LX direction, adaptive motion vector resolution information, an index (or indicator) for specifying motion information from a motion information candidate list, or motion vector difference information may be signaled/encoded/decoded. Then, the motion information in the LX direction may be determined based on the signaled/encoded/decoded information in the LX direction.
The information signaled/encoded/decoded to determine the motion information in the LX direction may be information signaled/encoded/decoded in the L0 direction or information signaled/encoded/decoded in the L1 direction in the affine transformation AMVP mode.
The motion information of the target block in the L (1-X) direction may be derived based on the determined motion information in the LX direction.
When the third prediction mode is applied to the target block, information about X may be signaled/encoded/decoded. For example, the information about X may be an indicator having a value of 0 or 1. When the information about X has a value of 0, X may be 0, and when the information about X has a value of 1, X may be 1.
The motion information candidate list for the target block may be configured separately for each of the L0 direction and the L1 direction. At this time, the motion information candidate list of each direction may include only motion information in the corresponding direction.
The size of the motion information candidate list for the L0 direction and the size of the motion information candidate list for the L1 direction may be A and B, respectively.
Each of A and B may be 1, 2, 3, 4, 5, 6, 8, 12, or another positive integer.
A and B may always be the same value. Alternatively, A and B may be different values.
Alternatively, the motion information candidate list for the target block may be a unified list in which a combination of L0-direction motion information and L1-direction motion information constitutes one candidate.
When the third prediction mode is applied to the target block, prediction of the target block may be performed by including at least one of the following methods. Methods 1 and 2 are related to the case of using a motion information candidate list for each of the L0 and L1 directions, and method 3 is related to the case of using a unified motion information candidate list.
Method 1:
(1) Constructing a motion information candidate list for the L0 direction and a motion information candidate list for the L1 direction
(2) Determining motion information for LX direction
(3) Determining motion information for L (1-X) direction
Method 2:
(1) Building a candidate list of motion information for LX direction and/or determining motion information for LX direction
(2) Constructing a candidate list of motion information for the L (1-X) direction and/or determining motion information for the L (1-X) direction
Method 3:
(1) Building an integrated motion information candidate list
(2) After determining the motion information for the LX direction, determining the motion information for the L (1-X) direction
When the motion information for the L(1-X) direction is determined, the LX-direction motion information of all motion information candidates in the integrated motion information candidate list may be replaced with the LX-direction motion information previously determined based on the signaled/encoded/decoded information.
Hereinafter, it is assumed that motion information (e.g., a motion vector or a reference picture) in the LX direction has been determined.
In some embodiments, when the third prediction mode is applied to the target block, the motion information in the LX direction may be used to configure or change the motion information candidate list for the L (1-X) direction.
For example, the motion information candidate list of the target block may be configured/changed based on the reference picture in the LX direction of the target block.
The direction from the target picture to the reference picture in the LX direction is assumed to be a first direction, and the direction from the target picture to the reference picture in the L(1-X) direction is assumed to be a second direction. Here, the first direction is determined by the sign of the POC difference between the target picture and the reference picture in the LX direction (i.e., POC of the target picture - POC of the LX-direction reference picture). The second direction is determined by the sign of the POC difference between the target picture and the reference picture in the L(1-X) direction (i.e., POC of the target picture - POC of the L(1-X)-direction reference picture).
The motion information candidate list in the L (1-X) direction of the target block may be constructed by adding only motion information candidates having reference pictures in a second direction different from the first direction to the list.
Alternatively, a motion information candidate list for the L(1-X) direction that has already been configured with motion information candidates may be changed. In other words, only motion information candidates having reference pictures in the second direction, different from the first direction, may remain in the list, and motion information candidates having no reference pictures in the second direction may be removed from the list.
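The direction-based filtering above can be sketched as follows; the `(motion_vector, ref_poc)` candidate shape is an assumed representation, not one specified in the text.

```python
# Hedged sketch: keep only L(1-X)-direction candidates whose reference
# picture lies in the temporal direction opposite to the LX-direction
# reference. Candidates are assumed to be (motion_vector, ref_poc) tuples.
def filter_opposite_direction(candidates, poc_cur, poc_ref_lx):
    def sign(v):
        return (v > 0) - (v < 0)
    first_dir = sign(poc_cur - poc_ref_lx)   # sign of the first POC difference
    return [c for c in candidates
            if sign(poc_cur - c[1]) != 0 and sign(poc_cur - c[1]) != first_dir]
```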
In some other embodiments, when the third prediction mode is applied to the target block, the motion information in the LX direction may be used to reorder the L (1-X) direction candidates in the motion information candidate list.
For example, the L(1-X)-direction candidates in the motion information candidate list may be reordered in ascending order of the matching cost between each L(1-X)-direction candidate in the list and the motion information in the LX direction of the target block. Here, the matching cost may be a matching cost calculated based on bilateral matching.
The motion information of the target block in the L (1-X) direction may be determined as the candidate that provides the lowest matching cost within the candidate list of motion information for the L (1-X) direction.
Alternatively, in order to determine the motion information of the target block in the L (1-X) direction, N candidates (where N is a natural number greater than or equal to 2) that provide the lowest matching cost among candidates in the motion information candidate list for the L (1-X) direction may be selected. Candidates to be used as motion information of the target block in the L (1-X) direction may be selected among N candidates. The image encoding device may encode information indicating the selected candidates and signal the encoded information to the image decoding device. The image decoding apparatus may determine motion information of the target block in the L (1-X) direction among the N candidates based on the information received from the image encoding apparatus.
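The reordering-and-selection embodiment above can be sketched as follows; `bilateral_cost` is a placeholder for the bilateral matching computation, and the motion-vector representation is assumed.

```python
# Hedged sketch: rank L(1-X)-direction candidates by a bilateral matching
# cost against the fixed LX-direction motion, then keep the N lowest-cost
# candidates; the encoder signals an index into this reduced subset.
def rank_l1x_candidates(mv_lx, candidates, bilateral_cost, n=2):
    ranked = sorted(candidates, key=lambda mv: bilateral_cost(mv_lx, mv))
    return ranked[:n]
```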
In some other embodiments, when the third prediction mode is applied to the target block, the motion vector difference information may be signaled/encoded/decoded only for the LX direction of the target block. The motion information of the target block in the LX direction may be determined based on a motion vector difference derived using the motion vector difference information. At this time, the motion vector difference derivation method mentioned in the description of the second prediction mode may be used.
On the other hand, motion vector difference information for the L (1-X) direction is not signaled/encoded/decoded. In other words, the motion vector difference in the L (1-X) direction may always be a zero vector.
For example, in the present embodiment, affine AMVP mode may be applied to LX direction, and affine merge mode may be applied to L (1-X) direction. An embodiment may be applied in which affine merge mode is applied based on motion information in the LX direction determined by affine AMVP mode.
When the third prediction mode is applied to the target block, an indicator indicating whether the third prediction mode is performed in the target block may be signaled/encoded/decoded. When the third prediction mode is performed in the target block, information about X for specifying the LX direction in the target block may be signaled/encoded/decoded.
Further, motion information of the target block in the LX direction can be signaled/encoded/decoded. At this time, the motion information may be information including reference picture index and motion vector difference information for at least one CPMV of the target block. Further, an index (or indicator) for specifying the motion information in the LX direction from the motion information candidate list in the target block may be signaled/encoded/decoded.
LX direction motion information of a target block can be determined based on signaled/encoded/decoded information.
Thereafter, the motion information candidates in the L (1-X) direction of the motion information candidate list may be configured or reordered based on the determined motion information in the LX direction. Thereafter, the motion information in the LX direction may be determined based on the motion information candidates in the L (1-X) direction of the motion information candidate list.
6. Decoder side motion information deriving method
In some embodiments, decoder-side motion information derivation methods may include methods such as template matching and bilateral matching. For example, the motion information may be generated by template matching, or the determined first motion information (or initial motion information) may be improved by template matching or bilateral matching. In other words, the decoder-side motion information derivation method may correspond to a method of deriving the second motion information by performing refinement on the first motion information.
A prediction block of the target block may be generated by performing prediction using the derived second motion information. Alternatively, at least one of the reference blocks of the target block may be specified using the second motion information.
In an embodiment, refinement of particular information may mean modification, correction, or update of particular information. In an embodiment, the terms "refine," "modify," and "correct" may be used interchangeably. Refinement information may be generated by performing refinement on specific information.
Performing refinement on the particular motion information may involve performing at least one of the following methods on the motion information.
- Changing specific information included in the motion information using a predetermined offset
- Changing the specific information by performing a specific operation on the specific information included in the motion information and the predetermined offset. Here, the specific operation may include at least one of squaring, weighted averaging, weighted summing, the four basic arithmetic operations, and filtering.
The predetermined offset may include at least one of a motion vector, a reference picture index, an inter prediction indicator, reference picture list information, a reference picture, a motion vector candidate index, a merge candidate, a merge index, a block vector candidate, and a block vector candidate index.
The predetermined offset may be specified from the offset candidate list, and information for specifying the predetermined offset may be signaled/encoded/decoded. For example, an index for specifying at least one offset from a predetermined offset candidate list may be signaled/encoded/decoded.
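As an illustration, the offset-based refinement described above can be sketched as follows. The offset candidate list, the function names, and the reduction of motion information to a 2-D motion vector are all hypothetical assumptions for illustration, not part of any specific codec.

```python
# Hypothetical sketch: a motion vector is refined by adding an offset
# selected by a signaled index from a predetermined offset candidate list.
# The candidate list below is an assumed example, not a normative one.
OFFSET_CANDIDATES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def refine_mv(mv, offset_index):
    """Apply the offset at offset_index to the motion vector (mvx, mvy)."""
    dx, dy = OFFSET_CANDIDATES[offset_index]
    return (mv[0] + dx, mv[1] + dy)
```

In practice the signaled index would be decoded from the bitstream; here it is simply passed as a parameter.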
In another embodiment, the method for deriving decoder-side motion information may be a method for reordering or reconfiguring a motion information list using matching costs.
The reordering may be performed on at least one of the pieces of motion information in the list based on a matching cost of each of the pieces of motion information in the list consisting of the N pieces of motion information. For example, the list composed of N pieces of motion information may be a merge candidate list. For example, all or part of the motion information in the list of N pieces of motion information may be reordered in ascending order of matching cost.
For example, in reordering the motion information, if the difference between the matching costs of a piece of motion information and the previous piece of motion information satisfies |D1 - D2| < λ, the motion information is regarded as redundant. Here, D1 and D2 represent the matching costs, and λ represents the Lagrangian parameter used in encoding. In view of this redundancy, the motion information may be reordered as follows.
The encoding device or decoding device calculates the minimum matching-cost difference between each piece of motion information and its previous motion information for all the motion information in the list. If the calculated minimum difference is greater than or equal to λ, the encoding device or decoding device determines that the reordered list is sufficiently diversified and stops the reordering process. If the calculated minimum difference is less than λ, the decoding device considers the motion information redundant and moves it to a more distant position in the list. At this time, the more distant position may be set so that the motion information at that position maintains sufficient diversity with respect to its preceding motion information. The decoding device may terminate the reordering process if the minimum matching-cost difference becomes greater than or equal to λ within a predetermined number of iterations.
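The iteration above can be sketched as follows. This is a minimal interpretation of the described procedure, assuming candidates carry precomputed matching costs and that a redundant candidate is simply moved to the end of the list; all names and the iteration limit are illustrative.

```python
# Hypothetical sketch of the diversity-based reordering described above.
# candidates: list of (motion_info, matching_cost) pairs.
# lam: the Lagrangian parameter λ used as the redundancy threshold.
def reorder_candidates(candidates, lam, max_iterations=4):
    cands = sorted(candidates, key=lambda c: c[1])  # ascending matching cost
    for _ in range(max_iterations):
        # Cost difference between each candidate and its predecessor.
        diffs = [abs(cands[i][1] - cands[i - 1][1]) for i in range(1, len(cands))]
        if not diffs or min(diffs) >= lam:
            break  # list is sufficiently diversified; stop reordering
        # Move the first redundant candidate to a more distant position
        # (here: the end of the list, an assumed choice).
        idx = 1 + diffs.index(min(diffs))
        cands.append(cands.pop(idx))
    return cands
```

Moving the redundant candidate to the very end is one possible realization of "a more distant position"; the patent leaves the exact target position open.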
After reordering at least one piece of motion information in the list consisting of N pieces of motion information, the list may be reconfigured.
Reconfiguring a particular motion information list may represent at least one of 1) configuring the list using only a portion of the motion information included in the respective list, 2) removing at least one piece of motion information included in the respective list from the list, and 3) inserting at least one new piece of motion information into the respective list.
For example, in example 3), the new motion information may be motion information determined based on at least one piece of motion information within the motion information list. For example, the above new motion information may be an average or weighted average of two pieces of motion information having the smallest index in the list.
In another example, the new motion information in example 3) above may be predefined default motion information.
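The averaged-candidate variant of example 3) can be sketched as follows. Motion information is reduced to a 2-D motion vector and the integer averaging is an illustrative assumption; the function name is hypothetical.

```python
# Hypothetical sketch of example 3) above: a new candidate formed as the
# average of the two motion vectors with the smallest indices in the list,
# appended as new motion information.
def insert_averaged_candidate(mv_list):
    """mv_list: list of (mvx, mvy) tuples; returns the list with the averaged candidate appended."""
    if len(mv_list) < 2:
        return list(mv_list)  # nothing to average
    (x0, y0), (x1, y1) = mv_list[0], mv_list[1]
    # Integer average, as motion vectors are typically stored in integer units.
    averaged = ((x0 + x1) // 2, (y0 + y1) // 2)
    return list(mv_list) + [averaged]
```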
The above N may be 2, 3, 4, 6, 8, 12, or another positive integer.
At least one piece of motion information may be specified from the reordered or reconfigured list. The prediction block of the target block may be generated by performing prediction using the specified motion information. Alternatively, at least one of the reference blocks of the target block may be specified using the specified motion information.
In another embodiment, the decoder-side motion information derivation method may correspond to a method of specifying at least one position within the search area based on the matching cost or a method of specifying at least one position of a plurality of positions within the search area based on the matching cost.
In an embodiment, the position and motion information may be used interchangeably. In an embodiment, a particular location may be used interchangeably with motion information indicating a direction from a target block to the corresponding location. In an embodiment, a particular location may mean motion information indicating a direction from a target block to the corresponding location. In an embodiment, the specific motion information may mean a position from the target block indicated by the corresponding motion information.
In an embodiment, specifying a location may mean specifying a sample point at the corresponding location.
In an embodiment, designating a location may mean designating motion information indicating a direction from a target block to the corresponding location.
For example, the decoder-side motion information derivation method may specify N positions having the lowest matching cost among the positions within the search area.
The matching cost for a particular location may represent a matching cost for motion information indicating a direction from the target block to the corresponding location.
For example, the N positions having the lowest matching costs within the search area may be specified. Alternatively, for example, the N positions having the lowest matching costs among a plurality of positions within the search area may be specified. N may be 1, 2, 3, 4, 5, 6, 8, 12, or another positive integer.
The list may be configured using the specified N locations. At least one location may be selected from the list of configurations, and prediction may be performed on the target block based on the selected location. Information indicating the position selected from the list may be encoded and signaled from the image encoding device to the image decoding device and may be decoded by the image decoding device.
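The position-selection step above can be sketched as follows. The square search range and the function names are illustrative assumptions; `cost_fn` stands in for whatever matching cost (e.g., template or bilateral matching) the codec computes for the motion information pointing at each position.

```python
# Hypothetical sketch: specify the N positions with the lowest matching
# cost within a square search area centered on the initial position.
def lowest_cost_positions(cost_fn, search_range, n):
    """cost_fn: maps a (dx, dy) offset to its matching cost.
    Returns the n offsets with the lowest cost, best first."""
    positions = [(dx, dy)
                 for dy in range(-search_range, search_range + 1)
                 for dx in range(-search_range, search_range + 1)]
    positions.sort(key=cost_fn)  # ascending matching cost
    return positions[:n]
```

The returned positions would then form the list from which one position is selected and signaled, as described above.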
The various embodiments of the decoder-side motion information derivation method described above may be applied to a merge candidate list of motion information.
For example, a prediction block for the target block may be generated through inter prediction based on at least one piece of motion information specified from the merge candidate list.
At this time, at least one piece of motion information specified from the merge candidate list may be refined by a decoder-side motion information derivation method and then used for inter prediction.
In another example, a prediction block of the target block may be generated through inter prediction using motion information obtained by adding at least one piece of motion information specified in the merge candidate list to the motion vector difference.
At this time, instead of at least one piece of motion information specified from the merge candidate list, the motion vector difference may be added to motion information refined by performing the decoder-side motion information derivation method.
Alternatively, the motion information may be refined by performing the decoder-side motion information derivation method after adding the motion vector difference to at least one piece of motion information specified from the motion information merge candidate list.
Here, as described above, in order to configure the merge candidate list, at least one of a spatial candidate (neighboring block and non-neighboring block of the target block), a temporal candidate (temporal neighboring block), a motion information table storing motion information of blocks that have been encoded or decoded by inter prediction, and at least one default motion information candidate may be used.
Since the spatial candidate, the temporal candidate, the motion information table, and the default motion information candidate can be derived in the affine transformation mode in the same manner as described above, further description thereof will be omitted.
Furthermore, the decoder-side motion information derivation method may be performed in sub-block units. In other words, by performing the decoder-side motion information derivation method in sub-block units using motion information derived based on motion information candidates for the target block, motion information can be generated in sub-block units within the target block. Therefore, the decoder-side motion information derivation method may also be classified as a subblock-based prediction mode.
The mapping model may be used to further refine motion information generated in sub-block units based on the decoder-side motion information derivation method.
The mapping model may be derived by performing a regression involving the positions (x-and/or y-coordinates) of the sub-blocks as input and the motion information of the corresponding sub-blocks as output.
In some embodiments, the CPMV may be derived at each control point of the target block using a mapping model. For example, in the case of a 4-parameter affine model, two CPMV may be derived, and in the case of a 6-parameter affine model, three CPMV may be derived.
The CPMV of the target block may be included as one candidate of a motion information candidate list for encoding and decoding of the next block. Further, the motion information of each sub-block of the target block may be derived again based on the affine transformation model using the generated CPMV.
In another embodiment, the location of each sub-block may be input into a mapping model to regenerate motion information for the sub-block at the respective location.
7. Construction of motion information candidate list
A separate motion information candidate list may be generated for the L0 direction and the L1 direction, or a motion information candidate list in which the L0 direction and the L1 direction are combined may be generated. The following description may be applied when configuring lists such as a merge candidate list, an AMVP candidate list, a motion shift candidate list, and a motion information candidate list for affine mode.
When configuring the motion information merge candidate list for the target block, both the unidirectional prediction block and the bidirectional prediction block may be used as candidate blocks.
In order to maintain interoperability, the encoding device and the decoding device must configure the candidate list in the same way. Therefore, the following description is applicable to both the encoding apparatus and the decoding apparatus; however, for convenience of description, the decoding apparatus is assumed to be the subject of execution.
In configuring the motion information merge candidate list of the target block, the decoding apparatus may configure separate motion information merge candidate lists for the L0 direction and the L1 direction. For example, in configuring the motion information merge candidate list for the LX direction, the decoding device may use only a block for which unidirectional prediction in the LX direction has been performed as a candidate block. The decoding apparatus may configure the motion information merge candidate list for the LX direction using only the blocks on which unidirectional prediction in the LX direction has been performed.
At this time, in LX, X may be 0 or 1.
Alternatively, for example, in configuring the motion information merge candidate list for the LX direction, the decoding device may use only a block on which unidirectional prediction or bidirectional prediction in the LX direction has been performed as a candidate block. For example, the decoding apparatus may configure the motion information merge candidate list for the LX direction using only blocks on which unidirectional prediction or bidirectional prediction in the LX direction has been performed. For example, when bi-directional motion information is included in the motion information merge candidate list for the LX direction, the decoding device may include only LX direction motion information among the bi-directional motion information.
The decoding apparatus may configure only the motion information merge candidate list for the LX direction of the target block if the target block is a unidirectional prediction block in the LX direction.
The decoding apparatus may configure a motion information merge candidate list for an L0 direction and a motion information merge candidate list for an L1 direction of the target block if the target block is a bi-prediction block.
The decoding apparatus may specify the motion information of the target block from the motion information merge candidate list for the LX direction if the target block is a unidirectional prediction block in the LX direction.
For example, information for specifying at least one piece of LX direction motion information from the motion information merge candidate list for the LX direction may be signaled/encoded/decoded. At this time, the information for specifying the at least one piece of LX direction motion information may be an index.
In another example, N pieces of motion information having the lowest index in the motion information merge candidate list for the LX direction may be used as LX direction motion information of the target block.
In another example, N pieces of motion information having the lowest matching cost in the motion information merge candidate list for the LX direction may be used as the LX direction motion information of the target block.
If the target block is a bi-predictive block, the decoding apparatus may specify motion information for both directions from the motion information merge candidate list for the L0 direction and the motion information merge candidate list for the L1 direction.
For example, information for specifying at least one piece of LX direction motion information from the motion information merge candidate list for the LX direction may be signaled/encoded/decoded. At this time, the information for specifying the at least one piece of LX direction motion information may be an index.
In another example, N pieces of motion information having the lowest index in the motion information merge candidate list for the LX direction may be used as LX direction motion information of the target block.
In another example, N pieces of motion information having the lowest matching cost in the motion information merge candidate list for the LX direction may be used as the LX direction motion information of the target block.
In the above embodiment, the number N of pieces of motion information may be 1, 2, 3, 4, 5, 6, 8, 10, 12, or another positive integer.
N may be a value determined independently of the target block. Alternatively, N may be a value determined based on at least one of motion information in an L0/L1 direction for the target block, coding parameters, size, prediction mode, and at least one piece of motion information in a motion information merge candidate list.
Further, if the target block is a bi-prediction block, the decoding apparatus may generate a combined motion information merge list {MI_BI(1,1), MI_BI(1,2), …, MI_BI(M1,M2)} by combining M1 pieces of motion information within the motion information merge candidate list {MI_L0(1), MI_L0(2), …, MI_L0(M1)} for the L0 direction with M2 pieces of motion information within the motion information merge candidate list {MI_L1(1), MI_L1(2), …, MI_L1(M2)} for the L1 direction.
MI_LX(i) may mean the i-th piece of motion information specified in the LX direction motion information merge candidate list. MI_BI(i,j) may mean bi-directional motion information having MI_L0(i) as its motion information in the L0 direction and MI_L1(j) as its motion information in the L1 direction. Here, i and j may have values from 1 to M1 and from 1 to M2, respectively. Alternatively, i and j may have values from 0 to M1-1 and from 0 to M2-1, respectively.
M1 and M2 may be 1, 2, 3, 4, 5, 6, 8, 10, 12 or a positive integer, respectively.
M1 and M2 may be values determined independently of the target block, respectively. Alternatively, M1 and M2 may be values determined based on at least one of motion information of the target block, encoding parameters, size, prediction mode, and at least one piece of motion information in the motion information merge candidate list for the L0/L1 direction, respectively.
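The combined-list construction above amounts to a Cartesian product of the L0 and L1 candidate lists, which can be sketched as follows; candidates are treated as opaque values and all names are illustrative.

```python
# Hypothetical sketch of the combined bi-prediction merge list: every pair
# of an L0 candidate and an L1 candidate forms one bi-directional candidate
# MI_BI(i, j), represented here as a (l0_candidate, l1_candidate) tuple.
def build_combined_list(l0_list, l1_list):
    """Return M1 * M2 bi-directional candidates from the two uni-directional lists."""
    return [(mi_l0, mi_l1) for mi_l0 in l0_list for mi_l1 in l1_list]
```

The ordering (L0-major here) is an assumption; the patent does not fix the enumeration order of the combined list.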
The decoding apparatus may specify at least one motion information candidate for use in predicting the target block from the combined motion information merge candidate list.
For example, information for specifying at least one motion information candidate from the combined motion information merge candidate list may be signaled/encoded/decoded.
In another example, M motion information candidates having the lowest matching cost among candidates in the combined motion information merge list may be used as the motion information of the target block.
8. Weighted sum of prediction blocks
In the above embodiment, a plurality of candidates may be selected from the motion information candidate list, and motion information of each candidate may be used to generate a prediction block of the target block. In this case, the final predicted block of the target block may be obtained by a weighted average or a weighted sum.
In order to maintain interoperability, the encoding device and the decoding device must generate the prediction block in the same manner. Therefore, the following description applies to both the encoding apparatus and the decoding apparatus; however, for convenience of description, the decoding apparatus is assumed to be the subject of execution.
The decoding apparatus may generate a plurality of reference blocks (or prediction blocks) of the target block using each of the plurality of motion information candidates. Then, by performing a weighted sum or weighted average on the reference block, a final prediction block of the target block may be generated.
The plurality of reference blocks may be reference blocks indicated by a plurality of motion information candidates. Each reference block may be a block indicated by different motion information.
For example, the decoding apparatus may generate the final prediction block by calculating a weighted sum of N reference blocks for the target block. The number N of reference blocks (or prediction blocks) may be 2, 3, 4, 5, 6, or another positive integer.
In some embodiments, information about N may be signaled/encoded/decoded.
In another embodiment, the decoding apparatus may determine the number N of reference blocks according to the number of positions (or motion information) in the list configured by the decoder-side motion information derivation method in the target block.
For example, the decoding apparatus may determine N as the smaller of a predetermined value and the number of positions in the list configured by the decoder-side motion information derivation method. Here, the predetermined value may be 2, 3, 4, 5, or another positive integer.
In another embodiment, the decoding apparatus may determine N based on a matching cost of at least one position in the list configured by the decoder-side motion information derivation method in the target block.
For example, after calculating the matching cost for each position in the list configured by the decoder-side motion information derivation method, the decoding apparatus may determine N as the number of positions whose matching cost differs from the minimum matching cost by less than a specific threshold.
At this time, the specific threshold may be a fixed constant.
Alternatively, the specific threshold may be adaptively determined.
For example, the decoding device may determine the specific threshold based on the lowest value of matching costs of positions in the list configured by the decoder-side motion information derivation method.
For example, the decoding apparatus may determine the specific threshold as the result value obtained by performing a specific operation on the lowest value among the matching costs of the positions in the list and a predetermined value.
At this time, the predetermined value may be -512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, or another real number.
The predetermined value may be a fixed constant.
Alternatively, the predetermined value may be an adaptively determined value.
For example, the predetermined value may be a value determined based on at least one of a prediction mode of the target block, motion information of the target block, coding parameters of the target block, a size of the target block, a range of values allowed for a luminance component of the target block, a range of values allowed for a chrominance component of the target block, availability of neighboring blocks of the target block, coding parameters of neighboring blocks of the target block, neighboring samples of the target block, and motion information.
Further, the specific operation may include the four basic arithmetic operations.
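The adaptive determination of N above can be sketched as follows. Taking addition as the "specific operation" that combines the lowest cost with the predetermined value is an illustrative assumption, as are the function and parameter names.

```python
# Hypothetical sketch: N is the number of candidates whose matching cost
# differs from the minimum by less than a threshold, where the threshold
# is derived as min_cost + offset (assumed "specific operation": addition).
def determine_n(costs, offset):
    """costs: matching costs of the candidates; offset: the predetermined value."""
    threshold = min(costs) + offset
    return sum(1 for c in costs if c < threshold)
```

With a fixed `offset`, more candidates contribute to the weighted sum when several costs cluster near the minimum, and fewer when one candidate is clearly best.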
In another embodiment, the decoding apparatus may determine the number N of reference blocks (or prediction blocks) based on the motion information of the target block and the number of merge candidates in the candidate list. The decoding apparatus may determine N as the smaller of a predetermined value and the number of merge candidates in the motion information merge candidate list. At this time, the predetermined value may be 2, 3, 4, 5, or another positive integer.
In another embodiment, the decoding apparatus may determine N based on a matching cost of at least one of the merge candidates within the motion information merge candidate list in the target block.
For example, after calculating the matching cost for each merging candidate within the motion information merging candidate list, the decoding apparatus may determine N as the number of matching costs whose difference from the lowest matching cost among the calculated matching costs is smaller than a specific threshold.
At this time, the specific threshold may be a fixed constant.
Alternatively, the specific threshold may be adaptively determined.
For example, the decoding apparatus may determine the specific threshold based on the lowest value among matching costs of the merging candidates within the motion information merging candidate list.
For example, the decoding apparatus may determine the specific threshold as the result value obtained by performing a specific operation on the lowest value among the matching costs of the merge candidates within the motion information merge candidate list and a predetermined value.
At this time, the predetermined value may be -512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, or another real number.
The predetermined value may be a fixed constant.
Alternatively, the predetermined value may be an adaptively determined value.
For example, the predetermined value may be a value determined based on at least one of a prediction mode of the target block, motion information of the target block, coding parameters of the target block, a size of the target block, a range of values allowed for a luminance component of the target block, a range of values allowed for a chrominance component of the target block, availability of neighboring blocks of the target block, coding parameters of neighboring blocks of the target block, neighboring samples of the target block, and motion information.
Information on whether at least one of a plurality of reference blocks (or a plurality of prediction blocks) and the number of reference blocks (or prediction blocks) used in the target block is used may be signaled/encoded/decoded. For example, the information on whether to use the plurality of reference blocks (or the plurality of prediction blocks) in the target block may be a 1-bit flag.
If the above information indicates that a plurality of reference blocks (or prediction blocks) are used in the target block, the decoding apparatus may inherit, from the merge candidates, motion information indicating at least one of the reference blocks (or prediction blocks) to be weighted and summed in the target block.
For example, the decoding apparatus may apply each piece of motion information to the target block to generate a block, and perform a weighted sum on the generated blocks to generate the prediction block of the target block. At this point, the weighted sum may include an average. The weight applied to each block may be a real number greater than 0 or a real number less than 0.
For example, the decoding apparatus may determine the plurality of pieces of motion information as the N merge candidates having the lowest indices in the motion information merge candidate list. The decoding apparatus may determine the pieces of motion information as the N merge candidates having the lowest matching costs in the motion information merge candidate list. The decoding apparatus may determine the pieces of motion information as the N positions having the lowest indices in the list configured by the decoder-side motion information derivation method. Alternatively, the decoding apparatus may determine the pieces of motion information as the N positions having the lowest matching costs in the list configured by the decoder-side motion information derivation method. At this time, N, representing the number of pieces of motion information, may be 2, 3, or another positive integer.
The decoding apparatus may derive new motion information from the pieces of motion information. For example, the decoding apparatus may derive new motion information by calculating a weighted sum of pieces of motion information. The decoding apparatus may perform prediction on the target block based on the derived motion information. The decoding apparatus may add the derived motion information to the motion information merge candidate list.
In performing the weighted sum for two or more blocks, the decoding apparatus may determine weights for the weighted sum in units of blocks, in units of predetermined areas within a block, or in units of samples within a block.
For example, the decoding device may determine weights for the blocks based on the motion information. Alternatively, the decoding apparatus may determine weights for samples belonging to a block based on the motion information.
Alternatively, the decoding apparatus may determine weights for the reference block (or prediction block) indicated by the motion information based on the matching cost for each motion information.
The weight may be determined to be inversely proportional to the matching cost.
For example, when the matching costs of the n reference blocks are represented as Cost_1, Cost_2, ..., Cost_n, the decoding apparatus may determine the weight for the i-th reference block (or prediction block) using the following equation.
[Equation 12]
w_i = (1/Cost_i) / (1/Cost_1 + 1/Cost_2 + ... + 1/Cost_n)
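The inverse-cost weighting described above (weights inversely proportional to the matching cost, normalized to sum to 1 — an assumed concrete form of Equation 12) can be sketched as follows; the use of floating point and the flat-list block representation are illustrative assumptions, as real codecs would use fixed-point weights over 2-D sample arrays.

```python
# Hypothetical sketch: each reference block's weight is proportional to
# 1/Cost_i, normalized so the weights sum to 1; the final prediction is
# the per-sample weighted sum of the reference blocks.
def inverse_cost_weights(costs):
    """Return normalized weights inversely proportional to the matching costs."""
    inv = [1.0 / c for c in costs]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_prediction(blocks, costs):
    """blocks: list of equally sized sample lists; returns the weighted-sum block."""
    weights = inverse_cost_weights(costs)
    return [sum(w * b[k] for w, b in zip(weights, blocks))
            for k in range(len(blocks[0]))]
```

A reference block with half the matching cost of another thus receives twice the weight, matching the "inversely proportional" relation stated above.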
The decoding device may determine weights for the blocks and/or weights for samples within the blocks at the same time. Alternatively, the decoding device may sequentially determine weights for the blocks and/or weights for samples within the blocks.
The encoding/decoding process of the target block is not limited to any particular embodiment of the present disclosure, and a particular embodiment or at least one combination of the above embodiments may be applied to the encoding/decoding process of the target block.
By using at least one of the embodiments described during the above-described processing, the encoder may determine and/or signal/encode at least one of the size and/or position of at least one reference region, whether to perform sub-block based prediction, information related to prediction in sub-block units, the performance of prediction in sub-block units, a method for storing and referencing motion information, a method for predicting/reconstructing/decoding the target block, and information on the performance of a decoder-side motion information derivation method.
Furthermore, by using at least one of the embodiments described during the above-described processing, the decoder may determine and/or signal/encode/decode at least one of the size and/or position of at least one reference region, whether to perform sub-block based prediction, information related to prediction in sub-block units, the performance of prediction in sub-block units, a method for storing and referencing motion information, and the performance of a decoder-side motion information derivation method.
The above embodiments describe the case of using two reference picture lists, but the inter-picture prediction to which the present disclosure is applied is not limited to the case of using two reference picture lists, and the embodiments of the present disclosure may be applied even when NUM_REFPICLIST reference picture lists are used. NUM_REFPICLIST may be 1, 2, 3, or another positive integer.
The embodiments of the present disclosure describe the case of using one or two motion information candidate lists, but the inter-picture prediction to which the present disclosure is applied is not limited to the case of using one or two motion information candidate lists, and may be applied even when NUM_MILIST motion information candidate lists are used. NUM_MILIST may be 1, 2, 3, or another positive integer.
Further, embodiments of the present disclosure may be applied only when the target block is greater than or equal to a minimum size and less than or equal to a maximum size, wherein the minimum size and the maximum size may each correspond to the size of a block or a unit. In other words, the block having the minimum size and the block having the maximum size may be different from each other. For example, the embodiments of the present disclosure may be applied only when the current block size is greater than or equal to the minimum block size and less than or equal to the maximum block size.
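The applicability condition above amounts to a simple range check on the target block's size. The helper below is a hypothetical sketch: the function name, the parameters, and the choice of sample count as the size measure are illustrative assumptions, not part of the disclosure.

```python
def embodiment_applicable(block_width: int, block_height: int,
                          min_size: int, max_size: int) -> bool:
    """Return True only when the target block's size lies within
    [min_size, max_size]. Hypothetical helper; here "size" is taken
    to be the number of samples, one of several possible measures."""
    size = block_width * block_height
    return min_size <= size <= max_size

# Example: a 16x16 block (256 samples) with limits of 64 and 4096 samples.
print(embodiment_applicable(16, 16, 64, 4096))  # True
```

A real codec might instead compare width and height against the limits separately; the structure of the check is the same.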
Furthermore, the inter coding mode, the IBC mode, and the intra template matching prediction mode share a common feature: each of these modes uses a vector to indicate, within the reconstructed region, the reference block used to predict the target block.
Thus, in embodiments, the above description of the inter coding mode may be applied to the IBC mode or the intra template matching mode. In addition, information related to the inter coding mode may be used as information related to the IBC mode or the intra template matching mode.
Furthermore, in an embodiment, the Motion Vector (MV) in inter coding mode may be replaced with the Block Vector (BV) of IBC. The description of the case where the MV is used for the target block may also be applied to the case where the BV is used for the target block, and the MV may be replaced with the BV. Further, the MV-related information may be regarded as BV-related information, and the description of the MV-related information may be applied to BV-related information.
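The MV/BV substitution described above can be pictured as the two vector types sharing one representation: the same displacement structure locates the reference block whether it was derived by inter prediction (MV) or by IBC (BV). `Vector2D` and `locate_reference` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Vector2D:
    """A displacement used to locate a reference block. In the inter
    coding mode this plays the role of a motion vector (MV); in the
    IBC mode the same structure plays the role of a block vector (BV)."""
    dx: int
    dy: int

def locate_reference(block_pos: tuple, vec: Vector2D) -> tuple:
    """Apply the vector to the target block's top-left position to find
    the top-left of the reference block (hypothetical helper)."""
    x, y = block_pos
    return (x + vec.dx, y + vec.dy)

# The same lookup works whether vec came from inter prediction (an MV
# pointing into a reference picture) or from IBC (a BV pointing into
# the reconstructed region of the current picture):
print(locate_reference((64, 32), Vector2D(-16, 0)))  # (48, 32)
```

This is why, in the embodiments, MV-related descriptions and information can be reread as BV-related ones: only the picture the vector points into differs, not the vector arithmetic.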
In an embodiment, the method may be described based on a flowchart comprising a series of steps or units. The method is not limited to the order in which the steps are described, and some steps may be performed in a different order than described, or may be performed concurrently with other steps. Furthermore, the steps described in the flowcharts and the like may not be exclusive. Additional steps may be inserted between the steps described in the flowcharts or the like. One or more steps described in the flowcharts or the like may be deleted or skipped.
Embodiments may include examples of the various aspects. Although not all possible combinations for indicating various aspects are described, one of ordinary skill in the art will recognize that additional combinations other than the explicitly described combinations are possible. Accordingly, it is to be understood that the present disclosure includes all other substitutions, alterations, and modifications that fall within the scope of the appended claims.
The above-described embodiments according to the present disclosure may be implemented as program instructions executable by various computer components and may be recorded on a computer-readable storage medium.
The computer readable storage medium may include a non-transitory computer readable recording medium. Examples of computer readable storage media include all types of hardware devices that are specially configured to store and execute program instructions, such as magnetic media (such as hard disks, floppy disks, and magnetic tape), optical media (such as compact disc read-only memories (CD-ROMs) and digital versatile discs (DVDs)), magneto-optical media (such as floptical disks), and semiconductor memory devices (such as ROM, RAM, and flash memory). The hardware devices may be configured to operate as one or more software modules in order to perform the operations of the present disclosure, and vice versa.
The computer readable storage medium may include program instructions, data files, and data structures, alone or in combination. The program instructions recorded on the computer-readable storage medium may be specially designed and configured for the embodiments, or may be disclosed and available to those having skill in the computer software arts.
The program instructions may include machine code, such as produced by a compiler, and may include high-level language code that may be executed by a computer using an interpreter or the like. Program instructions may be specified as computer executable code or a program. In an embodiment, program instructions, computer-executable code, and programs may be used interchangeably with each other.
The computer readable storage medium may include information used in the embodiments. For example, the computer-readable storage medium may include a bitstream, and the bitstream may include information described in the embodiments. The information described in the embodiments may include syntax elements. The information of the syntax elements and the like described in the embodiments can be understood as computer-executable code from the standpoint of the encoding apparatus and the decoding apparatus running the code to perform specific processing.
The bitstream may include computer executable code. The computer executable code may include pieces of information described in the embodiments, such as syntax elements. In other words, the pieces of information, such as syntax elements, described in the embodiments may be regarded as computer executable code or as part of computer executable code in the bitstream.
As described above, while the present disclosure has been described based on specific details such as detailed components and a limited number of embodiments and drawings, these are provided only to facilitate understanding of the disclosure as a whole. The present disclosure is not limited to these embodiments, and those skilled in the art may practice various changes and modifications in light of the above description.
It is noted, therefore, that the spirit of the present disclosure is not limited to the above-described embodiments, and that the appended claims, together with their equivalents and modifications, fall within the scope of the present disclosure.
Claims (13)
Applications Claiming Priority (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2023-0086791 | 2023-07-04 | ||
| KR20230086791 | 2023-07-04 | ||
| KR10-2023-0133806 | 2023-10-06 | ||
| KR20230133806 | 2023-10-06 | ||
| KR20240046886 | 2024-04-05 | ||
| KR10-2024-0046886 | 2024-04-05 | ||
| PCT/KR2024/009518 WO2025009915A1 (en) | 2023-07-04 | 2024-07-04 | Method, apparatus, and recording medium for image encoding/decoding |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN121336406A true CN121336406A (en) | 2026-01-13 |
Family
ID=94171982
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202480038254.4A Pending CN121336406A (en) | 2023-07-04 | 2024-07-04 | Methods, apparatus and recording media for encoding/decoding images |
Country Status (3)
| Country | Link |
|---|---|
| KR (1) | KR20250006739A (en) |
| CN (1) | CN121336406A (en) |
| WO (1) | WO2025009915A1 (en) |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20140081682A (en) * | 2012-12-14 | 2014-07-01 | 한국전자통신연구원 | Method and apparatus for image encoding/decoding |
| US11570443B2 (en) * | 2018-01-25 | 2023-01-31 | Wilus Institute Of Standards And Technology Inc. | Method and apparatus for video signal processing using sub-block based motion compensation |
| WO2020184964A1 (en) * | 2019-03-11 | 2020-09-17 | 엘지전자 주식회사 | Method and apparatus for video signal processing for inter prediction |
2024
- 2024-07-04 KR KR1020240088449A patent/KR20250006739A/en active Pending
- 2024-07-04 CN CN202480038254.4A patent/CN121336406A/en active Pending
- 2024-07-04 WO PCT/KR2024/009518 patent/WO2025009915A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2025009915A1 (en) | 2025-01-09 |
| KR20250006739A (en) | 2025-01-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110463201B (en) | Prediction method and device using reference block | |
| CN111567045B (en) | Method and apparatus for using inter-frame prediction information | |
| CN111279695B (en) | Method and apparatus for asymmetric subblock-based image encoding/decoding | |
| CN116320496A (en) | Method and apparatus for filtering | |
| CN113841404B (en) | Video encoding/decoding method and device and recording medium storing bit stream | |
| CN111684801B (en) | Bidirectional intra-frame prediction method and device | |
| CN119013993A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN119318146A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN119234422A (en) | Video encoding or decoding method, apparatus and recording medium storing bit stream | |
| CN119325707A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN119234419A (en) | Video encoding or decoding method, apparatus and recording medium storing bit stream | |
| CN118044194A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN120752915A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN111919448A (en) | Method and apparatus for image encoding and image decoding using temporal motion information | |
| CN120457678A (en) | Method, apparatus, and recording medium for image encoding/decoding | |
| CN120035989A (en) | Image encoding/decoding method and device and recording medium storing bit stream | |
| US20260006208A1 (en) | Method, device, and recording medium for image encoding/decoding | |
| CN119895867A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN118556398A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN121336406A (en) | Methods, apparatus and recording media for encoding/decoding images | |
| CN121264046A (en) | Methods, apparatus and recording media for encoding/decoding images | |
| US20240372982A1 (en) | Method and device for image encoding/decoding, and recording medium | |
| CN118592028A (en) | Method, apparatus and recording medium for image encoding/decoding | |
| CN120345249A (en) | Image encoding/decoding method and apparatus, and recording medium | |
| KR20250109632A (en) | Image encoding/decoding method, device and bitstream storing medium using mapping model adjustment-offset |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||