
WO2019072422A1 - Overlapping search space for bi-predictive motion vector refinement - Google Patents


Info

Publication number
WO2019072422A1
WO2019072422A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
template
search space
image
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2018/057892
Other languages
English (en)
Inventor
Semih Esenlik
Zhijie Zhao
Han GAO
Anand Meher KOTRA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of WO2019072422A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/533 Motion estimation using multistep search, e.g. 2D-log search or one-at-a-time search [OTS]
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/563 Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
    • H04N19/57 Motion estimation characterised by a search window with variable size or shape
    • H04N19/573 Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction

Definitions

  • the present invention relates to the field of video coding and in particular to motion vector refinement applicable in inter-prediction.
  • a picture of a video sequence is subdivided into blocks of pixels and these blocks are then coded. Instead of coding a block pixel by pixel, the entire block is predicted using already encoded pixels in the spatial or temporal proximity of the block.
  • the encoder further processes only the differences between the block and its prediction.
  • the further processing typically includes a transformation of the block pixels into coefficients in a transformation domain.
  • the coefficients may then be further compressed by means of quantization and further compacted by entropy coding to form a bitstream.
  • the bitstream further includes any signaling information which enables the decoder to decode the encoded video.
  • the signaling may include settings concerning the encoding, such as size of the input picture, frame rate, quantization step indication, prediction applied to the blocks of the pictures, or the like.
  • Temporal prediction exploits temporal correlation between pictures, also referred to as frames, of a video.
  • the temporal prediction is also called inter-prediction, as it is a prediction using the dependencies between (inter) different video frames.
  • a block being encoded, also referred to as a current block, is predicted from one or more previously coded pictures referred to as reference pictures.
  • a reference picture is not necessarily a picture preceding the current picture in which the current block is located in the displaying order of the video sequence.
  • the encoder may encode the pictures in a coding order different from the displaying order.
  • a co-located block in a reference picture may be determined.
  • the co-located block is a block which is located in the reference picture on the same position as is the current block in the current picture.
  • Such prediction is accurate for motionless picture regions, i.e. picture regions without movement from one picture to another.
  • motion estimation is typically employed when determining the prediction of the current block.
  • the current block is predicted by a block in the reference picture, which is located in a distance given by a motion vector from the position of the co-located block.
  • the motion vector may be signaled in the bitstream.
  • the motion vector itself may be estimated. The motion vector estimation may be performed based on the motion vectors of the neighboring blocks in spatial and/or temporal domain.
  • the prediction of the current block may be computed using one reference picture or by weighting predictions obtained from two or more reference pictures.
  • the reference picture may be an adjacent picture, i.e. a picture immediately preceding and/or the picture immediately following the current picture in the display order since adjacent pictures are most likely to be similar to the current picture.
  • the reference picture may be also any other picture preceding or following the current picture in the displaying order and preceding the current picture in the bitstream (decoding order). This may provide advantages for instance in case of occlusions and/or non-linear movement in the video content.
  • the reference picture identification may thus be also signaled in the bitstream.
  • a special mode of the inter-prediction is a so-called bi-prediction in which two reference pictures are used in generating the prediction of the current block.
  • two predictions determined in the respective two reference pictures are combined into a prediction signal of the current block.
  • the bi-prediction may result in a more accurate prediction of the current block than the uni-prediction, i.e. prediction only using a single reference picture.
  • the more accurate prediction leads to smaller differences between the pixels of the current block and the prediction (referred to also as "residuals"), which may be encoded more efficiently, i.e. compressed to a shorter bitstream.
  • more than two reference pictures may be used to find respective reference blocks to predict the current block, i.e. a multi-reference inter prediction can be applied.
  • the term multi-reference prediction thus includes bi-prediction as well as predictions using more than two reference pictures.
  • the resolution of the reference picture may be enhanced by interpolating samples between pixels.
  • Fractional pixel interpolation can be performed by weighted averaging of the closest pixels. In the case of half-pixel resolution, for instance, a bilinear interpolation is typically used.
  • Other fractional pixels are calculated as an average of the closest pixels weighted by the inverse of the distance between the respective closest pixels to the pixel being predicted.
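As an illustration of the distance-based weighting described above, the following Python sketch bilinearly interpolates a sample at a fractional position from its four closest integer-position neighbours. The array layout and function name are illustrative assumptions, not taken from this document:

    import numpy as np

    def interpolate_fractional(ref, x, y):
        """Bilinear interpolation of the sample at fractional position (x, y)."""
        x0, y0 = int(x), int(y)      # top-left of the four closest samples
        dx, dy = x - x0, y - y0      # fractional offsets in [0, 1)
        # Each of the four closest samples is weighted by the product of the
        # complementary distances, so closer samples receive larger weights.
        return ((1 - dx) * (1 - dy) * ref[y0, x0]
                + dx * (1 - dy) * ref[y0, x0 + 1]
                + (1 - dx) * dy * ref[y0 + 1, x0]
                + dx * dy * ref[y0 + 1, x0 + 1])

    ref = np.arange(16.0).reshape(4, 4)
    print(interpolate_fractional(ref, 1.5, 2.5))  # half-pel position: 11.5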
  • the motion vector estimation is a computationally complex task in which a similarity is calculated between the current block and the corresponding prediction blocks pointed to by candidate motion vectors in the reference picture.
  • the search region includes M x M samples of the image and each of the M x M candidate sample positions is tested.
  • the test includes calculation of a similarity measure, such as the sum of absolute differences (SAD), between the N x N reference block C and a block R located at the tested candidate position of the search region: SAD(x, y) = Σ_i Σ_j |R(x + i, y + j) − C(i, j)|.
  • x and y define the candidate position within the search region, while indices i and j denote samples within the reference block C and candidate block R.
  • the candidate position is often referred to as block displacement or offset, which reflects the representation of the block matching as shifting of the reference block within the search region and calculating a similarity between the reference block C and the overlapped portion of the search region.
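A minimal Python sketch of this block-matching search follows; the square search-region layout and the function names are assumptions for illustration only:

    import numpy as np

    def sad(c, r):
        """Sum of absolute differences between reference block C and candidate R."""
        return int(np.abs(c.astype(np.int64) - r.astype(np.int64)).sum())

    def full_search(search_region, block_c):
        """Shift the N x N block C over every offset (x, y) of the M x M search
        region and return the best offset together with its SAD cost."""
        n = block_c.shape[0]
        m = search_region.shape[0]
        best_offset, best_cost = (0, 0), float("inf")
        for y in range(m - n + 1):
            for x in range(m - n + 1):
                cost = sad(block_c, search_region[y:y + n, x:x + n])
                if cost < best_cost:
                    best_offset, best_cost = (x, y), cost
        return best_offset, best_cost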
  • the number of candidate motion vectors is usually reduced by limiting the candidate motion vectors to a certain search space.
  • the search space may be, for instance, defined by a number and/or positions of pixels surrounding the position in the reference picture corresponding to the position of the current block in the current image.
  • the candidate motion vectors may be defined by a list of candidate motion vectors formed by motion vectors of neighboring blocks. Motion vectors are usually at least partially determined at the encoder side and signaled to the decoder within the coded bitstream. However, the motion vectors may also be derived at the decoder. In such case, the current block is not available at the decoder and cannot be used for calculating the similarity to the blocks to which the candidate motion vectors point in the reference picture.
  • a template is used which is constructed out of pixels of already decoded blocks. For instance, already decoded pixels adjacent to the current block may be used.
  • Such motion estimation provides an advantage of reducing the signaling: the motion vector is derived in the same way at both the encoder and the decoder and thus, no signaling is needed. On the other hand, the accuracy of such motion estimation may be lower.
  • a motion vector derivation may include selection of a motion vector from the list of candidates.
  • Such a selected motion vector may be further refined for instance by a search within a search space.
  • the search in the search space is based on calculating a cost function for each candidate motion vector, i.e. for each candidate position of the block to which the candidate motion vector points.
  • the present disclosure provides motion vector prediction, based on updating the template used for inter prediction and in particular for bi-prediction or, in general, multi-frame prediction, wherein the search spaces searched with different templates overlap.
  • an apparatus for determination of a motion vector for an image block including a processing circuitry configured to: determine a first motion vector as a first refinement of an initial motion vector for the image prediction block by template matching with a first template in a first search space including a first plurality of candidate motion vector positions; generate a second template based on the first motion vector; determine a second motion vector as a second refinement of the initial motion vector for the image block by template matching with the second template in a second search space including a second plurality of candidate motion vector positions, wherein the first search space and the second search space overlap in one or more candidate motion vector positions.
  • the one or more candidate motion vector positions present in both the first search space and the second search space include the position pointed to by the first motion vector.
  • the one or more candidate motion vector positions present in both the first search space and the second search space includes a position different from the position pointed to by the first motion vector.
  • each of the first search space and the second search space is formed by nine candidate motion vector positions arranged in a three times three square.
  • the processing circuitry is further configured to: determine whether or not the first template and the second template are identical; if the first template and the second template are determined to be identical, perform the template matching with the second template only on those candidate motion vector positions which are not included in both the first search space and the second search space; and if the first template and the second template are determined not to be identical, perform the template matching with the second template on all candidate motion vector positions included in the second search space.
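A sketch of this conditional re-use of first-pass results follows; the helper names and the representation of search spaces as sets of positions are assumptions:

    import numpy as np

    def refine(template, positions, cost_of):
        """Template matching: return the position with the lowest cost."""
        return min(positions, key=lambda pos: cost_of(template, pos))

    def second_refinement(template1, template2, space1, space2, cost_of, best1):
        """Skip re-evaluating overlapped positions when the templates match."""
        if np.array_equal(template1, template2):
            # Overlapped positions would give the same costs as in pass 1,
            # so only positions new to the second search space are tested.
            new_positions = set(space2) - set(space1)
            if not new_positions:
                return best1
            candidate = refine(template2, new_positions, cost_of)
            # Keep whichever of the incumbent and the new winner costs less.
            return min(best1, candidate, key=lambda p: cost_of(template2, p))
        return refine(template2, set(space2), cost_of)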
  • the processing circuitry may be further configured to determine whether or not the first template and the second template are identical, for instance by comparing samples of the first template with the respective samples in the same positions of the second template.
  • the N-th search space in N-th iteration overlaps with M preceding search spaces of the (N-i)th respective iterations with i being an integer from 1 to M, and does not overlap for i greater than M.
  • the processing circuitry is configured to terminate the iterations if a position pointed to by an m-th refined motion vector is the same as a position pointed to by an n-th refined motion vector, wherein n < m and both m and n are non-zero integers different from each other.
  • the processing circuitry is configured to terminate the iterations if a position pointed to by an m-th refined motion vector is included in the n-th search space of the n-th iteration, wherein n < m and both m and n are non-zero integers different from each other.
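The two termination conditions can be sketched as follows; zero-based list indexing is an illustrative convention, not part of this document:

    def should_terminate(refined_positions, search_spaces):
        """refined_positions[k]: position pointed to by the k-th refined MV;
        search_spaces[k]: set of candidate positions of the k-th iteration."""
        m = len(refined_positions) - 1       # index of the latest refinement
        for n in range(m):                   # every earlier iteration n < m
            # Condition 1: the latest refined MV repeats an earlier result.
            if refined_positions[m] == refined_positions[n]:
                return True
            # Condition 2: the latest refined MV lands in an earlier search space.
            if refined_positions[m] in search_spaces[n]:
                return True
        return False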
  • an apparatus for encoding a video image comprising: the apparatus for determination of a motion vector for an image block as described in the above examples and embodiments, and an image coding circuitry configured to perform video image coding of the image block based on predictive coding using the determined motion vector and generating a bitstream including the coded image block.
  • an apparatus for decoding a video image from a bitstream comprising: a bitstream parser for extracting from the bitstream portions corresponding to a compressed video image including compressed image block to be decoded; the apparatus for determination of a motion vector for the image block as described in the above examples and embodiments, and an image reconstruction circuitry configured to perform image reconstruction of the image block based on the motion vector.
  • a method for determination of a motion vector for an image block comprising: determining a first motion vector as a first refinement of an initial motion vector for the image prediction block by template matching with a first template in a first search space including a first plurality of candidate motion vector positions; generating a second template based on the image samples pointed to by the first motion vector; determining a second motion vector as a second refinement of the initial motion vector for the image block by template matching with the second template in a second search space including a second plurality of candidate motion vector positions, wherein the first search space and the second search space overlap in one or more candidate motion vector positions.
  • the one or more candidate motion vector positions present in both the first search space and the second search space include the position pointed to by the first motion vector and/or a position different from the position pointed to by the first motion vector.
  • each of the first search space and the second search space is formed by nine candidate motion vector positions arranged in a three times three square.
  • the following steps are also included in the method: determining whether or not the first template and the second template are identical; if the first template and the second template are determined to be identical, performing the template matching with the second template only on those candidate motion vector positions which are not included in both the first search space and the second search space; and if the first template and the second template are determined not to be identical, performing the template matching with the second template on all candidate motion vector positions included in the second search space.
  • the determination whether or not the first template and the second template are identical may be performed, for instance, by comparing samples of the first template with the respective samples in the same positions of the second template.
  • the steps of determining a k-th motion vector as a k-th refinement of an initial motion vector for the image prediction block by template matching with a k-th template in a k-th search space including a k-th plurality of candidate motion vector positions; and generating a (k+1 )th template based on the image samples pointed to by the k-th motion vector, with k being an integer larger than 0, are iteratively repeated.
  • the N-th search space in N-th iteration overlaps with M preceding search spaces of the (N-i)-th respective iterations with i being an integer from 1 to M, and does not overlap for i greater than M.
  • the termination of iterations may be performed if one or both of the following conditions are fulfilled: a position pointed to by an m-th refined motion vector is the same as a position pointed to by an n-th refined motion vector, wherein n < m and both m and n are non-zero integers different from each other; and/or a position pointed to by an m-th refined motion vector is included in the n-th search space of the n-th iteration, wherein n < m and both m and n are non-zero integers different from each other.
  • a method for encoding a video image comprising: the method for determination of a motion vector for an image block as described above, and performing video image coding of the image block based on predictive coding using the determined motion vector and generating a bitstream including the coded image block.
  • a method for decoding a video image from a bitstream comprising: extracting from the bitstream portions corresponding to a compressed video image including compressed image block to be decoded; the method for determination of a motion vector for the image block as described above, and performing image reconstruction of the image block based on the motion vector.
  • a non-transitory computer-readable storage medium is provided storing instructions which when executed by a processor / processing circuitry perform the steps according to any of the above aspects or embodiments or their combinations.
  • Figure 1 is a block diagram showing an exemplary structure of an encoder in which the motion vector derivation and refinement may be employed
  • Figure 2 is a block diagram showing an exemplary structure of a decoder in which the motion vector derivation and refinement may be employed;
  • Figure 3a is a schematic drawing illustrating a method for determining motion vectors
  • Figure 3b is a schematic drawing illustrating a method for determining motion vectors
  • Figure 4a is a schematic drawing illustrating a method for determining image samples and motion vectors according to embodiment 1 ;
  • Figure 4b is a schematic drawing illustrating a method for determining image samples and motion vectors according to embodiment 2;
  • Figures 5a to 5c are schematic drawings illustrating the three method steps for determining image samples and motion vectors according to embodiment 1 ;
  • Figures 6a to 6b are schematic drawings illustrating the two method steps for determining image samples and motion vectors according to embodiment 2;
  • Figures 7a and 7b are schematic drawings illustrating bi-prediction with motion vector refinement;
  • Figure 8 is a block diagram illustrating a processing circuitry for performing the motion vector refinement and template update
  • Figure 9 is a schematic drawing illustrating overlapping search spaces in the same reference picture
  • Figure 10 is a flow diagram showing motion vector refinement with template update and overlapping search spaces
  • Figure 11 is a flow diagram showing motion vector refinement with template update and overlapping search spaces with a condition;
  • Figure 12 is a schematic drawing illustrating overlapping search spaces in the same reference picture.
  • the present disclosure relates to iteratively refined determination of template and motion vectors for an inter prediction. It may provide an improved inter prediction and may be advantageously employed in motion estimation performed during encoding and decoding of video.
  • exemplary encoder and decoder which may implement the motion estimation with the iterative refinement of the template matching are described.
  • the search spaces in the same reference picture but different iterations (applying different templates) overlap. This approach, even though counter-intuitive, may provide better refinement results as it also takes into account the changing template.
  • the present disclosure enables template matching possibly leading to higher similarity of the prediction block to the current block of original samples compared with the prior art, because the iterative refinement is less susceptible to local minima; it rather tends towards the global minimum of the cost function, i.e. towards higher similarity.
  • Fig. 1 shows an encoder 100 which comprises an input for receiving input image samples of frames or pictures of a video stream and an output for generating an encoded video bitstream.
  • the term "frame" in this disclosure is used as a synonym for picture.
  • the present disclosure is also applicable to fields in case interlacing is applied.
  • a picture includes m times n pixels. This corresponds to image samples and may comprise one or more color components. For the sake of simplicity, the following description refers to pixels meaning samples of luminance.
  • the motion vector search of the invention can be applied to any color component including chrominance, or to components of a color space such as RGB or the like.
  • the input blocks to be coded do not necessarily have the same size.
  • One picture may include blocks of different sizes and the block raster of different pictures may also differ.
  • the encoder 100 is configured to apply prediction, transformation, quantization, and entropy coding to the video stream.
  • the transformation, quantization, and entropy coding are carried out respectively by a transform unit 106, a quantization unit 108 and an entropy encoding unit 170 so as to generate as an output the encoded video bitstream.
  • the video stream may include a plurality of frames, wherein each frame is divided into blocks of a certain size that are either intra or inter coded.
  • the blocks of for example the first frame of the video stream are intra coded by means of an intra prediction unit 154.
  • An intra frame is coded using only the information within the same frame, so that it can be independently decoded and it can provide an entry point in the bitstream for random access.
  • Blocks of other frames of the video stream may be inter coded by means of an inter prediction unit 144: information from previously coded frames (reference frames) is used to reduce the temporal redundancy, so that each block of an inter-coded frame is predicted from a block in a reference frame.
  • a mode selection unit 160 is configured to select whether a block of a frame is to be processed by the intra prediction unit 154 or the inter prediction unit 144. This mode selection unit 160 also controls the parameters of intra or inter prediction. In order to enable refreshing of the image information, intra-coded blocks may be provided within inter-coded frames.
  • intra-frames which contain only intra-coded blocks may be regularly inserted into the video sequence in order to provide entry points for decoding, i.e. points where the decoder can start decoding without having information from the previously coded frames.
  • the intra estimation unit 152 and the intra prediction unit 154 are units which perform the intra prediction.
  • the intra estimation unit 152 may derive the prediction mode based also on the knowledge of the original image while intra prediction unit 154 provides the corresponding predictor, i.e. samples predicted using the selected prediction mode, for the difference coding.
  • the coded blocks may be further processed by an inverse quantization unit 110, and an inverse transform unit 112.
  • a loop filtering unit 120 is applied to further improve the quality of the decoded image.
  • the filtered blocks then form the reference frames that are then stored in a decoded picture buffer 130.
  • Such decoding loop (decoder) at the encoder side provides the advantage of producing reference frames which are the same as the reference pictures reconstructed at the decoder side. Accordingly, the encoder and decoder side operate in a corresponding manner.
  • reconstruction here refers to obtaining the reconstructed block by adding to the decoded residual block the prediction block.
  • the inter estimation unit 142 receives as an input a block of a current frame or picture to be inter coded and one or several reference frames from the decoded picture buffer 130. Motion estimation is performed by the inter estimation unit 142 whereas motion compensation is applied by the inter prediction unit 144. The motion estimation is used to obtain a motion vector and a reference frame based on certain cost function, for instance using also the original image to be coded. For example, the motion estimation unit 142 may provide initial motion vector estimation. The initial motion vector may then be signaled within the bitstream in form of the vector directly or as an index referring to a motion vector candidate within a list of candidates constructed based on a predetermined rule in the same way at the encoder and the decoder.
  • the motion compensation then derives a predictor of the current block as a translation of a block co-located with the current block in the reference frame to the reference block in the reference frame, i.e. by a motion vector.
  • the inter prediction unit 144 outputs the prediction block for the current block, wherein said prediction block minimizes the cost function.
  • the cost function may be a difference between the current block to be coded and its prediction block, i.e. the cost function minimizes the residual block.
  • the minimization of the residual block is based e.g. on calculating a sum of absolute differences (SAD) between all pixels (samples) of the current block and the candidate block in the candidate reference picture.
  • any other similarity metric may be employed, such as mean square error (MSE) or structural similarity metric (SSIM).
  • the rate-distortion optimization procedure may be used to decide on the motion vector selection and/or in general on the encoding parameters such as whether to use inter or intra prediction for a block and with which settings.
  • the intra estimation unit 152 and intra prediction unit 154 receive as an input a block of a current frame or picture to be intra coded and one or several reference samples from an already reconstructed area of the current frame.
  • the intra prediction then describes pixels of a current block of the current frame in terms of a function of reference samples of the current frame.
  • the intra prediction unit 154 outputs a prediction block for the current block, wherein said prediction block advantageously minimizes the difference between the current block to be coded and its prediction block, i.e., it minimizes the residual block.
  • the minimization of the residual block can be based e.g. on a rate-distortion optimization procedure.
  • the prediction block is obtained as a directional interpolation of the reference samples. The direction may be determined by the rate-distortion optimization and/or by calculating a similarity measure as mentioned above in connection with inter-prediction.
  • the inter estimation unit 142 receives as an input a block, or a more generally shaped image sample region, of a current frame or picture to be inter coded and two or more already decoded pictures 231.
  • the inter prediction then describes a current image sample of the current frame in terms of motion vectors to reference image samples of the reference pictures.
  • the inter estimation unit 142 outputs one or more motion vectors for the current image sample, wherein said reference image samples pointed to by the motion vectors advantageously minimize the difference between the current image sample to be coded and its reference image samples, i.e., minimize the residual image sample.
  • the predictor for the current block is then provided by the inter prediction unit 144 for the difference coding.
  • the difference between the current block and its prediction, i.e. the residual block, is then transformed by the transform unit 106.
  • the transform coefficients 107 are quantized by the quantization unit 108 and entropy coded by the entropy encoding unit 170.
  • the thus generated encoded picture data 171 i.e. encoded video bitstream, comprises intra coded blocks and inter coded blocks and the corresponding signaling (such as the mode indication, indication of the motion vector, and/or intra-prediction direction).
  • the transform unit 106 may apply a linear transformation such as a Fourier or Discrete Cosine Transformation (DFT/FFT or DCT). Such transformation into the spatial frequency domain provides the advantage that the resulting coefficients 107 have typically higher values in the lower frequencies.
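A small demonstration of that energy compaction, using a separable 2-D DCT in Python; the example block is synthetic and not from this document:

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        """Separable 2-D DCT-II: transform along columns, then along rows."""
        return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

    # A smooth 8x8 block, as typical for natural image content.
    block = np.outer(np.linspace(100.0, 120.0, 8), np.linspace(1.0, 1.1, 8))
    coeffs = dct2(block)
    print(np.round(coeffs[:2, :2], 1))   # large low-frequency coefficients
    print(np.round(coeffs[6:, 6:], 4))   # near-zero high-frequency coefficients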
  • Fig. 2 shows a video decoder 200.
  • the video decoder 200 comprises particularly a decoded picture buffer 230, an inter prediction unit 244 and an intra prediction unit 254, which is a block prediction unit.
  • the decoded picture buffer 230 is configured to store at least one (for uni- prediction) or at least two (for bi-prediction) reference frames reconstructed from the encoded video bitstream, said reference frames being different from a current frame (currently decoded frame) of the encoded video bitstream.
  • the intra prediction unit 254 is configured to generate a prediction block, which is an estimate of the block to be decoded.
  • the intra prediction unit 254 is configured to generate this prediction based on reference samples that are obtained from the decoded picture buffer 230.
  • the decoder 200 is configured to decode the encoded video bitstream generated by the video encoder 100, and preferably both the decoder 200 and the encoder 100 generate identical predictions for the respective block to be encoded / decoded.
  • the features of the decoded picture buffer 230 and the intra prediction unit 254 are similar to the features of the decoded picture buffer 130 and the intra prediction unit 154 of Fig. 1.
  • the video decoder 200 comprises further units that are also present in the video encoder 100 like e.g. an inverse quantization unit 210, an inverse transform unit 212, and a loop filtering unit 220, which respectively correspond to the inverse quantization unit 110, the inverse transform unit 112, and the loop filtering unit 120 of the video coder 100.
  • An entropy decoding unit 204 is configured to decode the received encoded video bitstream and to correspondingly obtain quantized residual transform coefficients 209 and signaling information.
  • the quantized residual transform coefficients 209 are fed to the inverse quantization unit 210 and an inverse transform unit 212 to generate a residual block.
  • the residual block is added to a prediction block 265 and the addition is fed to the loop filtering unit 220 to obtain the decoded video.
  • Frames of the decoded video can be stored in the decoded picture buffer 230 and serve as a decoded picture 231 for inter prediction.
  • the intra prediction units 154 and 254 of Figs. 1 and 2 can use reference samples from an already encoded area to generate prediction signals for blocks that need to be encoded or need to be decoded.
  • the entropy decoding unit 204 receives as its input the encoded bitstream 171.
  • the bitstream is at first parsed, i.e. the signaling parameters and the residuals are extracted from the bitstream.
  • the syntax and semantic of the bitstream is defined by a standard so that the encoders and decoders may work in an interoperable manner.
  • the encoded bitstream does not only include the prediction residuals.
  • a motion vector indication is also coded in the bitstream and parsed therefrom at the decoder.
  • the motion vector indication may be given by means of a reference picture in which the motion vector is provided and by means of the motion vector coordinates. So far, coding the complete motion vectors was considered. However, also only the difference between the current motion vector and the previous motion vector in the bitstream may be encoded. This approach allows exploiting the redundancy between motion vectors of neighboring blocks.
  • In order to efficiently code the reference picture, the H.265 codec (ITU-T H.265, Series H: Audiovisual and multimedia systems: High Efficiency Video Coding) provides a list of reference pictures assigning list indices to respective reference frames. The reference frame is then signaled in the bitstream by including therein the corresponding assigned list index. Such list may be defined in the standard or signaled at the beginning of the video or a set of a number of frames. It is noted that in H.265 there are two lists of reference pictures defined, called L0 and L1. The reference picture is then signaled in the bitstream by indicating the list (L0 or L1) and indicating an index in that list associated with the desired reference picture. Providing two or more lists may have advantages for better compression.
  • L0 may be used for both uni-directionally inter-predicted slices and bi-directionally inter-predicted slices while L1 may only be used for bi-directionally inter-predicted slices.
  • the lists L0 and L1 may be defined in the standard and fixed. However, more flexibility in coding/decoding may be achieved by signaling them at the beginning of the video sequence. Accordingly, the encoder may configure the lists L0 and L1 with particular reference pictures ordered according to the index.
  • the L0 and L1 lists may have the same fixed size. There may be more than two lists in general.
  • the motion vector may be signaled directly by the coordinates in the reference picture. Alternatively, as also specified in H.265, a list of candidate motion vectors may be constructed and an index associated in the list with the particular motion vector can be transmitted.
  • Motion vectors of the current block are usually correlated with the motion vectors of neighboring blocks in the current picture or in the earlier coded pictures. This is because neighboring blocks are likely to correspond to the same moving object with similar motion and the motion of the object is not likely to change abruptly over time. Consequently, using the motion vectors in neighboring blocks as predictors reduces the size of the signaled motion vector difference.
  • the Motion Vector Predictors are usually derived from already encoded/decoded motion vectors from spatially neighboring blocks or from temporally neighboring blocks in the co-located picture. In H.264/AVC, this is done by computing a component-wise median of three spatially neighboring motion vectors. Using this approach, no signaling of the predictor is required.
  • Temporal MVPs from a co-located picture are only considered in the so called temporal direct mode of H.264/AVC.
  • the H.264/AVC direct modes are also used to derive other motion data than the motion vectors. Hence, they relate more to the block merging concept in HEVC.
  • motion vector competition, which explicitly signals which MVP from a list of MVPs is used, was introduced for motion vector derivation.
  • the variable coding quad-tree block structure in HEVC can result in one block having several neighboring blocks with motion vectors as potential MVP candidates.
  • a 64x64 luma prediction block could have 16 4x4 luma prediction blocks to the left when a 64x64 luma coding tree block is not further split and the left one is split to the maximum depth.
  • the final design of the Advanced Motion Vector Prediction (AMVP) candidate list construction includes the following two MVP candidates: a) up to two spatial candidate MVPs that are derived from five spatially neighboring blocks; b) one temporal candidate MVP derived from two temporal, co-located blocks when both spatial candidate MVPs are not available or when they are identical; and c) zero motion vectors when the spatial, the temporal or both candidates are not available. Details on motion vector determination can be found in the book by V. Sze et al. (Ed.), High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014, in particular in Chapter 5, incorporated herein by reference.
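A greatly simplified sketch of that candidate-list logic; the real HEVC derivation includes availability checks, reference-index handling and motion vector scaling, all omitted here:

    def build_amvp_list(spatial_mvps, temporal_mvps, list_size=2):
        """Assemble a fixed-size MVP candidate list from already-derived MVPs."""
        candidates = []
        for mv in spatial_mvps:                # a) up to two spatial MVPs
            if mv not in candidates and len(candidates) < list_size:
                candidates.append(mv)
        if len(candidates) < list_size:        # b) temporal MVP as fallback
            for mv in temporal_mvps:
                if mv not in candidates and len(candidates) < list_size:
                    candidates.append(mv)
        while len(candidates) < list_size:     # c) pad with zero motion vectors
            candidates.append((0, 0))
        return candidates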
  • the motion vector refinement may be performed at the decoder without assistance from the encoder.
  • the encoder in its decoder loop may employ the same refinement to obtain corresponding motion vectors.
  • Motion vector refinement is performed in a search space which includes integer pixel positions and fractional pixel positions of a reference picture.
  • the fractional pixel positions may be half-pixel positions or quarter-pixel or further fractional positions.
  • the fractional pixel positions may be obtained from the integer (full-pixel) positions by interpolation such as bi-linear interpolation.
  • in a bi-prediction of the current block, two prediction blocks, obtained using the respective first motion vector of list L0 and the second motion vector of list L1, are combined into a single prediction signal, which can provide a better adaptation to the original signal than uni-prediction, resulting in less residual information and possibly a more efficient compression.
  • a template is used, which is an estimate of the current block and which is constructed based on the already processed (i.e. coded at the encoder side and decoded at the decoder side) image portions.
  • an estimate of the first motion vector MV0 and an estimate of the second motion vector MV1 are received as input at the decoder 200.
  • the motion vector estimates MV0 and MV1 may be obtained by block matching and/or by search in a list of candidates (such as merge list) formed by motion vectors of the blocks neighboring to the current block (in the same picture or in adjacent pictures).
  • MV0 and MV1 are then advantageously signaled to the decoder side within the bitstream.
  • the first determination stage at the encoder could be performed by template matching which would provide the advantage of reducing signaling overhead.
  • the motion vectors MV0 and MV1 are advantageously obtained based on information in the bitstream.
  • the MV0 and MV1 are either directly signaled, or differentially signaled, and/or an index in the list of motion vectors (merge list) is signaled.
  • the present disclosure is not limited to signaling motion vectors in the bitstream.
  • the motion vector may be determined by template matching already in the first stage, correspondingly to the operation of the encoder.
  • the template matching of the first stage may be performed based on a search space different from the search space of the second, refinement stage. In particular, the refinement may be performed on a search space with higher resolution (i.e. shorter distance between the search positions).
  • An indication of the two reference pictures RefPic0 and RefPic1, to which MV0 and MV1 respectively point, is provided to the decoder as well.
  • the reference pictures are stored in the decoded picture buffer at the encoder and decoder side as a result of previous processing, i.e. respective encoding and decoding.
  • One of these reference pictures is selected for motion vector refinement by search.
  • a reference picture selection unit of the apparatus for the determination of motion vectors is configured to select the first reference picture to which MV0 points and the second reference picture to which MV1 points. Following the selection, the reference picture selection unit determines whether the first reference picture or the second reference picture is used for performing of motion vector refinement.
  • the search region in the first reference picture is defined around the candidate position to which motion vector MV0 points.
  • the candidate search space positions within the search region are analyzed to find a block most similar to a template block by performing template matching within the search space and determining a similarity metric such as the sum of absolute differences (SAD).
  • the positions of the search space denote the positions on which the top left corner of the template is matched. As already mentioned above, the top left corner is a mere convention and any point of the search space such as the central point can in general be used to denote the matching position.
  • the document JVET-D0029 describes a specific implementation of the motion vector refinement.
  • the specific implementation is called Decoder-Side Motion Vector Refinement (DMVR).
  • DMVR has as an input the initial motion vectors MV0 and MV1 which point into two respective reference pictures RefPic0 and RefPic1. These initial motion vectors are used for determining the respective search spaces in RefPic0 and RefPic1.
  • the function may be a sample clipping operation in combination with weighted summation.
  • the cost function for determining the best template match in the respective search spaces is SAD(Template, Block A'), where block A' is the coding block pointed to by a candidate MV in the search space spanned around the position given by MV0.
  • Figure 7a illustrates the determination of the best matching block A' and the resulting refined motion vector MV0'.
  • the same template is used to find the best matching block B' and the corresponding motion vector MV1' which points to block B' as shown in Figure 7b.
  • the refined motion vectors MV0' and MV1' are found via search on RefPic0 and RefPic1 with the template.
  • the template is updated at least once, as will be described below.
  • a block template is calculated by adding together the blocks that are referred to by MV0 and MV1.
  • the block template is used to find a refined MV0' and/or MV1'.
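One plausible reading of "adding together" is a rounded fixed-point average of the two predicted blocks, as in the following sketch; the exact combination function is an implementation choice, as discussed further below:

    import numpy as np

    def build_template(block_a, block_b):
        """Block template from the two predictions referred to by MV0 and MV1."""
        a = block_a.astype(np.int32)
        b = block_b.astype(np.int32)
        # Rounded average in integer arithmetic: (A + B + 1) >> 1.
        return ((a + b + 1) >> 1).astype(block_a.dtype)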
  • the refinement process is divided into steps. After each step, the block template is constructed / updated based on the refined motion vectors that were obtained in the previous step.
  • the step may include refinement of both MV0 and MV1. Alternatively, the template update may be performed after each of the refinements of the respective MV0 and MV1.
  • the number of template updates and the cost function for template matching could be pre-defined or signaled in the bitstream.
  • the search space for the template matching with the initial template overlaps with the search space for the template matching with the updated template.
  • the initial motion vector was signaled in the bitstream. Accordingly, it may be determined in the encoder based on the original image and obtained at the decoder based on the signaled quantity.
  • the signaled quantity is an indication of the motion vector. This may be the motion vector itself defined by the coordinates (offset from the co-located block to the initial predictor block). However, it may be more efficient to construct a list of candidate motion vectors based on the motion vectors of the neighboring blocks (temporally and/or spatially) and/or some predefined values and to signal only an index to a candidate within such list.
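A sketch of this index-based signaling combined with difference coding of the remaining offset; the function names and tuple representation are illustrative assumptions:

    def signal_mv(mv, candidates):
        """Choose the candidate minimising the coded difference; signal the
        candidate index plus the (small) motion vector difference."""
        idx, mvp = min(enumerate(candidates),
                       key=lambda c: abs(mv[0] - c[1][0]) + abs(mv[1] - c[1][1]))
        mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
        return idx, mvd

    def recover_mv(idx, mvd, candidates):
        """Decoder side: reconstruct the motion vector from index and difference."""
        mvp = candidates[idx]
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])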
  • the determination of the initial motion vector may still also be performed by template matching.
  • the term "refinement” refers to operation in which the initial motion vector defines a search space in which a template matching is used to test candidate positions in the surroundings of the initial motion vector to find a possibly better match.
  • the result may be the same, initial motion vector or a motion vector on one of the search space candidate positions.
  • the step of generating the updated template based on the image samples pointed to by the refined motion vector may also be performed in various different ways. It is noted that the term "updated” does not necessarily means that the updated template is determined as a function of the initial template. In one option, this may be the case. In another option, the updated template is newly constructed based on the samples pointed to by the refined motion vector. In other words, the updated template may be generated as a function of the block pointed to by the refined motion vector. In addition, the updated template may be generated as a function of the previous template and/or initial template and/or block pointed to by initial motion vector as will be described in detail in some selected examples below.
  • processing circuitry 800 is illustrated in Figure 8.
  • the processing circuitry may include any hardware and the configuration may be implemented by any kind of programming or hardware design of a combination of both.
  • the processing circuitry may be formed by a single processor such as a general purpose processor with the corresponding software implementing the above steps.
  • the processing circuitry may be implemented by specialized hardware such as an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), a DSP (Digital Signal Processor) or the like.
  • the processing circuitry may include one or more of the above mentioned hardware components interconnected for performing the above motion vector refinement including template update.
  • the processing circuitry 800 implements two functionalities: performance of the template construction 810 and motion vector refinement 820 performed with the updated template. These two functionalities may be implemented on the same piece of hardware or may be performed by separate units of hardware such as a template determination unit 810 and motion vector refinement unit 820.
  • the present disclosure provides at least one update of the template and refinement of the motion vector by applying template matching with the updated template.
  • further refinement may be achieved by iteratively updating the template and iteratively performing the refinements with the so updated respective templates.
  • the refinement may be performed iteratively, over i being an integer larger than 1, by repeating the following steps of the i-th iteration: determining an i-th refined motion vector by template matching with the i-th template in an i-th search space, and generating the (i+1)-th template based on the image samples pointed to by the i-th refined motion vector (a sketch of this loop follows below).
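A sketch of that loop; all callables are assumptions standing in for the template matching, search-space construction and template update described in this document:

    def iterative_refinement(initial_mv, initial_template, make_search_space,
                             refine, update_template, max_iterations=3):
        """Alternate motion vector refinement and template update."""
        mv, template = initial_mv, initial_template
        for i in range(max_iterations):
            space = make_search_space(mv)       # i-th search space around current MV
            new_mv = refine(template, space)    # i-th refinement by template matching
            template = update_template(new_mv)  # (i+1)-th template from refined MV
            if new_mv == mv:                    # refinement did not move: stop
                break
            mv = new_mv
        return mv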
  • the iterative approach is illustrated also in the block diagram of the processing circuitry in Figure 8, in which the arrow between the template updating functionality / unit 810 and the motion vector refinement functionality / unit 820 allows for iterative updating and refinement.
  • the template update is based on the refinement result and the refinement result depends on the updated template.
  • the motion vector refinement starts similarly to the motion vector refinement described above with reference to Figure 7a and 7b.
  • the motion vector refinement is started with obtaining a first initial motion vector MV0 (0) and a second initial motion vector MV1 (0) pointing to a first initial image sample block A (0) and a second initial image sample block B (0) , respectively.
  • the first and the second initial image sample blocks may be in two different reference (already reconstructed) pictures RefPic0 and RefPic1, both being different from the current picture.
  • the first and the second initial image sample blocks may be in the same decoded picture, which is different from the current picture.
  • an initial template C (0) is generated. It is noted that in general, the motion vector points to a particular sample.
  • a block is in general any group of samples with a predefined size and shape.
  • a typical example is a rectangle or a square of samples.
  • the size may correspond to the coding tree unit or coding unit. For instance, blocks of 2 x 2 pixels or 4 x 4 pixels or 8 x 8 pixels or any size such as 128 x 128 pixels may be provided. Rectangular sizes of 4 x 8 or 8 x 16 or any other may also be defined.
  • the template follows the size and shape of the prediction blocks. However, this is not necessarily the case.
  • the template may be smaller than the current / prediction block and may be formed by a subset of the sample positions of the current / prediction block. In the following, two exemplary embodiments will be described.
  • Embodiment 1 starts from two initial motion vectors and performs the updating of the template following the refinement of either of the two initial motion vectors.
  • the processing circuitry of the present disclosure in Embodiment 1 is configured to refine the initial motion vectors one after the other, updating the template after each refinement, as detailed in the steps below.
  • the template update is performed after each motion vector refinement, i.e. after refining any of the two initial motion vectors. It is noted that most current applications employ bi-prediction including two reference pictures. However, the present disclosure is also applicable to approaches which use prediction referring to more than two reference pictures, i.e. starting from more than two initial motion vectors.
  • the template updating / motion refinement may be performed iteratively.
  • the processing circuitry may be configured to iteratively, over i being an integer larger than 1, repeat the steps of the i-th iteration, which alternate a first-direction update and a second-direction update of the template as described below.
  • the first-direction update is an update of the template based on the refined motion vector in one of the two reference pictures, i.e. motion vector pointing in one, first direction.
  • the second- direction update is an update based on refined motion vector in the other one of the two reference pictures, i.e. refinement of the motion vector pointing in a different, second direction.
  • the template matching includes a search for a best match in a search space which may be constructed around the position given by the refined / updated motion vector to be updated in the current iteration.
  • the search space includes at least two candidate positions to be tested by the template matching in order to find the best matching position, which the updated motion vector will point to.
  • Fig. 4a illustrates the processing steps of Embodiment 1 in a schematic drawing.
  • the motion vector refinement of the present embodiment contains the following processing steps.
  • the function determining the template may be a combination such as a weighted average of the first and the second initial image samples, though the invention is not limited to this function. Additional steps, including clipping, filtering and shifting, may follow the weighted averaging to determine the template.
  • a search space is defined in the first decoded picture RefPic0 and in the search space a reference picture portion best matching the template C (0) is determined.
  • the best matching portion defines the first updated image sample block A (1) and its position determines the first refined motion vector MV0 (1) as shown in Fig. 5a.
  • the first updated image sample block A (1) and the second initial image sample block B (0) are used to determine a first-direction update of the template generating template C (1) .
  • the template update may be performed as follows:
  • C(1) = function(A(1), B(0))
  • the function is a sample-wise combination of sample blocks A (1) pointed to by the refined motion vector and B (0) pointed to by the initial motion vector.
  • the function may be an average or a weighted average.
  • the weights a and b of such a weighted average (for instance C = a·A + b·B) may be determined inversely proportional to the distance between the current frame including the current block and the reference pictures including the respective blocks A and B.
  • a simple average may be employed in general, for instance in cases where RefPic0 and RefPic1 have the same temporal distance from the current frame. However, for simplicity reasons, the average may also be used in other cases.
  • a weighted average may be used with weights determined based on the distance of the respective RefPic0 and RefPic1 from the current picture. In particular, the block (A or B) closer to the current block has a higher weight than the block (B or A) farther from the current block.
  • the weighting factors can be derived by other means, e.g. they might be signalled in the bitstream.
  • the function might also include sample clipping as an example, where the intensity of each sample of weighted average of A and B is restricted to an intensity range defined by [minimum Intensity, maximum intensity], where the values of minimum intensity and maximum intensity can be signalled, predefined or derived.
  • another example function is a rounding operation that would be implemented after the weighted averaging of blocks A and B.
  • the averaging operation can be implemented as (A+B+1)>>1, where ">>" corresponds to shifting of bits to the right and discarding the least significant bit, for finite sample precision computations.
  • the operations on blocks A and B are performed sample-wise, meaning that they are performed for each element of A and the corresponding respective element of B (see the sketch below).
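A minimal, non-normative Python sketch of such a sample-wise update; the function name, the weight handling and the default 8-bit intensity range are illustrative assumptions, not taken from the patent text:

```python
# Illustrative sketch only: sample-wise weighted-average template update
# with rounding and optional clipping, assuming weights summing to a
# power of two (so the division reduces to a right shift, as in (A+B+1)>>1).

def update_template(block_a, block_b, weight_a=1, weight_b=1,
                    min_intensity=0, max_intensity=255):
    total = weight_a + weight_b
    shift = total.bit_length() - 1
    assert (1 << shift) == total, "weights are assumed to sum to a power of two"
    rounding = 1 << (shift - 1) if shift > 0 else 0
    template = []
    for row_a, row_b in zip(block_a, block_b):
        row_c = []
        for a, b in zip(row_a, row_b):
            c = (weight_a * a + weight_b * b + rounding) >> shift
            row_c.append(min(max(c, min_intensity), max_intensity))  # clipping
        template.append(row_c)
    return template

# e.g. the simple average (A + B + 1) >> 1 of two 2x2 blocks:
print(update_template([[100, 101], [102, 103]], [[50, 51], [52, 53]]))
```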
  • refinement of the second initial motion vector MV1(0) is performed.
  • a search space is defined in the second decoded picture RefPic1 and in this search space a reference picture portion best matching the template C(1) is determined.
  • the best matching portion defines the second updated image sample block B(1) and its position determines the second refined motion vector MV1(1) as shown in Fig. 5b.
  • the first updated image sample block A(1) and the second updated image sample block B(1) are used to further update the template, generating template C(2).
  • the updated template may be determined as follows:
  • templates C(1) and C(2) are updated by constructing them based on the most up-to-date blocks in the respective two reference pictures.
  • C(2) = function(A(1), B(1)).
  • the template may be updated in any way using the samples pointed to by the most recently refined motion vector.
  • the functions for updating the templates C(1) and C(2) may be, but are not necessarily, the same.
  • this processing includes the following steps:
  • image sample block B(i+1) and the second refined motion vector MV1(i+1) are determined.
  • the template update may also be performed in a different manner.
  • the present disclosure is not limited to a plurality of iterations.
  • already a single update may provide for improved motion estimation.
  • already the update of the refined motion vector based on the updated first-direction template C(i+1,1) may provide the advantage without further template updates.
  • the resolution of the search space may change to a finer resolution with an increasing number of repetitions.
  • the search space spanned by the initial motion vector may include candidate positions at an integer-sample distance from the initial motion vector, while the search space spanned after the refinement for the matching with the updated template may include positions at half-sample distances from the refined / updated motion vector.
  • after further refinement, search spaces including quarter-sample positions may be tested, and so on.
  • the present disclosure is not limited to this approach and the search spaces may maintain the same resolution or be defined with positions of mixed integer, half, quarter or other sample resolutions (an illustrative helper for generating such search positions follows below).
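As an illustration of the progressively finer search resolution, the following hypothetical helper generates a 3x3 candidate grid with a configurable spacing step; the names and step values are assumptions, not taken from the patent text:

```python
# Illustrative only: a 3x3 grid of candidate MV positions around a center,
# with the spacing step shrinking from full-pel to half- and quarter-pel.

def search_space(center, step):
    cx, cy = center
    return [(cx + dx * step, cy + dy * step)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

s_iter1 = search_space((4, -2), step=1.0)       # integer-sample distances
s_iter2 = search_space((5, -1), step=0.5)       # half-sample distances
s_iter3 = search_space((5.5, -1.5), step=0.25)  # quarter-sample distances
```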
  • Embodiment 2 performs the updating of the template following the refinement of both motion vectors, i.e. motion vectors pointing in the two respective reference pictures.
  • the processing circuitry is configured to: - determine, for the image prediction block, a refinement of a first initial motion vector pointing to a first picture and a refinement of a second initial motion vector pointing to a second picture by template matching with an initial template to generate a respective first refined motion vector and second refined motion vector; - generate an updated template as a function of image samples pointed to by the first refined motion vector and second refined motion vector;
  • Fig. 4b illustrates the processing steps of embodiment 2 in a schematic drawing.
  • the motion vector refinement of the present embodiment contains the following processing steps.
  • refinement of the first initial motion vector MV0(0) is performed by determining a template C(0) based on the first and the second initial image sample blocks A(0) and B(0).
  • the function determining the template may be a weighted average of the image samples pointed to by the first and the second initial motion vectors, though the invention is not limited to this function. Additional steps, including clipping, filtering and shifting, may follow the weighted averaging to determine the template.
  • a search space is defined in the first decoded picture RefPic0 at a position given by the sample position pointed to by the corresponding initial motion vector.
  • a reference picture portion best matching the template C (0) is determined.
  • the best matching portion defines the first updated image sample block A(1) and its position determines the first refined motion vector MV0(1).
  • another search space is defined in the second decoded picture RefPic1 based on the position of the corresponding initial motion vector and in this search space a reference picture portion best matching the same template C(0) is determined as well.
  • the best matching portion defines the second updated image sample block B(1) and its position determines the second refined motion vector MV1(1). This processing step is shown in Fig. 6a.
  • the first updated image sample block A(1) and the second updated image sample block B(1) are used to further update the template, generating template C(1).
  • the updated template is obtained as follows:
  • C(1) = function(A(1), B(1)).
  • the processing steps of motion vector refinement and template update discussed above are repeated a number of times, until a maximum number of times imax is reached.
  • the maximum number of times imax is pre-defined or signaled in the bitstream.
  • the processing step following the second repetition is shown in Fig. 6b.
  • this processing includes the following steps:
  • image sample block A(i+1) and the first refined motion vector MV0(i+1) are determined.
  • image sample block B(i+1) and the second refined motion vector MV1(i+1) are determined.
  • the update of the template C(i+1) is generated based on the image samples A(i+1) and B(i+1).
  • the resolution of the search space may change to a finer resolution with an increasing number of repetitions, as already noted above for Embodiment 1. A sketch of the Embodiment 2 flow is given below.
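Under the simplifying assumption of helper callables `refine` (which searches one reference picture for the best template match) and `build_template` (which combines the two best-matching blocks) — neither of which is defined in the patent text — the Embodiment 2 flow could be sketched as follows; note that the two refinements within one iteration use the same template and are therefore independent of each other:

```python
# Non-normative sketch of Embodiment 2: both MVs are refined against the
# same template (allowing parallel evaluation), then the template is
# rebuilt from the two best-matching blocks.

def refine_embodiment2(mv0, mv1, ref_pic0, ref_pic1, template,
                       refine, build_template, i_max):
    for _ in range(i_max):
        mv0, block_a = refine(ref_pic0, mv0, template)  # these two calls are
        mv1, block_b = refine(ref_pic1, mv1, template)  # independent of each other
        template = build_template(block_a, block_b)
    return mv0, mv1
```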
  • One of the advantages of Embodiment 2 over Embodiment 1 is that the refinement in reference pictures 0 and 1 can be performed in parallel. However, the coding gain may be reduced in comparison to Embodiment 1, in which a more accurate result may be achieved.
  • the number of template updates can be predefined or signaled in the bitstream. For instance, the number of iterations can be signaled as a sequence level parameter down to CU level parameter. In other words, the number of iterations may be signaled in a set of parameters applicable for one or more video frames such as picture parameter set or sequence parameter set known from H.264/H.265. This kind of signaling does not require much overhead.
  • a finer adaption may be achieved if the number of iterations is signaled on a picture portion basis.
  • Such picture portion may be a slice or any portion such as tile. Signaling in a CTU / CU basis may be too fine but is applicable with the present disclosure.
  • the maximum number of template updates can be predefined, i.e. fixed to a certain number such as 1, 2, or 3. This may be defined by the standard and may differ for different image resolutions or based on other coding settings.
  • the number of iterations is derived at the encoder and decoder in the same way based on the encoding settings, i.e. based on coding parameters signaled in the bitstream (block size, partitioning information, search space configuration, resolution, etc.).
  • the iterations can be terminated before the maximum number is reached according to one of the following conditions:
  • the iterations may be terminated if the motion vector in iteration i is the same as the initial motion vector.
  • the iterations may be terminated if the updated motion vector is the same as the motion vector from the previous iteration, i.e. the motion vector in iteration i+1 is equal to the motion vector in iteration i.
  • the equality may be defined after a clipping or rounding of the motion vector.
  • this condition is fulfilled if the difference between the refined and non-refined MV does not exceed a certain threshold. Accordingly, if the improvement is rather small, the iterations are terminated.
  • the motion vector of the current iteration i may be compared either with the initial motion vector or with the motion vector of the immediately preceding iteration i-1.
  • the iterations may be terminated if the difference between the motion vector in iteration i and the motion vector in iteration i+1 exceeds a threshold.
  • this threshold is different from and larger than the threshold from condition A mentioned above. This condition should reduce the risk of divergence of the refinement iterations.
  • N may be an integer or a fractional number larger than zero.
  • the length of a motion vector refers to its absolute value irrespective of the direction.
  • the condition may be fulfilled if the length of the motion vector along the x or y axis plus the block width/height exceeds a certain threshold (at frame, slice, tile or other segmentation boundaries).
  • the condition is fulfilled if the sum of the length of the motion vector along the x axis and the top-left coordinate of the prediction unit exceeds the right frame boundary.
  • the condition is fulfilled if the sum of the length of the motion vector along the x axis, the top-left coordinate of the prediction unit and the width of the prediction unit is below the left frame boundary.
  • the two specific examples apply similarly in the vertical direction as well, where the comparison is performed with respect to the top and the bottom frame boundaries.
  • if the magnitude of the x- or y-component of the motion vector exceeds a maximum value (which might be predefined, signalled, or derived), then the condition is satisfied.
  • this condition is satisfied if the difference between the cost function values calculated for the motion vectors in two following iterations is too small. In other words, if the refinement only results in a negligible improvement, the iterations are stopped. For example, if the SAD for the block pointed to by the motion vector from iteration i is the same as or only slightly lower than the SAD for the block pointed to by the motion vector from iteration i-1, the iterations are stopped. The condition may also compare the improvement in a certain iteration i with the SAD corresponding to the initial motion vector. As also mentioned above, SAD is only an example and, in general, a different cost function may be used. It is noted that the cost function used for template matching could also be signaled in the bitstream.
  • An index may be used to indicate the selected template matching function according to a pre-defined cost function table.
  • the index could be signaled as a sequence or picture level parameter.
  • the number of iterations is a predefined number and the processing circuitry is further configured to stop the iterative refinement of the motion vectors and block templates before the predefined number is reached if a predefined condition is met, the predefined condition being one or a combination of the following:
  • the difference between the updated motion vector after iteration K and the updated motion vector after iteration L exceeds a predetermined second threshold, Thr2.
  • the length of the motion vector after iteration j along the x axis (i.e. in horizontal direction) exceeds a predetermined third threshold, Thr3, j being an integer 1 or larger.
  • the length of the motion vector after iteration j along the y axis (i.e. in vertical direction) exceeds a predetermined fourth threshold, Thr4.
  • a result of adding the length of the motion vector after iteration j along the y axis to the top-left coordinate of a prediction unit and to a block height (i.e. size in y direction of the prediction unit) is below a seventh threshold, Thr7.
  • a result of adding the length of the motion vector after iteration j along the y axis to the top-left coordinate of a prediction unit exceeds an eighth threshold, Thr8.
  • the third to eighth thresholds may be selected so that the predictor resulting from the motion vector after the j-th iteration does not cross a frame or slice or tile boundary, i.e. a boundary of a picture portion which is to be decodable without spatial dependency on other parts of the same picture.
  • coordinate refers to the x or y coordinate of a sample within the picture, such as sample pointed to by the motion vector or a sample of the search space.
  • the thresholds may be defined in standard, determined based on coding parameters signaled in the bitstream or signaled separately in the bitstream.
  • the third and fourth thresholds may have the same value. However, assuming that the movement may be larger along the x axis, it may also make sense to set them differently.
  • the conditions may be combined. For instance, the difference between the motion vectors in iterations i and i+1 may be conditioned on a lower (first) and a higher (second) threshold. The iterations are terminated if the difference is below the first or above the second threshold. Moreover, combined conditions on motion vector size / difference and on cost function value / difference may be provided.
  • iterations can be terminated if the difference between the updated motion vector after iteration i and the initial motion vector along the x axis is greater than a certain threshold, where the threshold can be signalled, predefined or derived depending on the quantization parameter, frame width, etc. (a sketch of such checks follows below).
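A hedged sketch combining a few of the termination checks listed above; all threshold names and the cost inputs are assumptions for illustration, not taken from the patent text:

```python
# Illustrative sketch of a few of the early-termination checks discussed
# above; threshold names, their values and the cost inputs are assumptions.

def should_terminate(mv_prev, mv_curr, cost_prev, cost_curr,
                     thr_small, thr_large, thr_len, min_gain):
    dx = mv_curr[0] - mv_prev[0]
    dy = mv_curr[1] - mv_prev[1]
    diff = (dx * dx + dy * dy) ** 0.5
    if mv_curr == mv_prev:
        return True                    # refinement changed nothing
    if diff <= thr_small or diff > thr_large:
        return True                    # negligible change, or divergence risk
    if abs(mv_curr[0]) > thr_len or abs(mv_curr[1]) > thr_len:
        return True                    # MV component too long
    if cost_prev - cost_curr < min_gain:
        return True                    # e.g. SAD improved only negligibly
    return False
```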
  • an encoding apparatus may be provided for encoding a video image, the apparatus comprising motion vector determining device with the processing circuitry as described above for determination of a motion vector for an image prediction block.
  • the encoding apparatus may further include an image coding circuitry configured to perform video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
  • the encoding apparatus may be further configured to encode into the bitstream the initial motion vector determined as described above with reference to Fig. 1.
  • the encoding apparatus may further be configured to encode into the bitstream further configuration parameters such as an enabling flag for enabling the use of the iterative template update as described above, number of iterations, some of the thresholds mentioned above or the like.
  • a decoding apparatus may be provided for decoding a video image from a bitstream, the apparatus comprising a bitstream parser for extracting from the bitstream portions corresponding to a compressed video image including a compressed image prediction block to be decoded.
  • the decoding apparatus may further comprise the motion vector determining apparatus as described above for determination of a motion vector for the image prediction block including the refinement and template update.
  • the decoder may further include an image reconstruction circuitry configured to perform image reconstruction of the image prediction block based on the motion vector and other parts as described with reference to Figure 2 above.
  • the present disclosure provides the corresponding methods.
  • a method is provided for determination of a motion vector for an image prediction block.
  • the method includes a step of determining a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector and then generating an updated template based on the image samples pointed to by the refined motion vector.
  • An updated motion vector is then determined for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
  • a method for encoding a video image comprising determining of the motion vector for an image prediction block according to the above mentioned method as well as then performing video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
  • a further method is provided for decoding a video image from a bitstream comprising the steps of parsing from the bitstream portions corresponding to a compressed video image including compressed image prediction block to be decoded; determining of a motion vector for the image prediction block according to the method described above, and performing image reconstruction of the image prediction block based on the motion vector.
  • the initial motion vectors MV0(0) and MV1(0) are provided.
  • a template is determined based on them, namely as a function of the block samples of the respective blocks A and B pointed to by the two initial motion vectors.
  • the motion refinement is performed with the template to obtain refined motion vectors MV0(1) and MV1(1).
  • the refined motion vectors may be used to define a further search space such as a fractional search space to determine the final motion vector refinement MV0(2) and MV1(2).
  • the initial motion vectors MV0(0) and MV1(0) are provided.
  • a template is determined in step 310 based on them, namely as a function of the block samples of the respective blocks A(0) and B(0) pointed to by the two initial motion vectors.
  • the motion refinement is performed in step 320 with the template to obtain refined motion vectors MV0(1) and MV1(1).
  • the template is updated as a function of the determined updated motion vectors MV0(1) and MV1(1), namely as a function of the block samples of the blocks pointed to by the respective updated motion vectors.
  • the refined motion vectors may be used to define a further search space such as a fractional search space to determine the final motion vector refinement MV0(2) and MV1(2).
  • the updated motion vectors MV0(1) and MV1(1) may be further updated by template matching with the template updated in step 330 in one or more iterations.
  • the search space refinement 340 may be performed after the iterations are performed.
  • the iterations may also include the additional refinement of the search space so that the next search based on the template updated in step 330 may be performed based on the refined motion vectors MV0(2) and MV1(2) rather than MV0(1) and MV1(1). A rough sketch of this cycle is given below.
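A rough sketch of the cycle 310 -> 320 -> 330 -> 340, under the assumption of helper callables `block_at`, `refine` and `build_template` (none of which are defined in the patent text); the per-cycle step values illustrate the optional search-space refinement 340:

```python
# Non-normative sketch: build template (310), refine both MVs (320),
# rebuild template (330), then continue with a finer search spacing (340).

def refine_cycle(mv0, mv1, ref0, ref1, block_at, refine, build_template,
                 steps=(1.0, 0.5, 0.25)):   # 340: spacing refined per cycle
    template = build_template(block_at(ref0, mv0), block_at(ref1, mv1))  # 310
    for step in steps:
        mv0, block_a = refine(ref0, mv0, template, step)  # 320
        mv1, block_b = refine(ref1, mv1, template, step)  # 320
        template = build_template(block_a, block_b)       # 330
    return mv0, mv1
```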
  • the steps may be repeated as follows: 310 -> 320 -> 330 -> 340 -> 310 -> 320 -> 330 ... etc.

Search space overlapping
  • an apparatus for determination of a motion vector for an image block (a current block in a current picture).
  • the apparatus includes a processing circuitry as also described above with reference to Figure 8.
  • the processing circuitry is configured to:
  • determine a first motion vector MV0' (as the best matching point 915) as a first refinement of an initial motion vector MV0 for the image block by template matching with a first template T1 in a first search space S1 including a first plurality of candidate motion vector positions;
  • determine a second motion vector MV0'' as a second refinement of the initial motion vector MV0 for the image block by template matching with a second template T2 in a second search space S2 including a second plurality of candidate motion vector positions, wherein the first search space S1 and the second search space S2 overlap in one or more candidate motion vector positions.
  • in a search space (e.g. S1 or S2), the cost function (e.g. SAD) is evaluated for the candidate positions; the best of the candidate positions corresponds to the best matching motion vector, i.e. the motion vector of the current block.
  • the generation of the second template T2 based on the first motion vector MV0' is achieved, for instance, by taking the position pointed to by the motion vector MV0' as well as further positions surrounding the position pointed to by MV0' and generating the template by using the sample (pixel) values at these positions.
  • the template may be generated as an average between the positions pointed to by MV0' and the positions pointed to by the initial motion vector MV0 or by a motion vector MV1 pointing to another reference picture or the like.
  • some updating approaches are also mentioned above in Embodiments 1 and 2 and may be used in this embodiment, too.
  • the second template T2 might be generated solely based on a motion vector MV1 pointing to a reference picture other than the reference picture pointed to by MV0'.
  • Search space S1 includes nine candidate positions and is considered to be located around the central point of the nine positions. In this example the positions correspond to a square of 3 x 3 positions mutually equidistantly placed in vertical as well as horizontal direction.
  • the best matching motion vector corresponds to the position 915 in the top right corner of the search space S1.
  • the template is updated and the second search space is determined.
  • the determination of the second search space may be performed in various different ways.
  • the second search space S2 has the same shape and size as the first search space but is located around the position pointed to by the best matching motion vector 915.
  • Search spaces S1 and S2 overlap in four points.
  • overlap means that the template matching is performed twice for the four overlapping points: in iteration 1 as well as in iteration 2. This may still provide different results, since the template used for matching may differ between iterations 1 and 2. Alternatively, or in addition, the reference picture may differ for different iterations in overlapping search spaces.
  • the first search space S1 and the second search space S2 are both located in the same reference picture 0 in iteration 1 and iteration 2.
  • the present invention is not limited thereto. In different iterations, different reference pictures may be used to find the best motion vector.
  • the feature of overlapping search spaces may be implemented in any of the embodiments described above, irrespective of whether an iteration changes the reference picture and/or the template.
  • the amount of overlap may range from 1 point to all points, meaning that the first search space and the second search space may be located at identical positions within the reference picture or the respective reference pictures.
  • the one or more candidate motion vector positions present in both the first search space S1 and the second search space S2 include the position 915 pointed to by the first motion vector MV0'. This is illustrated in Figure 9 as well as in Figure 12.
  • the one or more candidate motion vector positions present in both the first search space S1 and the second search space S2 may include a position different from the position pointed to by the first motion vector.
  • the first search space S1 and the second search space S2 are each formed by nine candidate motion vector positions arranged in a three times three square.
  • the two search spaces may be full-pel or sub-pel, and may differ from each other in size and/or shape.
  • the search space of DMVR can overlap between the iterations. This is a counterintuitive approach, since a previously searched point is searched at least once more.
  • the reason for allowing overlapped search spaces is the template updating process, which modifies the template to be used in the template matching operation of the refinement search process, or the switch to another reference picture from one iteration to another iteration. Since the template of iteration 2 is different from that of iteration 1, the same points in the search space of iteration 1 may result in a different matching outcome in iteration 2. Even though only two iterations are shown in Figures 9 and 12, there may be any number of iterations beyond the two described above. A small illustrative overlap computation follows below.
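The four-point overlap of Figure 9 can be reproduced with a small illustrative computation; the coordinates here are arbitrary assumptions:

```python
# Illustrative overlap computation for two 3x3 search spaces: S2 is
# re-centered on the best match of iteration 1 (cf. point 915 in Figure 9).

def grid3x3(center, step=1.0):
    cx, cy = center
    return {(cx + dx * step, cy + dy * step)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

s1 = grid3x3((0, 0))       # around the initial motion vector
s2 = grid3x3((1, 1))       # around the refined motion vector (e.g. 915)
print(sorted(s1 & s2))     # the four shared candidate positions
```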
  • Figure 10 illustrates a method of performing the motion vector refinement.
  • an initial motion vector MV0 is input.
  • this inputting may correspond to determination by block matching with the current block at the encoder side and then including MV0 into the bitstream of the coded video including the encoded current block.
  • the initial motion vector MV0 may be extracted (parsed) from the bitstream.
  • the initial motion vector may also be determined in a different way (e.g. by a previous motion vector refinement).
  • the template is generated (or updated). For example, the template is generated in one of the ways mentioned above with reference to the Embodiment 1 and 2.
  • the first search space is determined.
  • 9 search points are used in each iteration (center point plus 8 surrounding points).
  • the center point is the position pointed to by the initial motion vector MV0.
  • MVdiff may take different values; it may include one or more (e.g. even all) fractional pixel positions.
  • the number of positions in a search space may be smaller or larger than 9 (in general, the number of search space positions may be 2 or more).
  • the number of positions in a search space may vary for different iterations. The search space size and form can differ between iterations. For example, in early iteration(s), integer distances between positions in a search space may be used, while later iteration(s) may use search spaces with fractional positions, thus gradually increasing the search resolution.
  • the search space determined in step 1020 is used in step 1030 for motion vector refinement with template matching.
  • the best refined motion vector MV0' within the search space determined in step 1020 is found by template matching with the template generated / updated in step 1010. The first iteration ends.
  • in step 1040, the template is updated.
  • the template updating process does not necessarily change the template - this depends on the image content. It is possible that for some blocks and some iterations the template before updating is equal to the template after updating, although in general it is expected to change.
  • the processing circuitry is further configured to determine whether or not the first template and the second template are identical. If the first template and the second template are determined to be identical, the template matching with the second template is performed only on those candidate motion vector positions which are not included in both the first search space and the second search space. If the first template and the second template are determined not to be identical, the template matching with the second template is performed on all candidate motion vector positions included in the second search space.
  • this implementation enables saving some computational power in cases where the only source of a different matching result is the updated template (a sketch of this shortcut follows below). It is noted that if the first and the second search space were in different reference pictures, the matching may still be performed even if the updated template is equal to the template before updating.
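A small sketch of this shortcut, assuming both search spaces lie in the same reference picture and that candidate positions are comparable tuples; the function name is an illustrative assumption:

```python
# Sketch only: if the template did not change (and the same reference
# picture is used), only candidates not yet tested are matched again.

def second_pass_candidates(s1, s2, template1, template2):
    if template1 == template2:
        tested = set(s1)
        return [p for p in s2 if p not in tested]
    return list(s2)
```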
  • the processing circuitry may be further configured to determine whether or not the first template and the second template are identical (or very close to each other) according to at least one of: - comparing samples of the first template with the respective samples at the same positions in the second template;
  • the comparison may be performed considering all samples of the templates or only a subset of the samples in the templates to reduce the complexity.
  • this condition may be, for instance, a check whether or not the motion vector refinement has changed the motion vector in the last iteration (for example, if the motion vector is unchanged, the template remains unchanged too if it is calculated based on the best matching block).
  • the present disclosure is not limited to these exemplary conditions.
  • This may be, for instance, a flag switching template updating on/off.
  • a search space is determined around the first motion vector MV0'.
  • in step 1060, motion vector refinement is performed by searching for the best motion vector MV0'' in the second search space S2.
  • the two search spaces are in the same reference picture.
  • the motion vector refinement does not necessarily end after 2 iterations, i.e. after two searches for the best motion vector in two search spaces.
  • a further refinement may be performed starting from the motion vector MV0'', around which the third search space may be constructed in the same or a different way as for the first and second search spaces.
  • the iterations may continue with steps of c) determining a k-th motion vector as a k-th refinement of an initial motion vector for the image prediction block by template matching with a k-th template in a k-th search space including a k-th plurality of candidate motion vector positions;
  • the iterations may be terminated after a pre-set number of iterations W, W being an integer larger than 1.
  • the number of iterations W may be set depending on the application and/or processing circuitry on which the iterations are running to meet some delay constraints. Alternatively or in addition, it may be given by the standard for video coding.
  • the number of iterations may be unlimited or limited to the predetermined number W and, in addition, there may be some further one or more conditions on fulfillment of which the iterations stop irrespectively of whether or not W was reached.
  • the processing circuitry is configured to terminate the iterations:
  • the N-th search space in the N-th iteration overlaps with M preceding search spaces of the (N-i)-th respective iterations, with i being an integer from 1 to M, and does not overlap for i greater than M.
  • Figure 11 illustrates another exemplary flow diagram.
  • in the right hand side branch, the center point is not checked twice in the second iteration. If the template is not changed, the template matching cost of the center point (which was already obtained in the first iteration) is not expected to change. Therefore, the template matching cost obtained in the first iteration can be reused here without actual computation of the cost. This reduces the computational complexity.
  • the left hand side corresponds to Figure 10.
  • condition 1190 decides whether the operations 1040 and 1050 of Figure 10 are performed or the alternative operation 1150 on the right hand side.
  • step 1150 corresponds to determining a search space for the next iteration in case the template is not updated.
  • the second search space is formed by the positions {MV0' + MVdiff} without the center position pointed to by the updated motion vector MV0'.
  • MVdiff ∈ {(1,0), (-1,0), (0,1), (0,-1), (-1,-1), (-1,+1), (+1,-1), (+1,+1)}.
  • in this way, positions which were already included in the first search space, i.e. positions already tested in the previous iteration, are not tested again.
  • Figure 12 illustrates an example, in which at least one point in addition to the center point of the iteration 2 (the outcome of iteration 1 , i.e. the position to which the motion vector refined in iteration 1 points) overlaps with the search space of a previous iteration, if there is a template updating process in between the iterations.
  • the iterations are performed in the same reference picture (Ref pic 0 in the figure). However, as mentioned previously, the iterations may also go over different reference pictures.
  • the center point of the second iteration is considered to be the starting point of the second iteration.
  • the template matching with the overlapping search spaces may be used in an encoder as well as decoder.
  • an apparatus for encoding a video image comprising: the apparatus for determination of a motion vector for an image block as described above; and an image coding circuitry configured to perform video image coding of the image block based on predictive coding using the determined motion vector and generating a bitstream including the coded image block.
  • the image coding circuitry may include one or more (e.g. also all) blocks of Figure 1.
  • the apparatus comprising: a bitstream parser for extracting from the bitstream portions corresponding to a compressed video image including a compressed image block to be decoded; the apparatus for determination of a motion vector for the image block as described above; and an image reconstruction circuitry configured to perform image reconstruction of the image block based on the motion vector.
  • the image coding circuitry may include one or more (e.g. also all) blocks of Figure 2.
  • the present disclosure also provides the respective methods corresponding to the steps performed by the processing circuitry as described above.
  • the motion vector determination with template adaption as described above can be implemented as a part of encoding and/or decoding of a video signal (motion picture).
  • the motion vector determination may also be used for other purposes in image processing, such as movement detection or movement analysis, without being limited to employment for encoding / decoding.
  • the motion vector determination may be implemented as an apparatus.
  • Such apparatus may be a combination of a software and hardware.
  • the motion vector determination may be performed by a chip such as a general purpose processor, or a digital signal processor (DSP), or a field programmable gate array (FPGA), or the like.
  • the present invention is not limited to implementation on programmable hardware. It may be implemented on an application-specific integrated circuit (ASIC) or by a combination of the above-mentioned hardware components.
  • the motion vector determination may also be implemented by program instructions stored on a computer readable medium.
  • the program when executed, causes the computer to perform the steps of the above described methods.
  • the computer readable medium can be any medium on which the program is stored such as a DVD, CD, USB (flash) drive, hard disc, server storage available via a network, etc.
  • the encoder and/or decoder may be implemented in various devices including a TV set, set top box, PC, tablet, smartphone, or the like, i.e. any recording, coding, transcoding, decoding or playback device. It may be a software or an app implementing the method steps and stored / run on a processor included in an electronic device as those mentioned above.
  • the present disclosure provides a technique in which a motion vector for a prediction block is determined. Based on a provided initial motion vector and a provided template, the motion vector is refined by template matching with a template. The refined motion vector points to image samples, which are used to update the template. Using the refined motion vector and the updated template, the motion vector is further refined by another iteration of template matching.
  • an apparatus for determination of a motion vector for an image prediction block including a processing circuitry configured to: determine a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector; generate an updated template based on the image samples pointed to by the refined motion vector; determine an updated motion vector for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
  • a processing circuitry configured to: determine a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector; generate an updated template based on the image samples pointed to by the refined motion vector; determine an updated motion vector for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
  • the processing circuitry may be any combination of hardware and/or software including one or more hardware pieces.
  • the processing circuit may be configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by a refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the refined motion vector for the image prediction block by template matching with the i-th update of the template.
  • the processing circuitry is configured to determine, for the image prediction block, a refinement of a first initial motion vector pointing to a first picture and a refinement of a second initial motion vector pointing to a second picture by template matching with an initial template to generate a respective first refined motion vector and second refined motion vector; generate an updated template as a function of image samples pointed to by the first refined motion vector and second refined motion vector; and determine, for the image prediction block, a first updated motion vector and a second updated motion vector by template matching with the updated template in the respective first picture and second picture.
  • the processing circuit may be configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by the first refined motion vector and the second refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the first refined motion vector and the second refined motion vector for the image prediction block by template matching with the i-th update of the template.
  • the processing circuitry is configured to: obtain a first initial motion vector and a second initial motion vector; determine, for the image prediction block, a refinement of the first initial motion vector pointing to a first picture by template matching with an initial template in the first picture to generate a respective first refined motion vector; generate an updated template as a function of image samples pointed to by the first refined motion vector and the second initial motion vector; determine, for the image prediction block, a refinement of the second initial motion vector pointing to a second picture by template matching with the updated template in the second picture to generate a second refined motion vector; and generate the updated template as a function of image samples pointed to by the first refined motion vector and the second refined motion vector.
  • the processing circuit can be configured to iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: determine, for the image prediction block, an i-th refinement of the first motion vector pointing to the first picture by template matching with the updated template in the first picture; generate an i-th first-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the (i-1)-th refinement of the second motion vector; determine, for the image prediction block, an i-th refinement of the second initial motion vector pointing to the second picture by template matching with the i-th first-direction update of the template in the second picture; and generate the i-th second-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the i-th refinement of the second refined motion vector.
  • the updated template may be generated as a function of image samples in a block pointed to by the refined motion vector and/or the updated motion vector; the function may include a weighted average of the image samples.
  • the template may have the shape and size of the image prediction block, the image prediction block being a rectangle of a preconfigured size.
  • the number of iterations may be a predefined number and the processing circuitry is, according to an exemplary implementation, further configured to stop the iterative refinement of the motion vectors and block templates before the predefined number is reached if a predefined condition is met, the predefined condition being one or a combination of the following:
  • - a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit and to a block width is below a fifth threshold.
  • - a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit exceeds a sixth threshold.
  • - a result of adding the length of the motion vector after iteration i along the y axis to the top-left coordinate of a prediction unit and to a block height is below a seventh threshold.
  • - a result of adding the length of the motion vector after iteration i along the y axis to the top-left coordinate of a prediction unit exceeds an eighth threshold.
  • an apparatus for encoding a video image comprising: the apparatus according to any of the above embodiments and examples for determination of a motion vector for an image prediction block and an image coding circuitry configured to perform video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
  • the predictive coding may include using the motion vector to generate a predictor for the currently coded prediction block.
  • the predictor is determined as a picture portion pointed to by the determined motion vector, corresponding in size and form to the prediction block. Then a difference is formed between the current prediction block and the predictor. The difference is further coded.
  • the further coding may include linear transformation, quantization and entropy coding to generate the bitstream (a minimal sketch of the residual formation follows below).
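A minimal sketch of forming the residual; all names here are illustrative assumptions, and the residual is what the subsequent transformation, quantization and entropy coding would operate on:

```python
# Illustrative only: sample-wise difference between the current block and
# the predictor pointed to by the determined motion vector.

def residual(current_block, predictor):
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current_block, predictor)]
```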
  • an apparatus for decoding a video image from a bitstream comprising: a bitstream parser for extracting from the bitstream portions corresponding to a compressed video image including a compressed image prediction block to be decoded; and the apparatus according to any of the above embodiments and examples for determination of a motion vector for the image prediction block; as well as an image reconstruction circuitry configured to perform image reconstruction of the image prediction block based on the motion vector.
  • the image reconstruction may include adding the differences to the predictor.
  • the predictor may be obtained based on the refined motion vector.
  • a method for determination of a motion vector for an image prediction block including: determining a refinement of an initial motion vector for the image prediction block by template matching with an initial template to generate a refined motion vector; generating an updated template based on the image samples pointed to by the refined motion vector; and determining an updated motion vector for the image prediction block by template matching with the updated template in a search space including a plurality of candidate motion vector positions.
  • the method above may operate iteratively, over i being an integer larger than 1, repeating the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by a refined motion vector obtained in the (i-1)-th iteration; determining an i-th update of the refined motion vector for the image prediction block by template matching with the i-th update of the template.
  • the method according to an embodiment may include the steps of: determining, for the image prediction block, a refinement of a first initial motion vector pointing to a first picture and a refinement of a second initial motion vector pointing to a second picture by template matching with an initial template to generate a respective first refined motion vector and second refined motion vector; generating an updated template as a function of image samples pointed to by the first refined motion vector and second refined motion vector; and determining, for the image prediction block, a first updated motion vector and a second updated motion vector by template matching with the updated template in the respective first picture and second picture.
  • the method can also iteratively, over i being an integer larger than 1, repeat the following steps of the i-th iteration: generating an i-th update of the template based on image samples pointed to by the first refined motion vector and the second refined motion vector obtained in the (i-1)-th iteration; and determining an i-th update of the first refined motion vector and the second refined motion vector for the image prediction block by template matching with the i-th update of the template.
  • the method may further include the following steps: obtaining a first initial motion vector and a second initial motion vector; determining, for the image prediction block, a refinement of the first initial motion vector pointing to a first picture by template matching with an initial template in the first picture to generate a respective first refined motion vector; generating an updated template as a function of image samples pointed to by the first refined motion vector and the second initial motion vector; determining, for the image prediction block, a refinement of the second initial motion vector pointing to a second picture by template matching with the updated template in the second picture to generate a second refined motion vector; and generating the updated template as a function of image samples pointed to by the first refined motion vector and the second refined motion vector.
  • the method of an embodiment iteratively, over i being an integer larger than 1, repeats the following steps of the i-th iteration: determining, for the image prediction block, an i-th refinement of the first motion vector pointing to the first picture by template matching with the updated template in the first picture; generating an i-th first-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the (i-1)-th refinement of the second motion vector; determining, for the image prediction block, an i-th refinement of the second initial motion vector pointing to the second picture by template matching with the i-th first-direction update of the template in the second picture; and generating the i-th second-direction update of the template as a function of image samples pointed to by the i-th refinement of the first motion vector and the i-th refinement of the second refined motion vector.
  • the updated template is generated as a function of image samples in a block pointed to by the refined motion vector and/or updated motion vector and the function includes a weighted average of the image samples.
  • the template has the shape and size of the image prediction block, the image prediction block being a rectangle of a preconfigured size.
  • the number of iterations is advantageously a predefined number and the processing circuitry is further configured to stop the iterative refinement of the motion vectors and block templates before the predefined number is reached if a predefined condition is met, the predefined condition being one or a combination of the following:
  • - the length of motion vector after iteration i along the x axis exceeds a predetermined third threshold
  • - the length of motion vector after iteration i along the y axis exceeds a predetermined fourth threshold
  • - a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit and to a block width is below a fifth threshold.
  • - a result of adding the length of the motion vector after iteration i along the x axis to the top-left coordinate of a prediction unit exceeds a sixth threshold.
  • - a result of adding the length of the motion vector after iteration i along the y axis to the top-left coordinate of a prediction unit and to a block height is below a seventh threshold.
  • a method for encoding a video image comprising: determining of a motion vector for an image prediction block according to any of the above methods; and performing video image coding of the image prediction block based on predictive coding using the determined motion vector and generating a bitstream including the coded image prediction block.
  • a method for decoding a video image from a bitstream comprising: extracting from the bitstream portions corresponding to a compressed video image including a compressed image prediction block to be decoded; determining of a motion vector for the image prediction block according to any of the above methods; and performing image reconstruction of the image prediction block based on the motion vector.
  • a refined motion vector is determined based on initial motion vector using template matching in a certain search space around the initial motion vector.
  • the template is determined based on the samples pointed to by the initial motion vectors.
  • the template is then updated based on the refined motion vector determined and used to further refine the motion vectors. This may be performed iteratively.


Abstract

Some embodiments of the present invention relate to the determination and refinement of motion vectors. In particular, a refined motion vector is determined on the basis of an initial motion vector using template matching in a certain search space around the initial motion vector. The template is determined on the basis of the samples pointed to by the initial motion vectors. The template is then updated on the basis of the determined refined motion vector and used to further refine the motion vectors. This method may be applied iteratively. The search spaces in different iterations of the motion vector refinement use the same reference picture but a different template, and overlap in at least one position.
PCT/EP2018/057892 2017-10-09 2018-03-28 Overlapping search space for bi-predictive motion vector refinement Ceased WO2019072422A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EPPCT/EP2017/075715 2017-10-09
PCT/EP2017/075715 WO2019072373A1 (fr) Template update for motion vector refinement

Publications (1)

Publication Number Publication Date
WO2019072422A1 true WO2019072422A1 (fr) 2019-04-18

Family

ID=60043213

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2017/075715 Ceased WO2019072373A1 (fr) Template update for motion vector refinement
PCT/EP2018/057892 Ceased WO2019072422A1 (fr) Overlapping search space for bi-predictive motion vector refinement

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/075715 Ceased WO2019072373A1 (fr) Template update for motion vector refinement

Country Status (2)

Country Link
EP (1) EP3685583A1 (fr)
WO (2) WO2019072373A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020257787A1 (fr) * 2019-06-21 2020-12-24 Beijing Dajia Internet Information Technology Co., Ltd. Methods and devices for prediction dependent residual scaling for video coding
CN112449193A (zh) * 2019-09-03 2021-03-05 Tencent America LLC Method and apparatus for encoding and decoding video data, computer device and storage medium
US11146810B2 (en) 2018-11-27 2021-10-12 Qualcomm Incorporated Decoder-side motion vector refinement
KR20220064950A (ko) * 2019-09-24 2022-05-19 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Inter-frame prediction method and apparatus, device, and storage medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019191717A1 (fr) 2018-03-30 2019-10-03 Hulu, LLC Template refined bi-prediction for video coding
US11252431B2 (en) 2019-01-02 2022-02-15 Telefonaktiebolaget Lm Ericsson (Publ) Side motion refinement in video encoding/decoding systems
CN115941970B (zh) 2019-06-17 2024-02-20 Beijing Dajia Internet Information Technology Co., Ltd. Method and apparatus for decoder-side motion vector refinement in video coding
CN114051732A (zh) * 2019-07-27 2022-02-15 Beijing Dajia Internet Information Technology Co., Ltd. Method and apparatus for decoder-side motion vector refinement in video coding
US11936877B2 (en) * 2021-04-12 2024-03-19 Qualcomm Incorporated Template matching based affine prediction for video coding
CN114666606A (zh) * 2022-02-07 2022-06-24 杭州未名信科科技有限公司 Affine motion estimation method and apparatus, storage medium and terminal
CN121264047A (zh) * 2023-06-07 2026-01-02 Douyin Vision Co., Ltd. Method, apparatus and medium for video processing
CN116612157A (zh) * 2023-07-21 2023-08-18 Yunnan University Video single-object tracking method and apparatus, and electronic device
WO2025056412A1 (fr) * 2023-09-12 2025-03-20 Interdigital Ce Patent Holdings, Sas Procédés et appareils de correction de mouvement côté décodeur pour la prédiction d'attributs de nuage de points

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011071514A2 (fr) * 2009-12-08 2011-06-16 Thomson Licensing Methods and apparatus for adaptive residual updating of template matching prediction in video encoding and decoding
US20160286230A1 (en) * 2015-03-27 2016-09-29 Qualcomm Incorporated Motion information derivation mode determination in video coding
WO2017036414A1 (fr) * 2015-09-02 2017-03-09 Mediatek Inc. Method and apparatus of decoder side motion derivation for video coding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011071514A2 (fr) * 2009-12-08 2011-06-16 Thomson Licensing Methods and apparatus for adaptive residual updating of template matching prediction in video encoding and decoding
US20160286230A1 (en) * 2015-03-27 2016-09-29 Qualcomm Incorporated Motion information derivation mode determination in video coding
WO2017036414A1 (fr) * 2015-09-02 2017-03-09 Mediatek Inc. Method and apparatus of decoder side motion derivation for video coding

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHEN J ET AL: "Algorithm description of Joint Exploration Test Model 7 (JEM7)", 7. JVET MEETING; 13-7-2017 - 21-7-2017; TORINO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-G1001, 19 August 2017 (2017-08-19), XP030150980 *
CHIU YI-JEN ET AL: "Decoder-side Motion Estimation and Wiener filter for HEVC", 2013 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), IEEE, 17 November 2013 (2013-11-17), pages 1 - 6, XP032543658, DOI: 10.1109/VCIP.2013.6706446 *
LIN Y ET AL: "Enhanced Template Matching in FRUC Mode", 5. JVET MEETING; 12-1-2017 - 20-1-2017; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-E0035-v2, 11 January 2017 (2017-01-11), XP030150503 *
MING LI ET AL: "Rate-Distortion Criterion Based Picture Padding for Arbitrary Resolution Video Coding Using H.264/MPEG-4 AVC", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, USA, vol. 20, no. 9, 1 September 2010 (2010-09-01), pages 1233 - 1241, XP011315559, ISSN: 1051-8215 *
STEFFEN KAMP ET AL: "Decoder-Side Motion Vector Derivation for Block-Based Video Coding", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, USA, vol. 22, no. 12, 1 December 2012 (2012-12-01), pages 1732 - 1745, XP011487149, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2012.2221528 *
X. CHEN; J. AN; J. ZHENG, JVET-D0029: DECODER-SIDE MOTION VECTOR REFINEMENT BASED ON BILATERAL TEMPLATE MATCHING, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jvet/>
Y-JEN CHIU ET AL: "TE1: Fast techniques to improve self derivation of motion estimation", 2. JCT-VC MEETING; 21-7-2010 - 28-7-2010; GENEVA; (JOINT COLLABORATIVETEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL:HTTP://WFTP3.ITU.INT/AV-ARCH/JCTVC-SITE/,, no. JCTVC-B047, 28 July 2010 (2010-07-28), XP030007627 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11146810B2 (en) 2018-11-27 2021-10-12 Qualcomm Incorporated Decoder-side motion vector refinement
WO2020257787A1 (fr) * 2019-06-21 2020-12-24 Beijing Dajia Internet Information Technology Co., Ltd. Procédés et dispositifs de mise à l'échelle résiduelle dépendant d'une prédiction destinés à un codage vidéo
US11979575B2 (en) 2019-06-21 2024-05-07 Beijing Dajia Internet Information Technology Co., Ltd. Methods and devices for prediction dependent residual scaling for video coding
CN112449193A (zh) * 2019-09-03 2021-03-05 Tencent America LLC Method and apparatus for encoding and decoding video data, computer device and storage medium
KR20220064950A (ko) * 2019-09-24 2022-05-19 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Inter-frame prediction method and apparatus, device, and storage medium
US12284382B2 (en) 2019-09-24 2025-04-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for inter prediction method, video picture encoder and decoder
KR102895854B1 (ko) * 2019-09-24 2025-12-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Inter-frame prediction method and apparatus, device, and storage medium

Also Published As

Publication number Publication date
WO2019072373A1 (fr) 2019-04-18
EP3685583A1 (fr) 2020-07-29

Similar Documents

Publication Publication Date Title
US12200246B2 (en) Motion vector refinement for multi-reference prediction
CA3104570C (fr) Memory access window and padding for motion vector refinement and motion compensation
US12526444B2 (en) Limited memory access window for motion vector refinement
WO2019072422A1 (fr) Overlapping search space for bi-predictive motion vector refinement
US11153595B2 (en) Memory access window and padding for motion vector refinement
US11159820B2 (en) Motion vector refinement of a motion vector pointing to a fractional sample position
WO2019072371A1 (fr) Memory access window for prediction sub-block motion vector calculation
WO2019072369A1 (fr) Motion vector list pruning
NZ760521B2 (en) Motion vector refinement for multi-reference prediction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18713254

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18713254

Country of ref document: EP

Kind code of ref document: A1