WO2019077197A1 - Method, apparatus and computer program product for video encoding and decoding - Google Patents
Method, apparatus and computer program product for video encoding and decoding
- Publication number: WO2019077197A1
- Application: PCT/FI2018/050724
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- motion vector
- block
- model
- prediction
- motion
- Prior art date: 2017-10-16
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
- H04N19/176—Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
- H04N19/56—Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
Definitions
- the present solution generally relates to video encoding and decoding. Background: this section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued.
- a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
- the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
- a method for video coding comprising obtaining motion information comprising at least one motion vector and a location of at least one neighboring block of a video frame; determining parameters of a model using the obtained motion information; and determining a predicted motion vector using the model and a location of a current block.
- an apparatus comprising at least one processor and memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: obtain motion information comprising a motion vector and locations of neighboring blocks of a video frame; determine parameters of a model using the obtained motion information; and determine a predicted motion vector using the model and a location of a current block.
- a computer program product embodied on a non-transitory computer readable medium, comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to obtain motion information comprising a motion vector and locations of neighboring blocks of a video frame; determine parameters of a model using the obtained motion information; and determine a predicted motion vector using the model and a location of a current block.
- the model is a function that relates the motion vector of a block to the location of that block in the frame with a weight and an offset matrix.
- the parameters comprise coefficients of the weight and the offset matrix of said model.
- the predicted motion vector is added to a candidate list. According to an embodiment, the predicted motion vector is used as an initial search point for motion estimation.
- the candidate list is an advanced motion vector predictor list.
- the candidate list is a merge list.
- the neighboring blocks are on the same layer as the current block.
- at least some of the neighboring blocks are on a different layer(s) or view(s) than the current block.
- the method is executed at an encoder and/or a decoder.
- Fig. 1 shows an encoder according to an embodiment
- Fig. 2 shows a decoder according to an embodiment
- Figs. 3a, 3b show examples of motion vector candidate positions
- Fig. 4 is a flowchart illustrating a method according to an embodiment
- Fig. 5 shows an example of a block to be encoded/decoded
- Fig. 6 shows an example of locally adaptive motion vector prediction for multilayer (scalable) prediction
- Fig. 7 shows an example of locally adaptive motion vector prediction for multiview prediction
- Fig. 8 shows an apparatus according to an embodiment in a simplified block chart
- Fig. 9 shows a layout of an apparatus according to an embodiment.
- hybrid video codecs, including H.264/AVC and HEVC, encode video information in two phases.
- predictive coding is applied for example as so-called sample prediction and/or so-called syntax prediction.
- pixel or sample values in a certain picture area or "block" are predicted. These pixel or sample values can be predicted, for example, using one or more of the following ways:
- Motion compensation mechanisms (which may also be referred to as temporal prediction or motion compensated temporal prediction or motion-compensated prediction or MCP), which involve finding and indicating an area in one of the previously encoded video frames that corresponds closely to the block being coded;
- In syntax prediction, which may also be referred to as parameter prediction and also relates to the first phase, syntax elements and/or syntax element values and/or variables derived from syntax elements are predicted from syntax elements (de)coded earlier and/or variables derived earlier.
- motion vectors are coded e.g. for inter and/or inter-view prediction.
- the block partitioning, e.g. from CTUs (Coding Tree Units) to CUs (Coding Units) and down to PUs (Prediction Units), may be predicted.
- the filtering parameters e.g. for sample adaptive offset may be predicted.
- a video codec comprises an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can uncompress the compressed video representation back into a viewable form.
- An image codec or a picture codec is similar to a video codec, but it encodes each input picture independently from other input pictures and decodes each coded picture independently from other coded pictures. It needs to be understood that whenever a video codec, video encoding or encoder, or video decoder or decoding is referred below, the text similarly applies to an image codec, image encoding or encoder, or image decoder or decoding, respectively.
- Prediction approaches using image information from a previously coded image can also be called inter prediction methods, which may also be referred to as temporal prediction and motion compensation.
- Prediction approaches using image information within the same image can also be called intra prediction methods.
- the second phase is one of coding the error between the predicted block of pixels or samples and the original block of pixels or samples.
- This may be accomplished by transforming the difference in pixel or sample values using a specified transform.
- This transform may be e.g. a Discrete Cosine Transform (DCT) or a variant thereof.
- the transformed difference may be quantized and entropy coded.
- the encoder can control the balance between the accuracy of the pixel or sample representation (i.e. the visual quality of the picture) and the size of the resulting encoded video representation (i.e. the file size or transmission bit rate).
- the decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a predicted representation of the pixel or sample blocks (using the motion or spatial information created by the encoder and included in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized prediction error signal in the spatial domain).
- After applying pixel or sample prediction and error decoding processes, the decoder combines the prediction and the prediction error signals (the pixel or sample values) to form the output video frame.
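As a minimal illustration of this combining step (a sketch only; the function and parameter names are assumptions, not part of the described codec):

```python
import numpy as np

def reconstruct_block(prediction, decoded_residual, bit_depth=8):
    """Combine the prediction and the decoded prediction error signal,
    clipping to the valid sample range, as a decoder does when forming
    the output block."""
    max_val = (1 << bit_depth) - 1
    return np.clip(prediction + decoded_residual, 0, max_val)
```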
- the decoder may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming pictures in the video sequence.
- Figure 1 illustrates an image to be encoded (I_n); a predicted representation of an image block (P'_n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (P_inter); intra prediction (P_intra); mode selection (MS) and filtering (F).
- An example of a decoding process is illustrated in Figure 2.
- Figure 2 illustrates a predicted representation of an image block (P'_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
- a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture.
- the source and decoded pictures are each comprised of one or more sample arrays, such as one of the following sets of sample arrays:
- Luma (Y) only (monochrome);
- Luma and two chroma (YCbCr or YCgCo);
- Green, Blue and Red (GBR, also known as RGB);
- Arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
- The term pixel may refer to the set of spatially collocated samples of the sample arrays of the color components. Sometimes, depending on the context, the term pixel may refer to a sample of one sample array only.
- a picture may either be a frame or a field, while in some coding systems a picture may be constrained to be a frame.
- a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
- a field is a set of alternate sample rows of a frame and may be used as encoder input, when the source signal is interlaced.
- motion information is indicated by motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded pictures.
- H.264/AVC and HEVC divide a picture into a mesh of rectangles, for each of which a similar block in one of the reference pictures is indicated for inter prediction.
- the location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
- H.264/AVC and HEVC include a concept of picture order count (POC).
- a value of POC is derived for each picture and is non-decreasing with increasing picture position in output order. POC therefore indicates the output order of pictures.
- POC may be used in the decoding process, for example for implicit scaling of motion vectors in the temporal direct mode, for implicitly derived weights in weighted prediction, and for reference picture list initialization. Furthermore, POC may be used in the verification of output order conformance.
- The inter prediction process may use one or more of the following factors.
- motion vectors may be of quarter-pixel accuracy, and sample values in fractional-pixel positions may be obtained using a finite impulse response (FIR) filter.
- the accuracy of motion vector, motion vector prediction and motion vector difference may be different for each block, and they may vary for different blocks.
- H.264/AVC and HEVC allow selection of the size and shape of the block for which a motion vector is applied for motion-compensated prediction in the encoder, and indicating the selected size and shape in the bitstream so that decoders can reproduce the motion-compensated prediction done in the encoder.
- Reference pictures for inter prediction. The sources of inter prediction are previously decoded pictures.
- Many coding standards, including H.264/AVC and HEVC, enable storage of multiple reference pictures for inter prediction and selection of the used reference picture on a block basis. For example, reference pictures may be selected on a macroblock or macroblock partition basis in H.264/AVC and on a PU or CU basis in HEVC.
- Many coding standards such as H.264/AVC and HEVC, include syntax structures in the bitstream that enable decoders to create one or more reference picture lists.
- a reference picture index to a reference picture list may be used to indicate which one of the multiple reference pictures is used for inter prediction for a particular block.
- a reference picture index may be coded by an encoder into the bitstream in some inter coding modes or it may be derived (by an encoder and a decoder) for example using neighboring blocks in some other inter coding modes.
- motion vectors may be coded differentially with respect to a block- specific predicted motion vector.
- the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks, as sketched below.
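A minimal sketch of such a median predictor (illustrative only; the three-neighbor choice follows the familiar H.264/AVC style, and the function names are assumptions):

```python
def median_mv_predictor(mv_left, mv_top, mv_topright):
    """Component-wise median of three adjacent blocks' motion vectors."""
    med = lambda a, b, c: sorted((a, b, c))[1]
    return (med(mv_left[0], mv_top[0], mv_topright[0]),
            med(mv_left[1], mv_top[1], mv_topright[1]))

# The motion vector difference then codes only the remainder:
# mvd = (mv[0] - pred[0], mv[1] - pred[1])
```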
- Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures, or co-located blocks in other layers or views, and to signal the chosen candidate as the motion vector predictor.
- the motion vector candidate positions (MVP) are as shown by means of an example in Figures 3a-b.
- In Figures 3a-b, black dots indicate sample positions directly adjacent to a block X, defining the positions of possible MVPs.
- Figure 3a illustrates spatial MVPs positions
- Figure 3b illustrates temporal MVP (TMVP) positions, where Y is a collocated block of X in a reference picture. Positions C0 and C1 are candidates for the TMVP.
- the reference index values can be predicted or obtained from previously coded/decoded blocks and pictures. The reference index may be predicted e.g. from adjacent blocks and/or co-located blocks in temporal reference picture. Differential coding of motion vectors may be disabled across slice or tile boundaries.
- In the merge mode, all the motion field information, which includes motion vector(s) and corresponding reference picture index(es) for each available reference picture list, is predicted and used without any modification/correction.
- this is performed by generating a merge list which includes some candidates from the motion information of the neighboring blocks. Then the index to the proper candidate is indicated in the bitstream.
- Each candidate in the merge list may include the prediction type (uni- prediction or bi-prediction), reference indices and motion vectors.
- H.264/AVC and HEVC enable the use of a single prediction block in P slices (herein referred to as uni-predictive slices) or a linear combination of two motion-compensated prediction blocks for bi-predictive slices, which are also referred to as B slices.
- The overlapped block motion compensation (OBMC) method may use prediction from more reference frames.
- Individual blocks in B slices may be bi-predicted, uni-predicted, or intra-predicted, and individual blocks in P slices may be uni-predicted or intra-predicted.
- the reference pictures for a bi-predictive picture may not be limited to be the subsequent picture and the previous picture in output order, but rather any reference pictures may be used.
- In many coding standards, such as H.264/AVC and HEVC, one reference picture list, referred to as reference picture list 0, is constructed for P slices, and two reference picture lists, list 0 and list 1, are constructed for B slices.
- For B slices, prediction in the forward direction may refer to prediction from a reference picture in reference picture list 0, and prediction in the backward direction may refer to prediction from a reference picture in reference picture list 1, even though the reference pictures for prediction may have any decoding or output order relation to each other or to the current picture.
- H.264/AVC allows weighted prediction for both P and B slices.
- In implicit weighted prediction, the weights are proportional to picture order counts (POC), while in explicit weighted prediction, prediction weights are explicitly indicated.
- the motion vectors may be coded using the spatial/temporal candidates. However, this has not been found efficient for some types of video content because of the following:
- the video may include object/scene deformations due to the image format and different sampling in various regions when representing 360-degree video in a 2D representation (e.g. equirectangular or cubemap projection).
- the video content may have zooming or rotation, caused by an object or a camera.
- the video content may be deformed due to the characteristics of the capturing device (e.g. fisheye lenses).
- the magnitude and direction of motion vectors change gradually with the location of the block.
- intra prediction may be chosen by the encoder instead of inter prediction, which results in a higher bitrate compared to inter prediction.
- the problem has been treated by using motion vectors from neighboring blocks for predicting the motion vector of the current block using a linear function.
- The parameters of the linear function are fixed, and are calculated using a training process on several training test sequences.
- such solution considers only the motion vector values for the modeling process with some fixed parameters and hence is not able to achieve a proper model for the motion.
- the problem is solved by modelling the motion vector locally based on the location (e.g. a center point or top-left corner) of the blocks in regions of a video frame.
- the parameters of the model are calculated based on the motion vectors and locations of at least one neighboring block comprising a set of motion vectors. Then a predicted motion vector is calculated using the model and the location of the current block.
- This predicted motion vector is added to a candidate list, which can be an advanced motion vector prediction list or merge list, as one of the candidates which can be selected by the encoder.
- the process may be executed at an encoder side or a decoder side when generating the candidate list.
- a method according to an embodiment is shown as a flowchart in Figure 4. The method comprises obtaining motion information comprising at least one motion vector and a location of at least one neighboring block of a video frame 410; determining parameters of a model using the obtained motion information 420; and determining a predicted motion vector using the model and a location of a current block 430.
- the method illustrated in Figure 4 can be executed in an encoder or a decoder. When the method is executed at the encoder, the method further comprises adding the predicted motion vector to a candidate list 440.
- the method of Figure 4 may be applied to single-layer coding, where all the neighboring blocks are on the same layer, or to multilayer or multiview coding, where neighboring blocks are on the same layer/view but may also be on other layers/views.
- the model referred to in the method of the flowchart of Figure 4 may be defined as follows. It is assumed that the motion information of different blocks and/or regions of a video frame is modelled with a specific function. This function relates a motion vector (x and y components) of a block to its location (e.g. the x and y coordinates of the center of the block). Different models can be used for this purpose. In one sense, the model can be cross-component, i.e. the x and y components of the motion vector are related to both the x and y coordinates of the block location. Alternatively, the model can be limited to each component, i.e. the x and y components of motion vectors are only related to the x and y coordinates of the block location, respectively, or vice versa (i.e., x to y, and y to x).
- the model can be linear, polynomial, or any other general function.
- the model can be used to model the motion only in a small region of the video, so a linear model should work efficiently for this purpose.
- MVx and MVy are the x and y components of the motion vector of a block
- X and Y are the location (e.g. center coordinates) of the block
- f(.) can be any function, such as a sinusoidal, exponential or logarithmic function
- X0, Y0, MV0x, MV0y are fixed values that, for example, may be calculated based on the motion vectors and locations of neighboring blocks
- a, b, c are parameters that are obtained by a training process using neighboring blocks.
- the function relates the motion vector of a block to the location of that block in the frame with a weight and an offset matrix.
- the parameters used in the model comprise coefficients of the weight and the offset matrix of the model. It is appreciated that the models presented below are examples, and any other model can be used instead.
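As an illustration, one linear instance consistent with this description (a sketch under the assumption of a purely linear cross-component model; the coefficient names are not from the application) is:

$$
\begin{pmatrix} MV_x \\ MV_y \end{pmatrix}
=
\underbrace{\begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}}_{\text{weight } W}
\begin{pmatrix} X \\ Y \end{pmatrix}
+
\underbrace{\begin{pmatrix} o_x \\ o_y \end{pmatrix}}_{\text{offset } O}
$$

The code sketches below use this six-parameter form, so at least three neighboring motion vectors (two equations each) are needed to determine it.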
- Each model has several parameters that are calculated for each block using the motion information (i.e. a set of motion vectors from each block and the location of each block) of at least one neighboring block.
- a neighboring block can be one of the following: a block on top of the current block, a block on the left of the current block, a block on the top-left of the current block, a block on the bottom-left of the current block, or a block on the top-right of the current block.
- motion vectors and location (e.g. center point) of said at least one neighboring block are collected and a training process (e.g. linear regression, polynomial regression, logical regression, RANSAC (Random Sample Consensus) etc.) may be used to calculate the parameters.
- The neighboring region relates to blocks located at a certain distance from the current block, wherein the distance may be defined by two or more blocks in a certain direction, e.g. up or left.
- the size of the neighboring region may be selected based on the size of the current block. For example, the blocks in the top, left, top-left, bottom-left and top-right regions of the current block may be used to train the model.
- the size of the neighboring block may be considered in the training process. For example, the information of the larger blocks may have more influence on the model's parameters. In the HEVC standard, for example, motion information may be stored in 4x4 block accuracy.
- the motion vector of a block larger than 4x4 may therefore be considered several times in the training process.
- In this way, the influence of the motion vector of each block in model extraction becomes proportional to the size of that block, as sketched below.
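One way to realize this size-proportional weighting (a sketch; the 4x4 granularity follows the HEVC storage note above, and the helper name is an assumption):

```python
import numpy as np

def expand_by_block_size(locations, mvs, sizes, unit=4):
    """Repeat each (location, motion vector) sample once per 4x4 unit of
    the block's area, so larger blocks weigh more in the training."""
    locations = np.asarray(locations, dtype=float)
    mvs = np.asarray(mvs, dtype=float)
    # Number of 4x4 units covered by each block (width * height / 16).
    reps = (np.prod(np.asarray(sizes), axis=1) // (unit * unit)).astype(int)
    return (np.repeat(locations, reps, axis=0),
            np.repeat(mvs, reps, axis=0))
```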
- the parameters of the model are calculated based on the motion vectors and locations of at least one neighboring block comprising a set of motion vectors. The more neighboring blocks there are, the more accurate the predicted motion vector will be.
- Each model that is used in the present solution has several parameters that should be calculated based on the neighboring block(s). If there are enough motion vectors in the neighboring blocks (i.e. more than the number of the model's parameters), a fitting can be performed, e.g. by minimizing the mean square error. On the other hand, if there are exactly as many motion vectors in the neighboring blocks as there are parameters, a system of equations can be built, and the exact value of the parameters can be calculated.
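A sketch of this parameter calculation for the linear weight-and-offset instance above (an assumed form; np.linalg.lstsq covers both the exact three-neighbor case and the overdetermined least-squares case):

```python
import numpy as np

def fit_linear_mv_model(locations, mvs):
    """Fit MV ~ W @ (X, Y) + O from neighboring blocks' data.
    locations, mvs: (N, 2) arrays; the 6 unknowns (2x2 W plus
    2-vector O) need N >= 3 motion vectors."""
    locations = np.asarray(locations, dtype=float)
    mvs = np.asarray(mvs, dtype=float)
    A = np.hstack([locations, np.ones((len(locations), 1))])  # (N, 3)
    params, *_ = np.linalg.lstsq(A, mvs, rcond=None)          # (3, 2)
    return params[:2].T, params[2]                            # W, O

def predict_mv(W, O, location):
    """Evaluate the fitted model at the current block's location."""
    return W @ np.asarray(location, dtype=float) + O
```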
- each motion vector has two components, the x and y directions, which are counted as two values for training. So, for example, if a neighboring block is coded in uni-prediction mode, it has one motion vector, hence two values for the training, and if a neighboring block is coded in bi-prediction mode, it has two motion vectors, hence four values for the training process. Other information of the neighboring block (e.g. whether it is coded using adaptive MVP mode or not, or affine parameters) may be used in the training process.
- the calculated motion vector may be quantized to the precision of the motion vector or motion vector difference. Alternatively, the predicted motion vector may be kept in higher precision.
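For example (a sketch; the quarter-sample precision, i.e. two fractional bits, is an assumption in the HEVC style):

```python
def quantize_mv(mv, frac_bits=2):
    """Round a predicted motion vector to the coded MV precision,
    e.g. quarter-sample when frac_bits == 2."""
    scale = 1 << frac_bits
    return tuple(round(c * scale) / scale for c in mv)
```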
- some of the neighboring blocks may relate to a different object than the one to which the current block belongs. Therefore, the blocks whose motion information is not in harmony with the current block or the other neighboring blocks can be removed. This outlier removal can be done, for example, by running the training process twice. Alternatively, other simple methods can be used to eliminate outliers, for example by classifying the neighboring blocks into, for example, two classes, and extracting a separate model and motion vector predictor for each.
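A sketch of the two-pass variant, reusing the fit_linear_mv_model helper above (the keep ratio is an illustrative choice, not from the application):

```python
import numpy as np

def fit_with_outlier_removal(locations, mvs, keep_ratio=0.75):
    """Fit once, drop the samples with the largest residuals
    (presumed to belong to a different object), then refit."""
    locations = np.asarray(locations, dtype=float)
    mvs = np.asarray(mvs, dtype=float)
    W, O = fit_linear_mv_model(locations, mvs)
    residuals = np.linalg.norm(mvs - (locations @ W.T + O), axis=1)
    keep = residuals.argsort()[:max(3, int(len(mvs) * keep_ratio))]
    return fit_linear_mv_model(locations[keep], mvs[keep])
```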
- each P-frame may have several reference frames, so each block may be predicted from different reference frames. Because of the motion in the temporal domain, a block may have different motion vectors for different reference frames. This issue therefore needs to be carefully considered in the training phase and in motion vector prediction.
- In the AMVP list, the motion vector of a block is predicted for a given reference frame (with a specific POC).
- In the merge list, the motion vector may be calculated for one reference frame (like uni-prediction) or for two or more reference frames (e.g. bi-prediction or OBMC).
- If the neighboring blocks are predicted from different reference frame(s) than the current reference frame, their motion information should either be eliminated from the training process or be scaled according to the POC numbers.
- priority may be given to the one of the reference frames which is most used in the neighboring blocks.
- the proposed process may be executed twice, once for each reference frame. In each execution, the training process may be executed using the related neighboring motion information.
- the motion vector may be calculated by scaling the motion vectors using the POC values of the current frame and the reference frames, as sketched below. This approach can reduce the computational complexity of the method.
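A sketch of such POC-based scaling (standard temporal scaling logic; the names are assumptions, and a zero POC distance would need guarding in practice):

```python
def scale_mv_by_poc(mv, poc_cur, poc_ref_neighbor, poc_ref_target):
    """Scale a neighboring block's MV from its own reference picture to
    the current target reference, proportionally to POC distances."""
    td = poc_cur - poc_ref_neighbor   # neighbor's temporal distance
    tb = poc_cur - poc_ref_target     # target temporal distance
    return (mv[0] * tb / td, mv[1] * tb / td)
```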
- neighboring motion vectors come from the same layer and/or view. It is, however, also possible, particularly in the case of multilayer coding, that neighboring motion vectors may come from other layers and views as well. For example, motion information of the blocks on the right and bottom side of the current block, which are not coded yet, or the blocks that are coded in intra mode, may come from previous layer(s) (after proper scaling) or other view(s). In such a case, two candidates may be added to the candidate list. One of the candidates may be calculated based on the neighboring blocks which are predicted from the same reference frame, and the other one may be calculated from the neighboring blocks of the co-located block in the other layer(s) or view(s).
- Figures 6 and 7 illustrate the collection of motion vector information in the case of multilayer (scalable) and multiview prediction, respectively.
- the additional motion vector information that is collected from other layer(s) or view(s) is not limited to the illustrated ones, but can include any candidates that are not available in the current frame.
- the reference layer 610 could be in the same or a different resolution than the current layer 600.
- the quality can be different from that of the current layer 600 video.
- two-step scaling could be applied as follows: scaling the motion vectors according to the POC number, and scaling the motion vectors according to the resolution of the reference layer 610, as in the sketch below.
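A sketch of this two-step scaling, reusing scale_mv_by_poc from above (resolutions given as (width, height) pairs; the names are assumptions):

```python
def scale_cross_layer_mv(mv, poc_cur, poc_ref_neighbor, poc_ref_target,
                         cur_resolution, ref_resolution):
    """Step 1: scale by POC distance; step 2: scale by the resolution
    ratio between the current layer and the reference layer."""
    mvx, mvy = scale_mv_by_poc(mv, poc_cur, poc_ref_neighbor,
                               poc_ref_target)
    return (mvx * cur_resolution[0] / ref_resolution[0],
            mvy * cur_resolution[1] / ref_resolution[1])
```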
- the method - when executed at an encoder - comprises adding the predicted motion information to a candidate list, e.g. to an advanced motion vector prediction (AMVP) list or to a merge list.
- Adding the motion vector to the list, and its location in the list, can be controlled by, for example, a picture/frame/slice level flag. This flag, for example, can be disabled for videos with less motion activity or based on the video type (e.g. non-360-degree videos).
- the proposed motion vector candidate may be added to the merge or AMVP list at the beginning, middle or end of the list. It may be added to the list by increasing the number of candidates, or it may replace one of the candidates when the length of the list is fixed. Under some conditions (e.g. when the neighboring blocks have very diverse motion vectors, or the model parameters cannot be calculated), the proposed candidate may not be added to the list.
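A minimal sketch of these list-insertion options (illustrative; the position choices and length cap are assumptions consistent with the description):

```python
def add_candidate(candidates, new_candidate, position="end", max_len=None):
    """Insert the model-based candidate at the beginning, middle or end
    of an AMVP/merge list; with max_len set, the list is truncated so
    the new candidate effectively replaces an existing one."""
    index = {"begin": 0,
             "middle": len(candidates) // 2,
             "end": len(candidates)}[position]
    candidates.insert(index, new_candidate)
    if max_len is not None:
        del candidates[max_len:]
    return candidates
```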
- the solution presented in this description can be executed at an encoder or a decoder.
- the proposed motion vector may be used as the initial point for motion search.
- An apparatus according to an embodiment comprises means for obtaining motion information comprising a motion vector and locations of neighboring blocks of a video frame; means for determining parameters of a model using the obtained motion information; and means for determining a predicted motion vector using the model and a location of a current block.
- the apparatus further comprises means for adding the predicted motion vector to a candidate list.
- These means comprise at least one processor and a memory including computer program code comprising one or more operational characteristics.
- FIG. 8 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an electronic device 50, which may incorporate a codec. In some embodiments the electronic device may comprise an encoder or a decoder.
- Figure 9 shows a layout of an apparatus according to an embodiment.
- the electronic device 50 may for example be a mobile terminal or a user equipment of a wireless communication system or a camera device.
- the electronic device 50 may be also comprised at a local or a remote server or a graphics processing unit of a computer.
- the device may be also comprised as part of a head-mounted display device.
- the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
- the apparatus 50 may further comprise a display 32 in the form of a liquid crystal display.
- the display may be any display technology suitable to display an image or video.
- the apparatus 50 may further comprise a keypad 34.
- any suitable data or user interface mechanism may be employed.
- the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
- the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
- the apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
- the apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
- the apparatus may further comprise a camera 42 capable of recording or capturing images and/or video.
- the camera 42 is a multi-lens camera system having at least two camera sensors.
- the camera is capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing.
- the apparatus may receive the video and/or image data for processing from another device prior to transmission and/or storage.
- the apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices.
- the apparatus may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB (Universal Serial Bus)/firewire wired connection.
- the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
- the apparatus or the controller 56 may comprise one or more processors or processor circuitry and be connected to memory 58 which may store data in the form of image, video and/or audio data, and/or may also store instructions for implementation on the controller 56 or to be executed by the processors or the processor circuitry.
- the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of image, video and/or audio data or assisting in coding and decoding carried out by the controller.
- the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC (Universal Integrated Circuit Card) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
- the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
- the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
- the apparatus may comprise one or more wired interfaces configured to transmit and/or receive data over a wired connection, for example an electrical cable or an optical fiber connection.
- the wired interface may be configured to operate according to one or more digital display interface standards, such as for example High-Definition Multimedia Interface (HDMI), Mobile High-definition Link (MHL), or Digital Visual Interface (DVI).
- the various embodiments may provide advantages. For example, the changes are local, with minimal changes needed for adding a new candidate to the AMVP/merge lists and no change in the bitstream syntax.
- the model is generic and locally adaptive. It can model different changes in motion or deformation of objects in different ways. For example, it can support zooming in and out, rotation, and object deformation, for example in 360-degree video formats.
- the various embodiments of the invention can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the invention.
- a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
- a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A method for video coding is disclosed, comprising obtaining motion information comprising at least one motion vector and a location of at least one neighboring block of a video frame (410); determining parameters of a model using the obtained motion information (420); determining a predicted motion vector using the model and a location of a current block (430); and adding the predicted motion vector to a candidate list (440). The invention also relates to an apparatus and a computer program product.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FI20175906 | 2017-10-16 | | |
| PCT/FI2018/050724 | 2017-10-16 | 2018-10-09 | Method, apparatus and computer program product for video encoding and decoding |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019077197A1 (fr) | 2019-04-25 |
Family
ID=66174325
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/FI2018/050724 | WO2019077197A1 (fr), ceased | 2017-10-16 | 2018-10-09 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2019077197A1 (fr) |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130163668A1 (en) * | 2011-12-22 | 2013-06-27 | Qualcomm Incorporated | Performing motion vector prediction for video coding |
| US20140010306A1 (en) * | 2012-07-04 | 2014-01-09 | Thomson Licensing | Method for coding and decoding a block of pixels from a motion model |
| US20140354771A1 (en) * | 2013-05-29 | 2014-12-04 | Ati Technologies Ulc | Efficient motion estimation for 3d stereo video encoding |
Non-Patent Citations (5)
| Title |
|---|
| LI, LI ET AL.: "An Efficient Four-Parameter Affine Motion Model for Video Coding", ARXIV.ORG, 21 February 2017 (2017-02-21), XP080747890, Retrieved from the Internet <URL:https://arxiv.org/abs/1702.06297> [retrieved on 20190201] * |
| PARKER, SARAH ET AL.: "Global and locally adaptive warped motion compensation in video compression", 2017 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 19 September 2017 (2017-09-19), pages 275 - 279, XP033322582, ISSN: 2381-8549, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/8296286> [retrieved on 20190201] * |
| SIXIN, LIN ET AL.: "Affine transform prediction for next generation video coding", ISO/IEC JTC1/SC29/WG11 MPEG2015/M37525. MPEG DOCUMENT MANAGEMENT SYSTEM, 26 October 2015 (2015-10-26), XP030065892, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/mpeg> [retrieved on 20190201] * |
| SULLIVAN, GARY J. ET AL.: "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 22, no. 12, 28 September 2012 (2012-09-28), pages 1649 - 1668, XP011486324, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/6316136> [retrieved on 20190201] * |
| YUAN, HUI ET AL.: "Affine Model Based Motion Compensation Prediction for Zoom", IEEE TRANSACTIONS ON MULTIMEDIA, vol. 14, no. 4, 8 March 2012 (2012-03-08), pages 1370 - 1375, XP011452912, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/6166365> [retrieved on 20190201] * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113966616A (zh) * | 2019-06-04 | 2022-01-21 | 北京字节跳动网络技术有限公司 | Motion candidate list construction using neighboring block information |
| CN113966616B (zh) * | 2019-06-04 | 2023-11-14 | 北京字节跳动网络技术有限公司 | Motion candidate list construction using neighboring block information |
| US12120314B2 (en) | 2019-06-04 | 2024-10-15 | Beijing Bytedance Network Technology Co., Ltd. | Motion candidate list construction using neighboring block information |
| US12495141B2 (en) | 2019-07-14 | 2025-12-09 | Beijing Bytedance Network Technology Co., Ltd. | Transform block size restriction in video coding |
| WO2021120122A1 (fr) * | 2019-12-19 | 2021-06-24 | Oppo广东移动通信有限公司 | Image component prediction method, encoder, decoder and storage medium |
| US11477465B2 (en) | 2019-12-19 | 2022-10-18 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Colour component prediction method, encoder, decoder, and storage medium |
| US11770542B2 (en) | 2019-12-19 | 2023-09-26 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Colour component prediction method, encoder, and decoder |
| US12328432B2 (en) | 2020-03-07 | 2025-06-10 | Beijing Bytedance Network Technology Co., Ltd. | Implicit multiple transform set signaling in video coding |
| CN112738521A (zh) * | 2020-12-03 | 2021-04-30 | 深圳万兴软件有限公司 | Video encoding method and apparatus, electronic device, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18867806; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18867806; Country of ref document: EP; Kind code of ref document: A1 |