WO2019059646A1 - Method and device for processing a video signal - Google Patents
Method and device for processing a video signal
- Publication number
- WO2019059646A1 (PCT/KR2018/011069)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- boundary
- padding
- block
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/282—Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/332—Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/122—Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Definitions
- the present invention relates to a video signal processing method and apparatus.
- Demand for HD and UHD images is increasing in various application fields.
- As image data becomes high resolution and high quality, the amount of data increases relative to existing image data. Therefore, when the image data is transmitted over a medium such as a wired/wireless broadband line or stored using an existing storage medium, transmission and storage costs increase.
- High-efficiency image compression techniques can be utilized to solve such problems as image data becomes high-resolution and high-quality.
- Image compression techniques include an inter-picture prediction technique that predicts pixel values in the current picture from a previous or subsequent picture, an intra-picture prediction technique that predicts pixel values in the current picture using pixel information within the current picture,
- and an entropy encoding technique in which a short code is assigned to a value with a high frequency of appearance and a long code is assigned to a value with a low frequency of appearance.
- Image data can be effectively compressed and transmitted or stored using such an image compression technique.
- An object of the present invention is to provide a projection transformation method of a 360 degree image using a face having a curved surface shape.
- A method of encoding an image according to the present invention includes generating a 360-degree projection image, which includes a face at least one side of which is a curved surface, by projectively transforming a 360-degree image approximated by a three-dimensional figure onto a two-dimensional plane, and encoding the generated 360-degree projection image.
- In the image encoding method, an area between the curved boundary of the face and the boundary of the 360-degree projection image may be set as a rendering padding area which is not used for rendering the 360-degree image, and the sample value of the rendering padding area may be determined based on data of the face and of a neighboring face adjacent to it.
- The image decoding method according to the present invention may include decoding a face at least one side of which is a curved surface, and inversely projecting the 360-degree projection image including the decoded face back into a three-dimensional form.
- In the image decoding method, an area between the curved boundary of the face and the boundary of the 360-degree projection image may be set as a rendering padding area that is not used for rendering the 360-degree image, and the sample value of the rendering padding area can be determined based on data of the face and of a neighboring face adjacent to it.
- the rendering padding region may be generated by copying data of the neighboring face.
- the sample value of the rendering padding area may be determined as an average value of samples adjacent to the boundary of the neighboring face.
- the sample value of the rendering padding region may be determined based on an average or weighted-sum operation of samples adjacent to the boundary of the face and samples adjacent to the boundary of the neighboring face.
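- As a rough illustration of the two rules above, the sketch below fills a one-column rendering padding strip either by copying the neighboring face's boundary samples or by a weighted average of the samples on both sides of the face boundary. The array layout, the function name, and the default 50/50 weights are assumptions made for illustration; the text itself only requires a copy, an average, or a weighted sum.

```python
import numpy as np

def fill_rendering_padding(face, neighbor_face, mode="average", w_face=0.5):
    """Illustrative fill of a one-column rendering padding strip.

    face, neighbor_face : 2D arrays holding the samples of the current face
    and of the face that neighbors it in 3D space (assumed layout).
    mode "copy"    : copy the neighbor's boundary column.
    mode "average" : weighted sum of the two boundary columns.
    """
    face_edge = face[:, -1].astype(np.float64)            # samples adjacent to the face boundary
    neigh_edge = neighbor_face[:, 0].astype(np.float64)   # samples adjacent to the neighbor's boundary

    if mode == "copy":
        pad = neigh_edge
    else:  # weighted average of the two boundary samples
        pad = w_face * face_edge + (1.0 - w_face) * neigh_edge
    return np.rint(pad).astype(face.dtype)

# usage: a 4x4 face and its neighbor, padding strip derived from both
f = np.full((4, 4), 100, dtype=np.uint8)
g = np.full((4, 4), 60, dtype=np.uint8)
print(fill_rendering_padding(f, g))          # [80 80 80 80]
print(fill_rendering_padding(f, g, "copy"))  # [60 60 60 60]
```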
- the 360-degree projection image may be generated based on an RSP (Rotated Sphere Projection) technique, and the 360-degree projection image may include a top face and a bottom face.
- the upper face corresponds to a predefined area of the 360-degree image, and the lower face corresponds to that predefined area rotated by a predefined angle.
- the encoding / decoding efficiency can be improved by projectively transforming the 360 degree image into two dimensions.
- according to the present invention, coding/decoding efficiency can be improved by adding a padding area to a picture boundary or face boundary of a 360-degree image.
- according to the present invention, padding is performed using a neighboring face that neighbors the current face in three-dimensional space, thereby preventing deterioration of image quality.
- according to the present invention, it is possible to determine whether to add a padding area to the boundary of the current face in consideration of continuity in three-dimensional space, which has the advantage that coding/decoding efficiency can be increased.
- the encoding / decoding efficiency can be improved by projectively transforming a 360-degree image using a face having a curved surface shape.
- encoding / decoding efficiency can be improved by assigning a predefined value or a value derived from an adjacent sample to the inactive area between the boundary of the 360 degree projection image and the face boundary of the face.
- FIG. 1 is a block diagram illustrating an image encoding apparatus according to an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating an image decoding apparatus according to an embodiment of the present invention.
- FIG. 3 is a diagram illustrating a partition mode that can be applied to a coding block when a coding block is coded by inter-picture prediction.
- FIGS. 4 to 6 are views illustrating a camera apparatus for generating a panoramic image.
- FIG. 7 is a block diagram of a 360-degree video data generation apparatus and a 360-degree video play apparatus.
- FIG. 8 is a flowchart showing the operation of a 360-degree video data generation apparatus and a 360-degree video play apparatus.
- FIG. 9 shows a 2D projection method using an equirectangular projection (ERP) technique.
- FIG. 11 shows a 2D projection method using an icosahedral projection technique.
- FIG. 13 shows a 2D projection method using a truncated pyramid projection technique.
- FIG. 15 is a diagram illustrating the conversion between face 2D coordinates and three-dimensional coordinates.
- 16 is a diagram for explaining an example in which padding is performed in an ERP projected image.
- 17 is a view for explaining an example in which the lengths of the padding regions in the horizontal direction and the vertical direction are differently set in the ERP projection image.
- 18 is a diagram showing an example in which padding is performed at the boundary of the face.
- 19 is a diagram showing an example of determining a sample value of a padding area between faces.
- 20 is a view illustrating a CMP-based 360 degree projection image.
- 21 is a diagram showing an example in which a plurality of data is included in one face.
- 22 is a diagram showing an example in which one face is configured to include a plurality of faces.
- 23 is a view showing an example in which padding is performed only at a partial boundary of the face in CMP.
- 24 is a diagram showing an example of converting the upper circle and the lower circle of the cylinder into a rectangular shape.
- 25 is a view showing a 360-degree projection image based on ECP.
- 26 is a view showing an example in which padding is performed only at a part of the boundary of the face in the ECP.
- FIG. 27 is a view showing an example in which frame packing is performed in a state where an upper circle and a lower circle of a cylinder are converted into an arc shape.
- FIGS. 28 and 29 are diagrams illustrating an example in which padding is performed only on a partial boundary of a face in a modified ECP.
- FIG. 30 is a diagram showing two faces of a 360-degree projection image based on RSP.
- 31 is a diagram illustrating a rendering padding area in a 360-degree projection image based on an RSP.
- 32 is a diagram showing 2D data that replaces the rendering padding area.
- 33 is a flowchart showing an inter prediction method according to an embodiment to which the present invention is applied.
- FIG. 34 illustrates a process of deriving motion information of a current block when a merge mode is applied to the current block.
- 35 is a diagram showing an example of a spatial neighboring block.
- 36 is a diagram for explaining an example of deriving a motion vector of a temporal merging candidate.
- 37 is a diagram showing the positions of candidate blocks that can be used as a collocated block.
- FIG. 38 shows a process of deriving motion information of a current block when the AMVP mode is applied to the current block.
- 39 is a view showing an example in which the symmetric mode is applied under the merge mode.
- 40 is a view showing an example in which the symmetric mode is applied under the AMVP mode.
- first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.
- the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.
- the term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.
- FIG. 1 is a block diagram illustrating an image encoding apparatus according to an embodiment of the present invention.
- the image encoding apparatus 100 may include a picture division unit 110, prediction units 120 and 125, a transform unit 130, a quantization unit 135, a reordering unit 160, an entropy encoding unit 165, an inverse quantization unit 140, an inverse transform unit 145, a filter unit 150, and a memory 155.
- each of the components shown in FIG. 1 is shown independently to represent different characteristic functions of the image encoding apparatus; this does not mean that each component is implemented as separate hardware or as a single software unit. That is, the components are listed separately for convenience of explanation, and at least two components may be combined into one component, or one component may be divided into a plurality of components each performing a function.
- the integrated embodiments and separate embodiments of the components are also included within the scope of the present invention, unless they depart from the essence of the present invention.
- the components are not essential components to perform essential functions in the present invention, but may be optional components only to improve performance.
- the present invention may be implemented with only the components essential for realizing its essence, excluding the components used merely for performance improvement, and a structure including only the essential components while excluding the optional performance-improving components also falls within the scope of the present invention.
- the picture division unit 110 may divide the input picture into at least one processing unit.
- the processing unit may be a prediction unit (PU), a transform unit (TU), or a coding unit (CU).
- the picture division unit 110 may divide one picture into combinations of a plurality of coding units, prediction units, and transform units, and encode the picture by selecting one combination of a coding unit, prediction units, and transform units according to a predetermined criterion.
- one picture may be divided into a plurality of coding units.
- a recursive tree structure such as a quad tree structure can be used.
- a coding unit that is divided has as many child nodes as the number of divided coding units, and under certain constraints a coding unit that is no longer divided becomes a leaf node. That is, assuming that only square division is possible for one coding unit, one coding unit can be divided into at most four other coding units.
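- A minimal sketch of the recursive quad-tree division just described is given below; the split decision callback and the minimum size of 8 are placeholders for illustration, not values prescribed by the text.

```python
def split_quadtree(x, y, size, should_split, min_size=8, depth=0):
    """Recursively divide a square coding unit into four child units.

    should_split(x, y, size, depth) is a caller-supplied decision function
    (in a real encoder this would be a rate-distortion decision).
    Returns the list of leaf coding units as (x, y, size, depth) tuples.
    """
    if size <= min_size or not should_split(x, y, size, depth):
        return [(x, y, size, depth)]          # leaf node: no further division
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):                  # one square unit split into four
            leaves += split_quadtree(x + dx, y + dy, half,
                                     should_split, min_size, depth + 1)
    return leaves

# usage: split a 64x64 coding tree unit once at the top level only
leaves = split_quadtree(0, 0, 64, lambda x, y, s, d: d == 0)
print(leaves)   # four 32x32 partitions at depth 1
```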
- a coding unit may be used as a unit for performing coding, or may be used as a unit for performing decoding.
- the prediction unit may be obtained by dividing one coding unit into at least one square or rectangle of the same size, or may be divided such that one prediction unit among the prediction units divided within a coding unit has a shape and/or size different from the others.
- when generating a prediction unit on which intra prediction is performed, intra prediction can be performed without dividing it into a plurality of NxN prediction units.
- the prediction units 120 and 125 may include an inter prediction unit 120 for performing inter prediction and an intra prediction unit 125 for performing intra prediction. It is possible to determine whether to use inter prediction or intra prediction for a prediction unit and to determine concrete information (e.g., intra prediction mode, motion vector, reference picture, etc.) according to each prediction method.
- the processing unit in which prediction is performed may differ from the processing unit in which the prediction method and its details are determined. For example, the prediction method and the prediction mode may be determined in units of a prediction unit, and prediction itself may be performed in units of a transform unit.
- the residual value (residual block) between the generated prediction block and the original block can be input to the transform unit 130.
- the prediction mode information, motion vector information, and the like used for prediction can be encoded by the entropy encoding unit 165 together with the residual value and transmitted to the decoder.
- when a particular encoding mode is used, it is also possible to encode the original block as it is and transmit it to the decoder without generating a prediction block through the prediction units 120 and 125.
- the inter prediction unit 120 may predict a prediction unit based on information of at least one of a previous picture or a subsequent picture of the current picture, and in some cases may predict a prediction unit based on information of a partially encoded region within the current picture.
- the inter prediction unit 120 may include a reference picture interpolation unit, a motion prediction unit, and a motion compensation unit.
- the reference picture interpolation unit may receive reference picture information from the memory 155 and generate pixel information of less than an integer pixel from the reference picture.
- a DCT-based interpolation filter having varying filter coefficients may be used to generate sub-integer pixel information in units of a quarter pixel.
- a DCT-based 4-tap interpolation filter having varying filter coefficients may be used to generate sub-integer pixel information in units of a 1/8 pixel.
- the motion prediction unit may perform motion prediction based on the reference picture interpolated by the reference picture interpolating unit.
- Various methods such as Full Search-based Block Matching Algorithm (FBMA), Three Step Search (TSS), and New Three-Step Search Algorithm (NTS) can be used as methods for calculating motion vectors.
- the motion vector may have a motion vector value of 1/2 or 1/4 pixel unit based on the interpolated pixel.
- the motion prediction unit can predict the current prediction unit by varying the motion prediction method.
- Various methods such as a skip method, a merge method, an AMVP (Advanced Motion Vector Prediction) method, and an Intra Block Copy method can be used as the motion prediction method.
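- As a rough illustration of the full-search block matching (FBMA) mentioned above, the sketch below tests every integer displacement within a window and keeps the one with the smallest SAD cost; the search range and the SAD criterion are assumptions for illustration only.

```python
import numpy as np

def full_search_block_matching(cur_block, ref_pic, bx, by, search_range=8):
    """Illustrative full-search block matching (FBMA) using an SAD cost.

    cur_block : block of the current picture located at (bx, by).
    ref_pic   : reference picture as a 2D array.
    Returns the integer motion vector (mvx, mvy) with the smallest SAD.
    """
    h, w = cur_block.shape
    best_mv, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + w > ref_pic.shape[1] or y + h > ref_pic.shape[0]:
                continue                      # candidate falls outside the reference picture
            cand = ref_pic[y:y + h, x:x + w]
            sad = np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv

# usage: the block at (16, 16) of the current picture actually comes from (17, 18) in the reference
ref = np.arange(64 * 64, dtype=np.int32).reshape(64, 64)
cur = ref[18:26, 17:25]
print(full_search_block_matching(cur, ref, 16, 16))   # (1, 2)
```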
- the intra prediction unit 125 can generate a prediction unit based on reference pixel information around the current block which is pixel information in the current picture.
- when a reference pixel of the current block belongs to a block on which inter prediction has been performed, that reference pixel may not be used as reference pixel information as it is. That is, when a reference pixel is not available, the unavailable reference pixel information may be replaced with at least one of the available reference pixels.
- the prediction mode may have a directional prediction mode in which reference pixel information is used according to a prediction direction, and a non-directional mode in which direction information is not used in prediction.
- the mode for predicting luminance information may be different from the mode for predicting chrominance information, and the intra prediction mode information used for predicting the luminance information or the predicted luminance signal information may be utilized to predict the chrominance information.
- when intra prediction is performed and the size of the prediction unit is the same as the size of the transform unit, intra prediction can be performed on the prediction unit based on pixels existing on its left side, its upper-left side, and its upper side.
- however, when the size of the prediction unit differs from the size of the transform unit at the time of intra prediction, intra prediction can be performed using reference pixels based on the transform unit. Intra prediction using NxN division may be used only for the minimum coding unit.
- the intra prediction method can generate a prediction block after applying an AIS (Adaptive Intra Smoothing) filter to the reference pixel according to the prediction mode.
- the type of the AIS filter applied to the reference pixel may be different.
- the intra prediction mode of the current prediction unit can be predicted from the intra prediction mode of the prediction unit existing around the current prediction unit.
- when the prediction mode of the current prediction unit is predicted using mode information derived from the neighboring prediction unit, if the intra prediction mode of the current prediction unit is the same as that of the neighboring prediction unit, information indicating that the two prediction modes are the same can be transmitted using predetermined flag information.
- if the prediction mode of the current prediction unit is different from that of the neighboring prediction unit, the prediction mode information of the current block can be encoded by performing entropy encoding.
- a residual block including residual information, which is the difference between the prediction unit generated by the prediction units 120 and 125 and the original block of that prediction unit, may be generated.
- the generated residual block may be input to the transform unit 130.
- the transform unit 130 may transform the residual block, which includes the residual information between the original block and the prediction unit generated through the prediction units 120 and 125, using a transform method such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT.
- the decision to apply the DCT, DST, or KLT to transform the residual block may be based on the intra prediction mode information of the prediction unit used to generate the residual block.
- the quantization unit 135 may quantize the values transformed into the frequency domain by the transform unit 130. The quantization coefficient may vary depending on the block or the importance of the image. The values calculated by the quantization unit 135 may be provided to the inverse quantization unit 140 and the reordering unit 160.
- the reordering unit 160 can reorder the coefficient values with respect to the quantized residual values.
- the reordering unit 160 may change the two-dimensional block type coefficient to a one-dimensional vector form through a coefficient scanning method.
- the reordering unit 160 may scan from a DC coefficient up to a coefficient in the high-frequency region using a zig-zag scan method and change the coefficients into a one-dimensional vector form.
- depending on the size of the transform unit and the intra prediction mode, a vertical scan that scans two-dimensional block-type coefficients in the column direction or a horizontal scan that scans two-dimensional block-type coefficients in the row direction may be used instead of the zig-zag scan. That is, which of the zig-zag scan, vertical scan, and horizontal scan is used can be determined according to the size of the transform unit and the intra prediction mode.
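- As an illustration of the scan orders just described, the sketch below reorders a square block of quantized coefficients into a one-dimensional vector with a zig-zag, vertical, or horizontal scan; the zig-zag order shown is the classic JPEG-style traversal, used here only as a teaching stand-in.

```python
import numpy as np

def zigzag_scan(block):
    """Reorder a square 2D coefficient block into a 1D vector, scanning from
    the DC coefficient towards the high-frequency corner along anti-diagonals."""
    n = block.shape[0]
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],                        # diagonal index
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return np.array([block[r, c] for r, c in order])

def vertical_scan(block):
    """Column-by-column scan of the coefficient block."""
    return block.flatten(order="F")

def horizontal_scan(block):
    """Row-by-row scan of the coefficient block."""
    return block.flatten(order="C")

# usage on a 4x4 block of quantized coefficients
blk = np.arange(16).reshape(4, 4)
print(zigzag_scan(blk))   # [ 0  1  4  8  5  2  3  6  9 12 13 10  7 11 14 15]
```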
- the entropy encoding unit 165 may perform entropy encoding based on the values calculated by the reordering unit 160.
- various encoding methods such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC) may be used.
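- As a small illustration of one of the entropy coding tools listed above, the sketch below encodes and decodes unsigned values with a zeroth-order exponential-Golomb code, so that frequently occurring small values receive short codewords; it is a teaching example only, not the CABAC or CAVLC machinery actually used in the codec.

```python
def exp_golomb_encode(value):
    """Zeroth-order exponential-Golomb code for an unsigned integer:
    a shorter codeword is assigned to smaller (more frequent) values."""
    code_num = value + 1
    bits = bin(code_num)[2:]              # binary representation of value + 1
    prefix = "0" * (len(bits) - 1)        # leading zeros signal the codeword length
    return prefix + bits

def exp_golomb_decode(bitstring):
    """Decode a single zeroth-order exp-Golomb codeword from a bit string."""
    zeros = 0
    while bitstring[zeros] == "0":
        zeros += 1
    code_num = int(bitstring[zeros:2 * zeros + 1], 2)
    return code_num - 1

# usage: frequent values get short codes, rare values get long codes
for v in (0, 1, 2, 7):
    print(v, exp_golomb_encode(v))   # 0->'1', 1->'010', 2->'011', 7->'0001000'
```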
- the entropy encoding unit 165 may receive, from the reordering unit 160 and the prediction units 120 and 125, various information such as residual coefficient information and block type information of the coding unit, prediction mode information, division unit information, prediction unit information and transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information, and encode it.
- the entropy encoding unit 165 can entropy-encode the coefficient value of the encoding unit input by the reordering unit 160.
- the inverse quantization unit 140 and the inverse transform unit 145 inverse-quantize the values quantized by the quantization unit 135 and inverse-transform the values transformed by the transform unit 130.
- the residual value generated by the inverse quantization unit 140 and the inverse transform unit 145 may be combined with the prediction unit predicted through the motion estimation unit, the motion compensation unit, and the intra prediction unit included in the prediction units 120 and 125 to generate a reconstructed block.
- the filter unit 150 may include at least one of a deblocking filter, an offset correction unit, and an adaptive loop filter (ALF).
- the deblocking filter can remove block distortion caused by boundaries between blocks in the reconstructed picture. To decide whether to perform deblocking, whether to apply the deblocking filter to the current block may be determined based on the pixels included in several columns or rows of the block. When the deblocking filter is applied to a block, a strong filter or a weak filter may be applied according to the required deblocking filtering strength. Also, when applying the deblocking filter, vertical filtering and horizontal filtering may be processed in parallel.
- the offset correction unit may correct the offset of the deblocked image with respect to the original image in units of pixels.
- pixels included in an image are divided into a predetermined number of areas, and then an area to be offset is determined and an offset is applied to the area.
- Adaptive Loop Filtering can be performed based on a comparison between the filtered reconstructed image and the original image. After dividing the pixels included in the image into a predetermined group, one filter to be applied to the group may be determined and different filtering may be performed for each group.
- the information related to whether to apply the ALF may be transmitted for each coding unit (CU), and the shape and the filter coefficient of the ALF filter to be applied may be changed according to each block. Also, an ALF filter of the same type (fixed form) may be applied irrespective of the characteristics of the application target block.
- the memory 155 may store the reconstructed block or picture calculated through the filter unit 150 and the reconstructed block or picture stored therein may be provided to the predictor 120 or 125 when the inter prediction is performed.
- FIG. 2 is a block diagram illustrating an image decoding apparatus according to an embodiment of the present invention.
- the image decoder 200 may include an entropy decoding unit 210, a reordering unit 215, an inverse quantization unit 220, an inverse transform unit 225, prediction units 230 and 235, a filter unit 240, and a memory 245.
- the input bitstream may be decoded in a procedure opposite to that of the image encoder.
- the entropy decoding unit 210 can perform entropy decoding in a procedure opposite to that in which entropy encoding is performed in the entropy encoding unit of the image encoder. For example, various methods such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC) may be applied in accordance with the method performed by the image encoder.
- the entropy decoding unit 210 may decode information related to intra prediction and inter prediction performed in the encoder.
- the reordering unit 215 can reorder the bitstream entropy-decoded by the entropy decoding unit 210 based on the reordering method used by the encoder.
- the coefficients expressed in one-dimensional vector form can be reconstructed and rearranged into coefficients of two-dimensional block form.
- the reordering unit 215 can perform reordering by receiving information related to the coefficient scanning performed by the encoding unit and performing a reverse scanning based on the scanning order performed by the encoding unit.
- the inverse quantization unit 220 can perform inverse quantization based on the quantization parameters provided by the encoder and the coefficient values of the re-arranged blocks.
- the inverse transform unit 225 may perform an inverse DCT, an inverse DST, and an inverse KLT on the DCT, DST, and KLT transformations performed by the transform unit on the quantization result performed by the image encoder.
- the inverse transform can be performed based on the transmission unit determined by the image encoder.
- in the inverse transform unit 225 of the image decoder, a transform technique (e.g., DCT, DST, or KLT) may be selectively applied according to information such as the prediction method and the size of the current block.
- the prediction units 230 and 235 can generate a prediction block based on the prediction block generation related information provided by the entropy decoding unit 210 and the previously decoded block or picture information provided in the memory 245.
- when the size of the prediction unit differs from the size of the transform unit, intra prediction is performed using reference pixels based on the transform unit. Intra prediction using NxN division may be used only for the minimum coding unit.
- the prediction units 230 and 235 may include a prediction unit determination unit, an inter prediction unit, and an intra prediction unit.
- the prediction unit determination unit may receive various information input from the entropy decoding unit 210, such as prediction unit information, prediction mode information of the intra prediction method, and motion-prediction-related information of the inter prediction method, identify the prediction unit in the current coding unit, and determine whether that prediction unit performs inter prediction or intra prediction.
- the inter prediction unit 230 may perform inter prediction on the current prediction unit based on information included in at least one of the previous picture or the subsequent picture of the current picture containing the current prediction unit, using the information necessary for inter prediction of the current prediction unit provided by the image encoder. Alternatively, inter prediction may be performed based on information of a partial region already reconstructed within the current picture containing the current prediction unit.
- in order to perform inter prediction, it may be determined, on the basis of the coding unit, whether the motion prediction method of the prediction unit included in the coding unit is the skip mode, the merge mode, the AMVP mode, or the intra block copy mode.
- the intra prediction unit 235 can generate a prediction block based on the pixel information in the current picture. If the prediction unit is a prediction unit that performs intra prediction, the intra prediction can be performed based on the intra prediction mode information of the prediction unit provided by the image encoder.
- the intraprediction unit 235 may include an AIS (Adaptive Intra Smoothing) filter, a reference pixel interpolator, and a DC filter.
- the AIS filter performs filtering on the reference pixels of the current block and can determine whether to apply the filter according to the prediction mode of the current prediction unit.
- the AIS filtering can be performed on the reference pixel of the current block using the prediction mode of the prediction unit provided in the image encoder and the AIS filter information. When the prediction mode of the current block is a mode in which AIS filtering is not performed, the AIS filter may not be applied.
- when the prediction mode of the prediction unit is a prediction mode that performs intra prediction based on pixel values obtained by interpolating reference pixels, the reference pixel interpolation unit may interpolate the reference pixels to generate reference pixels in units of less than an integer pixel.
- when the prediction mode of the current prediction unit is a prediction mode that generates a prediction block without interpolating the reference pixels, the reference pixels may not be interpolated.
- the DC filter can generate a prediction block through filtering when the prediction mode of the current block is the DC mode.
- the restored block or picture may be provided to the filter unit 240.
- the filter unit 240 may include a deblocking filter, an offset correction unit, and an ALF.
- the deblocking filter of the video decoder may receive, from the image encoder, information on whether the deblocking filter has been applied to the corresponding block or picture and, if applied, information on whether a strong filter or a weak filter was applied.
- the deblocking filter of the video decoder receives the deblocking-filter-related information provided by the video encoder, and the video decoder can perform deblocking filtering on the corresponding block.
- the offset correction unit may perform offset correction on the reconstructed image based on the type of offset correction applied to the image and the offset value information during encoding.
- the ALF can be applied to an encoding unit on the basis of ALF application information and ALF coefficient information provided from an encoder.
- ALF information may be provided in a specific parameter set.
- the memory 245 may store the reconstructed picture or block to be used as a reference picture or a reference block, and may also provide the reconstructed picture to the output unit.
- a coding unit (coding unit) is used as a coding unit for convenience of explanation, but it may be a unit for performing not only coding but also decoding.
- the current block indicates a block to be coded / decoded.
- depending on the encoding/decoding step, the current block may represent a coding tree block (or coding tree unit), a coding block (or coding unit), a transform block (or transform unit), a prediction block (or prediction unit), and the like.
- 'unit' represents a basic unit for performing a specific encoding / decoding process
- 'block' may represent a sample array of a predetermined size.
- the terms 'block' and 'unit' may be used interchangeably.
- the encoding block (coding block) and the encoding unit (coding unit) have mutually equivalent meanings.
- the basic block may be referred to as a coding tree unit.
- the coding tree unit may be defined as a coding unit of the largest size allowed in a sequence or a slice. Information regarding whether the coding tree unit is square or non-square or about the size of the coding tree unit can be signaled through a sequence parameter set, a picture parameter set, or a slice header.
- the coding tree unit can be divided into smaller size partitions. In this case, if the partition generated by dividing the coding tree unit is depth 1, the partition created by dividing the partition having depth 1 can be defined as depth 2. That is, the partition created by dividing the partition having the depth k in the coding tree unit can be defined as having the depth k + 1.
- a partition of arbitrary size generated as the coding tree unit is divided can be defined as a coding unit.
- the coding unit may be recursively divided, or divided into basic units for performing prediction, quantization, transformation, in-loop filtering, and the like.
- a partition of arbitrary size generated as a coding unit is divided may itself be defined as a coding unit, or may be defined as a transform unit or a prediction unit, which are basic units for performing prediction, quantization, transformation, in-loop filtering, and the like.
- a prediction block having the same size as the coding block or smaller than the coding block can be determined through predictive division of the coding block.
- Predictive partitioning of the coded block can be performed by a partition mode (Part_mode) indicating the partition type of the coded block.
- Part_mode partition mode
- the size or shape of the prediction block may be determined according to the partition mode of the coding block.
- the division type of the coding block can be determined through information specifying any one of the partition candidates.
- the partition candidates available to the coding block may include an asymmetric partition type (for example, nLx2N, nRx2N, 2NxnU, 2NxnD) depending on the size, type, coding mode or the like of the coding block.
- the partition candidate available to the coding block may be determined according to the coding mode of the current block. For example, FIG. 3 illustrates a partition mode that can be applied to a coding block when the coding block is coded by inter-picture prediction.
- one of eight partition modes can be applied to the coding block, as in the example shown in FIG. 3.
- when the coding block is coded by intra-picture prediction, the partition mode PART_2Nx2N or PART_NxN may be applied.
- PART_NxN may be applied when the coding block has a minimum size.
- the minimum size of the coding block may be one previously defined in the encoder and the decoder.
- information regarding the minimum size of the coding block may be signaled via the bitstream.
- the minimum size of the coding block is signaled through the slice header, so that the minimum size of the coding block per slice can be defined.
- the partition candidates available to the coding block may be determined differently depending on at least one of the size or type of the coding block. In one example, the number or type of partition candidates available to the coding block may be differently determined according to at least one of the size or type of the coding block.
- the type or number of asymmetric partition candidates among the partition candidates available to the coding block may be limited depending on the size or type of the coding block. In one example, the number or type of asymmetric partition candidates available to the coding block may be differently determined according to at least one of the size or type of the coding block.
- the size of the prediction block may have a size from 64x64 to 4x4.
- when the coding block is coded by inter-picture prediction, the prediction block may be prevented from having a 4x4 size in order to reduce the memory bandwidth required when performing motion compensation.
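- For illustration, the sketch below maps each of the eight partition modes shown in FIG. 3 to the prediction-block sizes it would produce for a square coding block; the PART_* names follow the common convention, and the 1:3 split of the asymmetric modes is an assumption consistent with nLx2N, nRx2N, 2NxnU, and 2NxnD.

```python
def prediction_block_sizes(part_mode, cb_size):
    """Return the (width, height) of each prediction block produced when the
    given partition mode is applied to a cb_size x cb_size coding block.
    Asymmetric modes split the block in a 1:3 or 3:1 ratio."""
    n = cb_size
    q = cb_size // 4
    modes = {
        "PART_2Nx2N": [(n, n)],
        "PART_2NxN":  [(n, n // 2)] * 2,
        "PART_Nx2N":  [(n // 2, n)] * 2,
        "PART_NxN":   [(n // 2, n // 2)] * 4,
        "PART_2NxnU": [(n, q), (n, n - q)],        # asymmetric: upper quarter
        "PART_2NxnD": [(n, n - q), (n, q)],        # asymmetric: lower quarter
        "PART_nLx2N": [(q, n), (n - q, n)],        # asymmetric: left quarter
        "PART_nRx2N": [(n - q, n), (q, n)],        # asymmetric: right quarter
    }
    return modes[part_mode]

print(prediction_block_sizes("PART_2NxnU", 64))   # [(64, 16), (64, 48)]
```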
- FIGS. 4 to 6 show an example in which a plurality of cameras are used to photograph up and down, right and left, or front and back at the same time.
- a video generated by stitching a plurality of videos can be referred to as a panoramic video.
- an image having a degree of freedom (Degree of Freedom) based on a predetermined center axis can be referred to as a 360-degree video.
- the 360 degree video may be an image having rotational degrees of freedom for at least one of Yaw, Roll, and Pitch.
- the camera structure (or camera arrangement) for acquiring 360-degree video may have a circular arrangement as in the example shown in FIG. 4, a one-dimensional vertical/horizontal arrangement as in the example shown in FIG. 5 (a), or a two-dimensional arrangement (i.e., a combination of vertical and horizontal arrangements) as in the example shown in FIG. 5 (b).
- a plurality of cameras may be mounted on the spherical device.
- FIG. 7 is a block diagram of a 360-degree video data generation apparatus and a 360-degree video play apparatus
- FIG. 8 is a flowchart illustrating operations of a 360-degree video data generation apparatus and a 360-degree video play apparatus.
- the 360-degree video data generation apparatus may include a projection unit 710, a frame packing unit 720, an encoding unit 730, and a transmission unit 740, and the 360-degree video play apparatus may include a file parsing unit 750, a decoding unit 760, a frame depacking unit 770, and an inverse transformation unit 780.
- the encoding unit and the decoding unit shown in FIG. 7 may correspond to the image encoding apparatus and the image decoding apparatus shown in FIG. 1 and FIG. 2, respectively.
- the data generation apparatus can determine a projection transformation technique of a 360-degree image generated by stitching an image photographed by a plurality of cameras.
- the 3D shape of the 360-degree video is determined according to the determined projection transformation technique, and the 360-degree video is projected on the 2D plane according to the determined 3D shape (S801).
- the projection transformation technique can represent a 3D shape of 360-degree video and an aspect in which 360-degree video is developed on the 2D plane.
- depending on the projection transformation technique, a 360-degree image can be approximated to have a shape such as a sphere, cylinder, cube, octahedron, or icosahedron in 3D space.
- an image generated by projecting a 360-degree video onto a 2D plane can be referred to as a 360-degree projection image.
- the 360 degree projection image may be composed of at least one face according to the projection transformation technique.
- each surface constituting the polyhedron can be defined as a face.
- the specific surface constituting the polyhedron may be divided into a plurality of regions, and each divided region may be configured to form a separate face.
- a plurality of faces on the polyhedron may be configured to form one face.
- one face on the polyhedron and the padding area may be configured to form one face.
- 360 degree video which approximates spherical shape, can have multiple faces according to the projection transformation technique.
- the face to be subjected to signal processing will be referred to as a " current face ".
- the current face may refer to a face to be subjected to encoding / decoding or frame packing / frame deblocking according to the signal processing step.
- Frame packing may be performed in the frame packing unit 720 in order to increase the encoding / decoding efficiency of the 360-degree video (S802).
- the frame packing may include at least one of rearranging, resizing, warping, rotating, or flipping the face.
- the 360 degree projection image can be converted into a form having a high encoding / decoding efficiency (for example, a rectangle) or discontinuous data between faces can be removed.
- the frame packing may also be referred to as frame reordering or Region-wise Packing.
- the frame packing may be selectively performed to improve the coding / decoding efficiency for the 360 degree projection image.
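- A very small sketch of the region-wise packing steps described above is given below: equally sized faces are optionally rotated or flipped and then placed into a grid so that the packed frame becomes rectangular. The layout description format is an assumption made for illustration.

```python
import numpy as np

def pack_faces(faces, layout):
    """Place (optionally rotated / flipped) faces of equal size into a grid.

    faces  : dict mapping a face index to a 2D array (all the same size).
    layout : list of rows; each entry is (face_index, rotate_90_steps, flip_lr).
    """
    rows = []
    for row in layout:
        pieces = []
        for face_idx, rot, flip in row:
            f = np.rot90(faces[face_idx], k=rot)   # rotation as part of region-wise packing
            if flip:
                f = np.fliplr(f)                   # horizontal flip, if requested
            pieces.append(f)
        rows.append(np.hstack(pieces))
    return np.vstack(rows)

# usage: pack 6 cube faces (4x4 each) into a 2x3 frame, rotating face 5 by 90 degrees
faces = {i: np.full((4, 4), i, dtype=np.uint8) for i in range(6)}
packed = pack_faces(faces, [[(0, 0, False), (1, 0, False), (2, 0, False)],
                            [(3, 0, False), (4, 0, False), (5, 1, True)]])
print(packed.shape)   # (8, 12)
```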
- the 360-degree projection image or the 360-degree projection image in which the frame packing is performed may be encoded (S803).
- the encoding unit 730 may encode information indicating a projection transformation technique for 360-degree video.
- the information indicating the projection transformation technique may be index information indicating any one of a plurality of projection transformation techniques.
- the encoding unit 730 can encode information related to frame packing for 360-degree video.
- the information related to frame packing may include at least one of whether frame packing has been performed, the number of faces, the position of a face, the size of a face, the shape of a face, or rotation information of a face.
- the transmitting unit 740 encapsulates the bit stream and transmits the encapsulated data to the player terminal (S804).
- the file parsing unit 750 can parse the file received from the content providing apparatus (S805).
- the decoding unit 760 can decode the 360-degree projection image using the parsed data (S806).
- the frame depacking unit 770 may perform frame depacking (region-wise depacking), which is the inverse of the frame packing performed on the content providing side (S807).
- the frame de-packing may be to restore the frame-packed 360 degree projection image to before the frame packing is performed.
- frame depacking may be to reverse the rearranging, resizing, warping, rotation, or flipping performed at the data generation apparatus.
- the inverse transformation unit 780 can inversely project the 360-degree projection image on the 2D plane back into a 3D form according to the projection transformation technique of the 360-degree video (S808).
- projection transformation techniques may include at least one of Equirectangular Projection (ERP), Cube Map Projection (CMP), Icosahedral Projection (ISP), Octahedron Projection (OHP), Truncated Pyramid Projection (TPP), Sphere Segment Projection (SSP), Equatorial Cylindrical Projection (ECP), and Rotated Sphere Projection (RSP).
- FIG. 9 shows a 2D projection method using the equirectangular projection (ERP) technique.
- the equirectangular projection is a method of projecting pixels corresponding to a sphere onto a rectangle having an aspect ratio of N:1, and is the most widely used 2D transformation technique.
- N may be 2, or may be 2 or less or 2 or more real numbers.
- the actual length on the sphere corresponding to a unit length on the 2D plane becomes shorter as the position approaches the poles of the sphere.
- for example, the coordinates of both ends of a unit length on the 2D plane may correspond to a distance difference of 20 cm near the equator of the sphere, but only to a distance difference of 5 cm near the poles.
- accordingly, the equirectangular projection has the disadvantage that the image is distorted in the vicinity of the poles and the coding efficiency is lowered there.
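- The sketch below shows the basic equirectangular mapping between sphere coordinates (longitude, latitude) and plane coordinates, assuming the usual 2:1 picture; it also hints at why samples near the poles are stretched, which is the distortion described above.

```python
def erp_sphere_to_plane(lon_deg, lat_deg, width, height):
    """Map a point on the sphere (longitude in [-180, 180], latitude in
    [-90, 90]) to equirectangular (ERP) pixel coordinates on a width x height
    plane (typically width = 2 * height, i.e. a 2:1 aspect ratio)."""
    u = (lon_deg + 180.0) / 360.0           # horizontal position, linear in longitude
    v = (90.0 - lat_deg) / 180.0            # vertical position, linear in latitude
    return u * (width - 1), v * (height - 1)

def erp_plane_to_sphere(x, y, width, height):
    """Inverse mapping from ERP pixel coordinates back to (longitude, latitude)."""
    lon = x / (width - 1) * 360.0 - 180.0
    lat = 90.0 - y / (height - 1) * 180.0
    return lon, lat

# near the poles one pixel spans a much shorter arc along its circle of latitude
# than the same pixel width does at the equator, hence the stretching there
print(erp_sphere_to_plane(0.0, 0.0, 4096, 2048))    # picture centre (equator)
print(erp_sphere_to_plane(0.0, 89.0, 4096, 2048))   # close to the north pole
```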
- the cube projection method approximates a 360 degree video with a cube and then transforms the cube into 2D.
- each surface of the cube may correspond to one face (or plane).
- the cube projection method has the advantage that the coding efficiency is higher than that of the equirectangular projection.
- the 2D projection converted image may be rearranged into a rectangular shape to perform encoding / decoding.
- FIG. 11 shows a 2D projection method using an icosahedral projection technique.
- the icosahedral projection method is a method of approximating a 360-degree video to an icosahedron and transforming it into 2D.
- the icosahedral projection technique is characterized by strong continuity between faces.
- the octahedron projection method is a method of approximating a 360 degree video to an octahedron and transforming it into 2D.
- the octahedral projection technique is characterized by strong continuity between faces. As in the example shown in FIG. 12, it is possible to perform encoding / decoding by rearranging the faces in the 2D projection-converted image.
- FIG. 13 shows a 2D projection method using a truncated pyramid projection technique.
- the truncated pyramid projection technique is a method of approximating a 360-degree video with a truncated pyramid and transforming it into 2D.
- under the truncated pyramid projection technique, frame packing may be performed such that the face at a particular viewpoint has a different size from the neighboring faces.
- the Front face may have a larger size than the side face and the back face.
- accordingly, the amount of image data for the specific viewpoint is large, and the encoding/decoding quality at that viewpoint is higher than at other viewpoints.
- the SSP is a method of performing 2D projection transformation by dividing spherical 360 degree video into high latitude regions and mid-latitude regions. Specifically, as in the example shown in Fig. 14, two high-latitude regions in the north and south directions of the sphere can be mapped to two circles on the 2D plane, and the mid-latitude region of the sphere can be mapped to a rectangle on the 2D plane like the ERP.
- the boundary between high latitudes and mid-latitudes may be 45 degrees latitude or above or below latitude 45 degrees.
- ECP is a method of transforming spherical 360 degree video into cylindrical shape and then 2D cylindrical projection of 360 degree video. Specifically, when the ECP is followed, the upper and lower surfaces of the cylinder can be mapped to two circles on the 2D plane, and the body of the cylinder can be mapped to a rectangle on the 2D plane.
- RSP represents a method of projecting and transforming a sphere-shaped 360-degree video into two ellipses on a 2D plane, such as a tennis ball.
- Each sample of the 360 degree projection image can be identified by face 2D coordinates.
- the face 2D coordinates may include an index f for identifying the face where the sample is located, and coordinates (m, n) representing a sample grid in the 360 degree projection image.
- FIG. 15 is a diagram illustrating the conversion between face 2D coordinates and three-dimensional coordinates.
- conversion between the three-dimensional coordinates (x, y, z) and the face 2D coordinates (f, m, n) can be performed using Equation (1) below.
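- Equation (1) itself is not reproduced in this text, so the sketch below uses one common cube-map (CMP) style formulation as an assumed stand-in for the conversion from face 2D coordinates (f, m, n) to 3D coordinates (x, y, z); the face-axis table is an assumption made only for illustration.

```python
import numpy as np

def cmp_face2d_to_3d(f, m, n, face_size):
    """Assumed cube-map style conversion from face 2D coordinates (f, m, n)
    to a 3D direction (x, y, z); f is the face index and (m, n) the sample
    position on an A x A face grid."""
    a = face_size
    u = 2.0 * (m + 0.5) / a - 1.0          # normalised face coordinates in [-1, 1]
    v = 2.0 * (n + 0.5) / a - 1.0
    axes = {
        0: ( 1.0,   -v,   -u),   # +X face
        1: (-1.0,   -v,    u),   # -X face
        2: (   u,  1.0,    v),   # +Y face
        3: (   u, -1.0,   -v),   # -Y face
        4: (   u,   -v,  1.0),   # +Z face
        5: (  -u,   -v, -1.0),   # -Z face
    }
    xyz = np.array(axes[f], dtype=np.float64)
    return xyz / np.linalg.norm(xyz)        # project onto the unit sphere

print(cmp_face2d_to_3d(0, 2, 2, 5))   # centre sample of the +X face -> [1. 0. 0.]
```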
- the current picture may include at least one face.
- the number of faces may be 1, 2, 3, 4 or more natural numbers, depending on the projection method.
- f may be set to a value equal to or less than the number of faces.
- the current picture may include at least one face having the same temporal order or output order (POC).
- the number of faces constituting the current picture may be fixed or variable.
- the number of faces constituting the current picture may be limited so as not to exceed a predetermined threshold value.
- the threshold value may be a fixed value predefined in the encoder and the decoder.
- alternatively, information regarding the maximum number of faces constituting one picture may be signaled through the bitstream.
- faces can be determined by partitioning the current picture using at least one of a horizontal line, a vertical line, or a diagonal line, depending on the projection method.
- Each face in the picture may be assigned an index to identify each face.
- Each face may be capable of parallel processing, such as a tile or a slice. Accordingly, when intra prediction or inter prediction of the current block is performed, a neighboring block belonging to a different face from the current block can be judged as unavailable.
- faces for which parallel processing is not allowed may be defined, or mutually dependent faces may be defined.
- faces for which parallel processing is not allowed, or mutually dependent faces, may be encoded/decoded sequentially instead of in parallel. Accordingly, even if a neighboring block belongs to a face different from that of the current block, the neighboring block may be determined to be available for intra prediction or inter prediction of the current block, depending on whether inter-face parallel processing is possible or on the dependency between faces.
- padding can be performed at a picture or face boundary.
- the padding may be performed as a part of performing the frame packing (S802), or may be performed as a separate step before performing the frame packing.
- padding may be performed in the preprocessing process before encoding the 360-degree projection image in which the frame packing is performed, or padding may be performed as a part of the encoding step S803.
- the padding can be performed considering the continuity of the 360 degree image.
- the continuity of a 360 degree image may indicate whether it is spatially continuous when the 360 degree projection image is projected backwards into a sphere or a polyhedron.
- spatially contiguous faces can be understood to have continuity in 3D space.
- Padding between pictures or face boundaries may be performed using spatially continuous samples.
- 16 is a diagram for explaining an example in which padding is performed in an ERP projected image.
- the upper boundary on the left has continuity with the upper boundary on the right.
- accordingly, pixels G and H outside the upper-left boundary can be predicted to be similar to the inner pixels G' and H' of the upper-right boundary, and pixels I and J outside the upper-right boundary can be predicted to be similar to the inner pixels I' and J' of the upper-left boundary.
- likewise, the lower boundary on the left has continuity with the lower boundary on the right.
- accordingly, pixels K and L outside the lower-left boundary can be predicted to be similar to the inner pixels K' and L' of the lower-right boundary, and pixels M and N outside the lower-right boundary can be predicted to be similar to the inner pixels M' and N' of the lower-left boundary.
- padding can be performed at the boundary of the 360 degree projection image or at the boundary between faces.
- the padding can be performed using samples contained inside the boundary having continuity with the boundary where the padding is performed.
- for example, padding is performed using the samples adjacent to the right boundary at the left boundary of the 360-degree projection image, and padding is performed using the samples adjacent to the left boundary at the right boundary of the 360-degree projection image. That is, at positions A, B, and C of the left boundary, padding can be performed using samples at positions A', B', and C' inside the right boundary, and at positions D, E, and F of the right boundary, padding can be performed using samples at positions D', E', and F' inside the left boundary.
- in addition, padding is performed using samples adjacent to the upper-right boundary at the upper-left boundary, and padding can be performed using samples adjacent to the upper-left boundary at the upper-right boundary. That is, at the G and H positions of the upper-left boundary, padding is performed using the samples at the G' and H' positions inside the upper-right boundary, and at the I and J positions of the upper-right boundary, padding can be performed using the samples at the I' and J' positions inside the upper-left boundary.
- likewise, padding may be performed using samples adjacent to the lower-right boundary at the lower-left boundary, and padding may be performed using samples adjacent to the lower-left boundary at the lower-right boundary. That is, at the K and L positions of the lower-left boundary, padding is performed using samples at positions K' and L' inside the lower-right boundary, and at the M and N positions of the lower-right boundary, padding can be performed using samples at positions M' and N' inside the lower-left boundary.
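- A minimal sketch of the left/right wrap-around padding described above is given below; the top/bottom case, where continuity holds between the left and right halves of the same boundary, is omitted for brevity.

```python
import numpy as np

def pad_erp_left_right(img, k):
    """Pad the left and right boundaries of an ERP picture with k columns
    taken from the opposite side, since the two vertical boundaries are
    spatially continuous on the sphere (wrap-around padding)."""
    left_pad = img[:, -k:]                 # samples inside the right boundary
    right_pad = img[:, :k]                 # samples inside the left boundary
    return np.hstack([left_pad, img, right_pad])

# usage: a 4x8 ERP picture padded by 2 columns on each side
img = np.arange(32, dtype=np.uint8).reshape(4, 8)
print(pad_erp_left_right(img, 2).shape)    # (4, 12)
```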
- an area where padding is performed may be referred to as a padding area, and a padding area may include a plurality of sample lines.
- the number of sample lines included in the padding area can be defined as the length of the padding area or the padding size.
- the length of the padding area is shown as k in both the horizontal and vertical directions.
- the length of the padding area may be set differently for each horizontal or vertical direction, or different for each face boundary.
- large distortion occurs at the upper or lower end of the 360 degree projection image using the ERP projection transformation.
- FIG. 17 is a view for explaining an example in which the lengths of the padding regions in the horizontal direction and the vertical direction are differently set in the ERP projection image.
- the length of the arrow indicates the length of the padding area.
- the length of the padding area performed in the horizontal direction and the length of the padding area performed in the vertical direction may be set differently, as in the example shown in FIG. 17. For example, if k columns of samples are generated through padding in the horizontal direction, padding may be performed such that 2k rows of samples are generated in the vertical direction.
- padding may be performed with the same length in both the vertical direction and the horizontal direction, and the length of the padding area may then be additionally extended through interpolation in at least one of the vertical direction and the horizontal direction.
- k sample lines can be generated in the vertical direction and the horizontal direction, and k sample lines can be additionally generated in the vertical direction through interpolation or the like. That is, k sample lines are generated in both the horizontal and vertical directions (see FIG. 16), and k sample lines are further generated in the vertical direction so that the length in the vertical direction becomes 2k (refer to FIG. 17).
- Interpolation may be performed using at least one of the samples contained inside the boundary or the samples contained outside the boundary. For example, after copying the samples inside the lower boundary to the outside of the padding area adjacent to the upper boundary, an additional padding area can be created by interpolating the copied samples and the samples contained in the padding area adjacent to the upper boundary.
- the interpolation filter may include at least one of a vertical direction filter and a horizontal direction filter. Depending on the position of the sample to be produced, either the vertical filter or the horizontal filter may be selectively used. Alternatively, the vertical filter and the horizontal filter may be used simultaneously to generate a sample included in the additional padding area.
- the length n in the horizontal direction of the padding area and the length m in the vertical direction of the padding area may have the same value or may have different values.
- n and m are integers equal to or greater than 0 and may have the same value, or one of m and n may have a smaller value than the other.
- m and n can be encoded in the encoder and signaled through the bit stream.
- the length n in the horizontal direction and the length m in the vertical direction in the encoder and decoder may be predefined.
- the padding area may be generated by copying samples located inside the image.
- the padding region located adjacent to a predetermined boundary may be generated by copying a sample located inside the boundary having continuity with a predetermined boundary in 3D space.
- a padding area located at the left boundary of the image may be generated by copying the sample adjacent to the right border of the image.
- a padding area may be created using at least one sample inside the boundary to be padded and at least one sample outside the boundary. For example, after copying the samples that are spatially contiguous with the boundary to be padded to the outside of the boundary, the sample values of the padding area can be determined by a weighted average or an average calculation between the copied samples and the samples included inside the boundary. As in the examples shown in FIGS. 16 and 17, the sample value of the padding region located at the left boundary of the image may be a weighted average or an average of at least one sample adjacent to the left boundary of the image and at least one sample adjacent to the right boundary of the image.
- the weight applied to each sample in the weighted average operation may be determined based on the distance to the boundary where the padding region is located. For example, among the samples in the padding region located at the left boundary, a sample close to the left boundary is derived by giving a larger weight to the samples located inside the left boundary, while a sample far away from the left boundary is derived by giving a larger weight to the copied samples (that is, samples adjacent to the right boundary of the image).
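- A possible realisation of this distance-based weighting is sketched below, assuming a single-channel image. The linear weight and the function name are illustrative assumptions; the text leaves the exact weight rule open.

```python
import numpy as np

def weighted_left_padding(img: np.ndarray, k: int) -> np.ndarray:
    """Fill the k-column padding area at the left image boundary by blending
    the sample just inside the left boundary with the spatially continuous
    sample copied from inside the right boundary."""
    h, w = img.shape
    left_col = img[:, 0].astype(np.float64)              # sample inside the left boundary
    pad = np.empty((h, k), dtype=img.dtype)
    for d in range(k):                                   # d = 0: column closest to the boundary
        wrap_col = img[:, w - 1 - d].astype(np.float64)  # continuous sample from the right side
        w_left = (k - d) / (k + 1)                       # assumed linear weight toward the inside
        pad[:, k - 1 - d] = np.round(w_left * left_col + (1.0 - w_left) * wrap_col)
    return pad
```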
- frame packing can be performed by adding a padding area between faces. That is, a 360 degree projection image can be generated by adding a padding area to the face boundary.
- FIG. 18 is a diagram showing an example in which padding is performed at the boundary of the face.
- based on the drawing shown in FIG. 18 (a), the face located at the upper end of the 360 degree projection image will be referred to as the upper face, and the face located at the lower end of the 360 degree projection image will be referred to as the lower face.
- the upper face may represent one of faces 1, 2, 3, and 4, and the lower face may represent any one of faces 5, 6, 7, and 8.
- a padding area in the form of surrounding a predetermined face can be set.
- a padding region containing m samples may be created.
- the padding area is set to surround the face, but the padding area may be set to only a part of the face boundary. That is, unlike in the example shown in FIG. 18 (b), the padding area may be added only at the boundary of the image, or the padding area may be added only between the faces to perform the frame packing.
- the length of the padding area between the faces may be set the same or may be set differently depending on the position.
- the length n (i.e., the length in the horizontal direction) of the padding region located at the left or right side of the predetermined face and the length m (i.e., the length in the vertical direction) of the padding region located at the upper or lower end of the predetermined face may have the same value or different values.
- n and m are integers equal to or greater than 0 and may have the same value, or one of m and n may have a smaller value than the other.
- m and n can be encoded in the encoder and signaled through the bit stream.
- the length n in the horizontal direction and the length m in the vertical direction may be predefined in the encoder and decoder in accordance with the projection conversion method, the position of the face, the size of the face or the shape of the face.
- the sample value of the padding area may be determined based on the sample included in the predetermined face or the sample included in the predetermined face and the sample included in the face adjacent to the predetermined face.
- a sample value of a padding area adjacent to a boundary of a predetermined face may be generated by copying a sample included in the face or interpolating samples included in the face.
- the upper extension region U of the upper face may be created by copying a sample adjacent to the boundary of the upper face, or by interpolating a predetermined number of samples adjacent to the boundary of the upper face .
- the lower extension region D of the lower face may be generated by copying a sample adjacent to the boundary of the lower face or by interpolating a predetermined number of samples adjacent to the boundary of the lower face.
- a sample value of a padding area adjacent to a boundary of a predetermined face may be generated using a sample value included in a face spatially adjacent to the face.
- the inter-face adjacency can be determined based on whether the faces have continuity when the 360 degree projection image is projected back onto the 3D space.
- a sample value of a padding area adjacent to a boundary of a predetermined face may be generated by copying a sample included in a face spatially adjacent to the face, or by interpolating samples included in the face and samples included in the face spatially adjacent to the face.
- the left portion of the upper extended region of the second face may be generated based on the samples included in the first face, and the right portion may be generated based on the samples included in the third face.
- FIG. 19 is a diagram showing an example of determining a sample value of a padding area between faces.
- the padding region between the first face and the second face may be obtained by weighted averaging at least one sample included in the first face and at least one sample included in the second face.
- the padding region between the upper face and the lower face can be obtained by weighted averaging the upper extension region U and the lower extension region D.
- the weight w may be determined based on the information encoded and signaled by the encoder. Alternatively, depending on the position of the sample in the padding region, the weight w may be variably determined. For example, the weight w may be determined based on the distance from the position of the sample in the padding region to the first face and the distance from the position of the sample in the padding region to the second face.
- Equations (4) and (5) show examples in which the weight w is variably determined according to the position of the sample.
- a sample value of the padding area is generated based on Equation (4) in the lower extended region close to the lower face, and in the upper extended region close to the upper face, A sample value of the padding region can be generated.
- the filter for the weighting operation may have a vertical direction, a horizontal direction, or a predetermined angle. If the weighting filter has a predetermined angle, the sample included in the first face and the sample included in the second face located on the predetermined angle line from the sample in the padding region may be used to determine the value of the corresponding sample.
- the padding region may be generated using only samples included in either the first face or the second face. For example, if any one of the samples included in the first face or the sample included in the second face is not available, padding can be performed using only the available samples. Alternatively, padding may be performed by replacing the unavailable sample with the surrounding available sample.
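- The position-dependent blending of the upper extension region U and the lower extension region D described above can be sketched as follows; the exact form of Equations (4) and (5) is not reproduced, and the linear weight used here is an illustrative assumption.

```python
import numpy as np

def blend_face_padding(U: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Blend the upper-face extension region U and the lower-face extension
    region D into the padding region between the two faces, weighting D more
    heavily near the lower face and U more heavily near the upper face."""
    assert U.shape == D.shape
    rows = U.shape[0]
    out = np.empty_like(U, dtype=np.float64)
    for r in range(rows):                 # r = 0: row adjacent to the upper face
        w = (r + 1) / (rows + 1)          # assumed linear weight toward the lower face
        out[r] = (1.0 - w) * U[r] + w * D[r]
    return np.round(out).astype(U.dtype)
```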
- padding-related embodiments are described based on a specific projection transformation method
- padding can be performed on the same principle in projection transformation methods other than the exemplified one.
- padding can be performed at a face boundary or an image boundary even in a 360 degree projection image based on CMP, OHP, ECP, RSP, TPP, and the like.
- padding related information can be signaled through the bitstream.
- the padding related information may include whether padding has been performed, the position of the padding area or the padding size, and the like.
- the padding related information may be signaled in units of a sequence, a picture, a slice, or a face.
- information indicating whether padding was performed on the top boundary, bottom boundary, left boundary, or right boundary on a per-face basis and the padding size may be signaled.
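- For illustration only, the per-face padding information could be grouped as sketched below; the field names are hypothetical and do not correspond to actual bitstream syntax elements.

```python
from dataclasses import dataclass

@dataclass
class FacePaddingInfo:
    """Hypothetical container for the per-face padding information
    described above (illustrative names, not real syntax elements)."""
    pad_top: bool        # padding performed on the top boundary of the face
    pad_bottom: bool     # padding performed on the bottom boundary
    pad_left: bool       # padding performed on the left boundary
    pad_right: bool      # padding performed on the right boundary
    padding_size: int    # number of padded sample lines (k)
```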
- a 360 degree image can be projected and converted into a two dimensional image composed of a plurality of faces.
- a 360 degree image can be projected and transformed into a two dimensional image composed of six faces.
- the six faces may be arranged in a 2x3 form, or in a 3x2 form as in the example shown in FIG. 20.
- FIG. 20 shows a 360-degree projection image in the form of 3 ⁇ 2.
- In FIG. 20, six square faces of MxM size are illustrated as arranged in 3x2 form.
- a predetermined face can be configured to include not only the area corresponding to the predetermined surface but also an area adjacent to the corresponding area.
- a 360-degree image approximated to a cube can be projected and transformed onto a 2D plane such that one face on the cube becomes one face, as in the example shown in FIG.
- the Nth face of the cube may constitute the face of the index N of the 360 degree projection image.
- a face can be configured so that data of a plurality of faces are included in one face.
- the data of the plurality of surfaces may include the surface located at the center of the predetermined face (hereinafter referred to as a 'center surface') and at least a part of at least one of the plurality of surfaces adjacent to the center surface.
- one face can be constructed using the center surface and some data of the adjacent surfaces that neighbor the center surface in 3D space.
- FIG. 21 is a diagram showing an example in which a plurality of data is included in one face.
- the face 0 may be configured to include the surface located at the front and at least a partial area of the surfaces adjacent to the surface located at the front. That is, a 360 degree image may be projected and transformed so that face 0 includes its center surface (i.e., the surface located at the front) and at least some of the surfaces constituting face 2, face 3, face 4, and face 5. Accordingly, a part of the data included in face 0 may overlap with data included in face 2, face 3, face 4, and face 5.
- each face is configured to include a plurality of surfaces.
- each face can be configured to include data for a plurality of planes.
- each face may be configured to include a center plane and a partial area of four sides adjacent to the center plane, as in the example shown in Fig.
- An area generated based on the adjacent surface adjacent to the center plane in the face may be defined as a padding area.
- the padding sizes for the vertical direction and the horizontal direction may have the same value.
- the padding size for the vertical and horizontal directions is illustrated as being set to k.
- the padding size for the vertical direction and the padding size for the horizontal direction may be set different from each other.
- the padding size for the vertical and horizontal directions may be adaptively set according to the position of the face.
- the padding size in the horizontal direction at the face located at the left or right boundary of the 360-degree projection image may be set larger than the padding size in the vertical direction.
- the padding size may be set differently for each face.
- the padding size in the horizontal direction at the face located at the left or right boundary of the 360-degree projection image may be set to be larger than the padding size in the horizontal direction at the other faces.
- the predetermined face may be configured to include the center face and only a partial area of the adjacent faces located at the left and right of the center face, or to include the center face and only a partial area of the adjacent faces located at the top and bottom of the center face. That is, an area including the data of the adjacent faces only at the left and right sides, or only at the upper and lower sides, of the face can be set.
- the number of adjacent faces included in each face may be set differently from the example shown in Fig.
- the number of adjacent faces included in the face may be determined differently for each face.
- faces 2, 3, 4, and 5 in FIG. 22, located at the left and right boundaries of the image, are configured to include a center plane and a partial area of the three sides adjacent to the center plane, while the remaining faces (faces 1 and 6) may be configured to include a center plane and a partial area of two sides adjacent to the center plane.
- a face can be formed by adding a padding area outside the center plane while maintaining the size of the center plane. For example, by adding a k-sized padding region at the boundary of the center plane of MxM size, it is possible to construct a face having a width and a height of M+2k.
- the center plane may be resampled to a size smaller than the original size, padding may be performed on the remaining area around the resampled image, and a predetermined face may thereby be formed.
- the center plane of MxM size may be resampled to a size smaller than MxM, and the resampled image may be placed at the center of the face.
- Resampling may be performed by interpolating a predetermined number of samples. At this time, at least one of the strength, the number of taps, or the coefficients of the interpolation filter may be predefined, or may be adaptively determined according to the size of the face or the position of the sample to be resampled.
- information indicating at least one of the strength, the number of taps, or the coefficients of the interpolation filter may be signaled through the bit stream. Thereafter, padding is performed on the remaining area of the face excluding the resampled image to form a face of MxM size.
- Resampling can be used to reduce the size of at least one of the width or height of the image corresponding to the center plane.
- resampling may be performed to make the width and height of the image corresponding to the front face smaller than M, as in the example shown in FIG. That is, a filter for resampling can be applied to both the horizontal direction and the vertical direction.
- resampling may be performed to keep the size of either the width or the height of the image corresponding to the center plane at M, while making the size of the other one smaller than M. That is, a filter for resampling can be applied only in the horizontal direction or the vertical direction.
- the padding may be performed using at least one of a sample (or block) located at the boundary of the center plane or a sample (or block) contained in the plane adjacent to the center plane.
- the value of a sample included in the padding region may be generated by copying a sample located at the boundary of the center plane or a sample contained in the plane adjacent to the center plane, or may be generated based on an averaging operation or a weighting operation of the samples included in those planes.
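- The construction of a face from a resampled center plane plus a padded border can be sketched as below for a single-channel MxM center plane. Nearest-neighbour resampling and edge replication are illustrative assumptions; the text also allows interpolation filters and adjacent-plane data (or blends of both).

```python
import numpy as np

def build_face_with_border(center: np.ndarray, k: int) -> np.ndarray:
    """Resample an MxM center plane to (M-2k)x(M-2k), place it in the middle
    of the face, and fill the remaining k-wide border by padding."""
    m = center.shape[0]
    inner = m - 2 * k
    # nearest-neighbour resampling of the MxM center plane
    idx = np.arange(inner) * m // inner
    resampled = center[np.ix_(idx, idx)]
    # place the resampled image at the center and pad the border by
    # replicating its edge samples outward
    return np.pad(resampled, k, mode="edge")
```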
- the projection transformation method of constructing a face using the center plane and the adjacent planes neighboring the center plane can be defined as Overlapped Face Projection. In FIGS. 21 to 23, the face overlap projection conversion method based on the CMP technique has been described.
- the face overlap projection conversion method can be applied to any projection conversion technique in which a plurality of faces are generated.
- the face overlap projection conversion method may be applied to ISP, OHP, TPP, SSP, ECP, or RSP.
- padding can be set not to be performed between the current face and the neighbor face. That is, in performing the face overlap projection conversion, padding may not be performed on the boundary of the neighboring faces in both the 2D plane and the 3D space.
- FIG. 23 is a view showing an example in which padding is performed only at a partial boundary of the face in CMP.
- a face that is adjacent to the center face of the current face in both the 2D plane and the 3D space will be referred to as a common adjacent face.
- padding may not be performed at the boundary between the center plane and the common adjacent face.
- face 0 is adjacent to face 4 and face 5 in both the 2D plane and the 3D space. Accordingly, padding may not be performed at the boundary between face 0 and face 4 and at the boundary between face 0 and face 5.
- face 1 is adjacent to face 2 and face 3 in both the 2D plane and the 3D space. Thus, padding may not be performed at the boundary between face 1 and face 2 and at the boundary between face 1 and face 3.
- padding using data of the adjacent face can be performed. Specifically, since the 0th face and the 1st face are not mutually common adjacent faces, a padding region using the data of the adjacent faces (more specifically, faces 2 and 3) can be added to the upper and lower boundaries of the 0th face. Likewise, a padding area using data of adjacent faces (more specifically, faces 4 and 5) may be added to the upper and lower boundaries of the 1st face.
- padding may be performed only on a partial boundary of the center plane while maintaining the center plane at the size of MxM.
- a 360 degree projection image of size (3M + 2k) x (2M + 4k) can be obtained by adding a k-sized padding region to the remaining boundary except for the boundary between the center plane and the common adjacent plane.
- the 0th face and the 1st face have an (M-2k)xM size, while the 2nd, 3rd, 4th, and 5th faces have an (M-k)xM size.
- a small-sized face can be resampled to the size of another face, or a large-sized face can be resampled to the size of another face.
- the 0th face and the 1th face may be resampled to (M-k) xM so that all faces have a size of (M-k) xM.
- all faces may be resampled to a predetermined size square (e.g., MxM, etc.).
- padding may be set not to be performed at the left and right boundaries of the 360 degree projection image. That is, padding may be performed only on the upper and lower boundaries for all faces, and padding may not be performed on the left and right boundaries.
- FIG. 23 shows an example in which faces are arranged in a 3x2 form
- the above embodiments can be applied even when faces are arranged in a 2x3 form.
- an example in which the figures shown in Figs. 22 and 23 are rotated clockwise or counterclockwise by 90 degrees can be applied to a 2x3 array.
- a padding area may be added to the upper and lower sides of the faces arranged in the middle row.
- a padding area may be added to the left and right boundaries and the upper boundary, excluding the lower boundary, or a padding area may be added only to the left and right boundaries.
- a padding area may be added to the left and right boundaries and the lower boundary, excluding the upper boundary, or a padding area may be added only to the left and right boundaries.
- the ECP is a method of approximating a sphere-shaped 360-degree image in the form of a cylinder and converting the cylinder-shaped 360-degree image into a 2D projection. Specifically, a circle corresponding to the cylinder upper surface (hereinafter referred to as an upper circle) and a circle corresponding to the cylinder lower surface (hereinafter referred to as a lower circle) can be converted into a rectangular shape.
- FIG. 24 is a diagram showing an example of converting the upper circle and the lower circle of the cylinder into a rectangular shape.
- the regions of the sphere having a latitude higher than a predefined latitude can be converted into the upper and lower surfaces of the cylinder, respectively, and the remaining region can be converted into the cylinder body.
- for example, the predefined latitude may be 41.81 degrees, but it is also possible to set the predefined latitude to a different value.
- the upper and lower circles of the cylinder can be transformed into a rectangular shape.
- the upper and lower circles are converted into squares whose side length is equal to the diameter of the circle.
- the body of the cylinder can be deployed in a rectangular plane similar to the ERP.
- the cylinder body of the rectangular shape can be divided into a plurality of faces.
- FIG. 25 is a view showing a 360-degree projection image based on ECP.
- the upper and lower circles of the cylinder can be converted into a rectangular shape, and each converted square can be set as a face.
- each converted square can be set as a face.
- the face corresponding to the upper circle is set to face 0
- the face corresponding to the lower circle is set to face 1.
- the converted rectangle can be divided into a plurality of faces.
- the cylinder body is illustrated as being divided into four faces (face 2 to face 5).
- each face can be placed on a 2D plane to obtain a 360 degree projection image.
- the widths of face 0 and face 1 are equal to the width of the cylinder body (i.e., the width of face 2 to face 5), as in the example shown in FIG. 25 (a)
- face 2 to face 5 are the faces corresponding to the cylinder body in the 360 degree projection image.
- each face may be arranged in 3x2 or 2x3 form.
- three of the four faces corresponding to the cylinder body (faces 2, 3, and 4) are arranged in a row, and the remaining face (face 5), the face 0 corresponding to the upper circle, and the face 1 corresponding to the lower circle can be arranged in another row.
- each side of face 0, corresponding to the upper circle, has continuity with the upper sides of the four faces corresponding to the cylinder body in 3D space
- each side of face 1, corresponding to the lower circle, has continuity with the lower sides of the four faces corresponding to the cylinder body in 3D space. Accordingly, in consideration of the continuity in 3D space, the remaining face corresponding to the body (face 5), face 0, and face 1 can be arranged in a line.
- three faces arranged in a line among the four faces corresponding to the cylinder body have continuity in both the 2D plane and the 3D space. Accordingly, the three faces arranged in a line among the four faces corresponding to the cylinder body can be redefined as one face.
- the three faces arranged in a line among the four faces corresponding to the cylinder body are defined as a front face, and the remaining one face is defined as a back face.
- the top face corresponds to the upper circle
- the bottom face corresponds to the lower circle.
- the steps shown in FIGS. 25 (a) to 25 (c) may be set to be sequentially performed in the frame packing process.
- a padding area may be added at the boundary of the face. At this time, as described with reference to FIG. 23, padding can be set not to be performed at the boundary between the current face and the common adjacent face.
- FIG. 26 is a view showing an example in which padding is performed only at a part of the boundary of the face in the ECP.
- a padding area is added at the boundary of the current face, but no padding is performed at the boundary between the current face and the common adjacent face.
- the back face is continuous with both the top face and the bottom face in both the 2D plane and the 3D space. Accordingly, as in the example shown in FIG. 26, the padding area may not be added to the boundary between the back face and the top face or between the back face and the bottom face.
- the front face does not have continuity with neighboring faces in 3D space, so padding areas can be added to all boundaries of the front face.
- the sizes of the faces at the bottom row may differ.
- the size of the back face may be (M-2k) xM while the size of the top face and bottom face may be (M-k) xM.
- the back face can be resampled to match the size of the top and bottom faces, so that the faces all have the same size (e.g., (M-k)xM).
- padding may be set not to be performed at the left and right boundaries of the 360 degree projection image. That is, padding may be performed only on the upper and lower boundaries for all faces, and padding may not be performed on the left and right boundaries.
- Information regarding the face overlap projection conversion method can be signaled through the bit stream.
- the information on the face overlap projection conversion method may include at least one of information indicating whether or not the face overlap projection conversion method is used, information indicating the number of adjacent faces included in a face, information indicating whether or not a padding area exists, information indicating the padding size, information indicating whether a padding area is set between the current face and the common adjacent face, or information indicating whether face resampling has been performed.
- the 360 degree video play device may perform decoding / frame depacking on the 360 degree projection image using information on the face overlap projection transformation method.
- the upper and lower circles of the cylinder may be converted into rectangles, and one side of the rectangle may be converted into a curved surface so that the 360-degree video of the cylinder shape can be 2D-transformed.
- frame packing can be performed by converting the upper and lower circles of the cylinder into an arc shape.
- FIG. 27 is a view showing an example in which frame packing is performed in a state where an upper circle and a lower circle of a cylinder are converted into an arc shape.
- one of the upper and lower circles of the cylinder may be converted into a curved surface and then frame packing may be performed.
- the upper circle and the lower circle are converted into a rectangular shape
- the portion corresponding to one side of the rectangle is set as a curved surface
- the faces converted into the arc shape can be disposed on the left and right sides of the 360 degree projection image.
- for the top face disposed on the left side of the 360 degree projection image, the left boundary of the rectangle can be set to be converted into a curved shape
- for the bottom face disposed on the right side of the 360 degree projection image, the right boundary of the rectangle can be set to be converted into a curved shape.
- the lower end of the 360 degree projection image can have an arc shape at the left and right edges.
- when the 360 degree projection image is reversely projected onto the 3D space, an area that is not mapped to any part of the 360 degree image occurs.
- the area between the curved boundary of the top face or the bottom face and the boundary of the 360 degree projection image may be an area that is not used in rendering the 360 degree image.
- An area not used for rendering a 360-degree image may be referred to as a 'rendering padding region' or an 'inactive region'.
- the samples in the rendering padding region or the inactive region may be referred to as an inactive sample.
- Inactive samples can be set to predefined values.
- the predefined value can be determined by the bit depth.
- the predefined value may be a median, a minimum, or a maximum of a range of sample values allowed by the bit depth.
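- As a concrete example of the bit-depth-based default, the value could be computed as sketched below; the helper name and the mode argument are illustrative.

```python
def inactive_sample_default(bit_depth: int, mode: str = "median") -> int:
    """Default value for inactive samples in the rendering padding region,
    derived from the sample value range allowed by the bit depth."""
    if mode == "median":
        return 1 << (bit_depth - 1)      # e.g. 512 for 10-bit video
    if mode == "minimum":
        return 0
    return (1 << bit_depth) - 1          # "maximum", e.g. 1023 for 10-bit video
```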
- the inactive sample may be determined based on at least one of a sample adjacent to the curved surface boundary adjacent to the inactive region or a sample adjacent to the boundary of the face spatially adjacent to the curved surface boundary in 3D space.
- an inactive sample may be generated by copying a sample adjacent to a curved surface boundary adjacent to the inactive area, or may be set to an average value, a minimum value, or a maximum value of samples adjacent to the curved surface boundary.
- the inactive sample may be generated by copying a sample adjacent to at least a portion of the top boundary or the bottom boundary of the front face that is spatially adjacent to the curved boundary in 3D space, or may be set to an average value, a minimum value, or a maximum value of the samples adjacent to at least a portion of the top boundary or the bottom boundary of the front face.
- the value of the inactive sample may be determined by averaging or weighted summing the samples adjacent to the curved boundary and at least a portion of the top boundary or bottom boundary of the Front face.
- the projection transformation technique in which the upper circle and the lower circle of the cylinder are converted into the arch shape can be referred to as a modified ECP.
- padding can be set not to be performed at the boundary between the current face and the common adjacent face.
- FIGS. 28 and 29 are diagrams illustrating an example in which padding is performed only on a partial boundary of a face in a modified ECP.
- padding may be performed on the upper and lower faces of the 360 degree projection image. At this time, a padding area is added to the boundary of the current face, but no padding is performed at the boundary between the current face and the common adjacent face.
- the back face is continuous with both the top face and the bottom face in both the 2D plane and the 3D space. Accordingly, as in the example shown in FIG. 28, a padding area may not be added to the boundary between the back face and the top face or between the back face and the bottom face.
- the front face does not have continuity with neighboring faces in 3D space, so padding areas can be added to all boundaries of the front face.
- padding may not be performed on the left and right boundaries of the 360-degree projection image.
- padding can be performed along the boundary of the curved surface. Accordingly, the padding size for the horizontal direction of the top face and the bottom face may increase or decrease along the curved surface.
- when padding is not performed on the left and right boundaries of the 360-degree projection image, padding may not be performed at the arc ends of the top face and the bottom face.
- the sizes of the top face and the bottom face may be larger than the back face.
- when the inactive area is included, the size of the top face and the bottom face can be (M-k)xM, while the size of the back face is (M-2k)xM.
- by resampling the back face to the size of the top face and the bottom face, all the faces can be set to have the same size (e.g., (M-k)xM); alternatively, by resampling the top face and the bottom face to the size of the back face, the faces may all have the same size (e.g., (M-2k)xM).
- the width of the front face can be set to the sum of the widths of the top face, back face, and bottom face.
- RSP is a projection transformation technique that transforms a 360 degree image projected into a sphere into two faces.
- the RSP method has the advantage that the number of faces is smaller than that of CMP, ISP or OHP, discontinuous data between faces is reduced, and face artifacts can be reduced.
- the two faces based on the RSP can be composed of a predefined region of a 360 degree image and a predefined region of a 360 degree image rotated by a predetermined angle in a predetermined direction.
- the positions and sizes of predefined regions for generating two faces based on the RSP can be the same before and after the rotation of the 360-degree image.
- FIG. 30 shows the two faces of a 360 degree projection image based on RSP.
- the center region of the 360 degree projection image generated by projection transformation based on the ERP based on the 360 degree image is cropped to constitute the top face based on the RSP.
- the 360-degree image is rotated 90 degrees in the X-axis direction and 180 degrees in the Y-axis direction, and the central area of the 360-degree projection image generated by projection transformation based on the ERP is cropped to constitute the bottom face based on the RSP.
- the central region may be a region of the 360 degree image covering a latitude range of 90 degrees (from 45 degrees north to 45 degrees south) and a longitude range of -135 degrees to 135 degrees.
- Both ends of the RSP-based top and bottom faces may be in the form of an arc.
- an image obtained by converting both sides of a rectangular image, which is a central region of a 360 degree projection image, into a curved surface can be set as an upper face or a lower face.
- the top face and bottom face may be schematized into two faces surrounding the ball, such as a tennis ball.
- a 360-degree projection image based on the RSP may also contain a rendering padding area that is not mapped to any part of the 360-degree image when the 360-degree projection image is projected back onto the 3D space.
- the remaining area excluding the rendering padding area (i.e., the area used for rendering the 360-degree image) is referred to as a 2D data area.
- FIG. 31 is a diagram illustrating a rendering padding area in a 360-degree projection image based on an RSP.
- an area U0-U3 between the curved surface boundary of the upper face and the boundary of the 360 degree projection image may correspond to the rendering padding area.
- regions D0-D3 between the curved boundary of the lower face and the boundary of the 360 degree projection image may correspond to the rendering padding region.
- the remaining regions (i.e., D and L), except for the rendering padding, may correspond to 2D data regions.
- the face overlap projection conversion method may be applied to the 360-degree projection image based on the RSP. That is, horizontal and / or vertical padding regions may be added for the top face and / or the bottom face.
- Inactive samples in the rendering padding region can be set to predefined values.
- the predefined value can be determined by the bit depth.
- the predefined value may be a median, a minimum, or a maximum of a range of sample values allowed by the bit depth.
- the inactive sample may be determined based on at least one of a sample adjacent to the curved surface boundary adjacent to the inactive region or a sample adjacent to the boundary of the face spatially adjacent to the curved surface boundary in 3D space.
- an inactive sample may be generated by copying a sample adjacent to a curved surface boundary adjacent to the inactive area, or may be set to an average value, a minimum value, or a maximum value of samples adjacent to the curved surface boundary.
- the inactive sample may be generated by copying a sample adjacent to the boundary of the face spatially adjacent to the curved boundary in 3D space, or may be set to an average value, a minimum value, or a maximum value of the samples adjacent to the boundary of that face.
- the value of the inactive sample may be determined through an average value or a weighted sum operation of a sample adjacent to the curved boundary and a sample adjacent to the boundary of the face spatially adjacent to the curved boundary in 3D space.
- top and bottom faces may be configured so that rendering padding areas do not occur. That is, instead of setting the crop area in the ERP image such that the upper boundary of the upper face and the lower boundary of the lower face have an arc shape as in the example shown in FIG. 30, the crop area in the ERP image may be set such that the upper face and the lower face are rectangular.
- data overlapping the bottom face can be added to the rendering padding area of the top face
- data overlapping with the top face can be added to the rendering padding area of the bottom face
- FIG. 32 is a diagram showing 2D data that replaces the rendering padding area.
- some data of the bottom face may be copied to form the rendering padding areas U0-U3 of the top face, and some data of the top face may be copied to form the rendering padding areas D0-D3 of the bottom face.
- the sample value of the 2D data area overlapping the rendering padding area may be updated based on the sample value of the rendering padding area.
- the sample value of the 2D data area can be updated by weighted prediction of the sample value of the decoded 2D data area and the sample value of the rendering padding area.
- the sample value of the 2D data region in the bottom face may be generated based on a weighted prediction of the sample value of the rendering padding region of the top face and the decoded sample value in the bottom face.
- the sample value of the 2D data area in the top face can be generated based on the weighted prediction of the sample value of the rendering padding area of the bottom face and the decoded sample value in the top face.
- Equation (6) shows an example of updating a sample value in the 2D data area of the bottom face that overlaps with the rendering padding area of the top face.
- L j denotes a decoded sample value in the 2D data area of the bottom face
- U i j denotes a sample value in the i-th rendering padding area of the top face
- i represents the index (0-3) of the rendering padding area.
- Equation (7) shows an example of updating a sample value in the 2D data area of the top face that overlaps with the rendering padding area of the bottom face.
- U j in Equation (7) represents a decoded sample value in the 2D data area of the top face
- D i j denotes a sample value in the i-th rendering padding area of the bottom face
- i represents the index (0-3) of the rendering padding area.
- the weight given to each sample may be predefined in the encoder and the decoder. Alternatively, the weights may be determined based on the position of the rendered padding region and the position of the sample in the rendered padding region. Alternatively, information for determining a weight may be signaled via a bitstream. The decoder may decode the information to determine weights applied to each sample.
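- The weighted update can be sketched as below. Equations (6) and (7) are not reproduced; a single scalar weight is assumed here, whereas the text also allows predefined, position-dependent, or signalled weights.

```python
import numpy as np

def update_2d_data(decoded: np.ndarray, rendering_pad: np.ndarray,
                   weight: float = 0.5) -> np.ndarray:
    """Update decoded samples of a 2D data area using the overlapping
    rendering padding area of the opposite face by weighted prediction."""
    assert decoded.shape == rendering_pad.shape
    blended = weight * decoded.astype(np.float64) \
        + (1.0 - weight) * rendering_pad.astype(np.float64)
    return np.round(blended).astype(decoded.dtype)
```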
- the rendering padding area has been described taking the ECP and RSP techniques as an example.
- the matters related to the rendering padding area described above can be applied not only to ECP and RSP techniques, but also to projection transformation techniques such as SSP including a face of a curved surface.
- the 360 degree projection image can be encoded / decoded by applying the 2D image encoding / decoding technique.
- the encoder and decoder described with reference to FIGS. 1 and 2 may be used for encoding / decoding a 360 degree projection image.
- an inter prediction technique that can be applied when encoding / decoding a 360 degree projection image will be described in detail.
- FIG. 33 is a flowchart showing an inter prediction method according to an embodiment to which the present invention is applied.
- motion information of a current block can be determined (S3310).
- the motion information of the current block may include at least one of a motion vector relating to the current block, a reference picture index of the current block, or an inter prediction direction of the current block.
- the motion information of the current block may be obtained based on at least one of information signaled through a bitstream or motion information of a neighboring block neighboring the current block.
- FIG. 34 illustrates a process of deriving motion information of a current block when a merge mode is applied to the current block.
- the merge mode indicates a method of deriving motion information of a current block from a neighboring block.
- a spatial merge candidate may be derived from the spatially neighboring block of the current block (S3410).
- Spatial neighboring blocks may include at least one of a block adjacent to the top of the current block, a block adjacent to the left of the current block, or a block adjacent to a corner of the current block (e.g., the top left corner, the top right corner, or the bottom left corner).
- FIG. 35 is a diagram showing an example of a spatial neighboring block.
- the spatial neighboring blocks may include a block A1 adjacent to the left side of the current block, a block B1 adjacent to the top of the current block, a block A0 adjacent to the bottom left corner of the current block, a block B0 adjacent to the top right corner of the current block, and a block B2 adjacent to the top left corner of the current block.
- The example in FIG. 35 may be further expanded so that at least one of a block neighboring the top left sample of the current block, a block neighboring the top center sample, or a block neighboring the top right sample of the current block is defined as a block adjacent to the top of the current block, and at least one of a block neighboring the top left sample of the current block, a block neighboring the left center sample, or a block neighboring the bottom left sample of the current block is defined as a block adjacent to the left of the current block.
- a spatial merge candidate may be derived from spatially non-neighboring blocks that are not adjacent to the current block. For example, the spatial merge candidate of the current block may be derived using at least one of a block located on the same vertical line as a block adjacent to the top, top right corner, or top left corner of the current block, a block located on the same horizontal line as a block adjacent to the left or bottom left corner of the current block, or a block located on the same diagonal line as a block adjacent to a corner of the current block. As a specific example, if an adjacent block neighboring the current block cannot be used as a merge candidate, a block that is not adjacent to the current block can be used as a merge candidate of the current block.
- the motion information of the spatial merge candidate may be set to be the same as the motion information of the spatial neighboring block.
- Spatial merge candidates can be determined by searching for neighboring blocks in a predetermined order. For example, a search for spatial merge candidates can be performed in the order of the A1, B1, B0, A0, and B2 blocks. At this time, the B2 block can be used when at least one of the remaining blocks (i.e., A1, B1, B0, and A0) is not present or at least one is coded in the intra prediction mode.
- the search order of the spatial merge candidate may be as previously defined in the encoder / decoder. Alternatively, the search order of the spatial merge candidate may be determined adaptively according to the size or type of the current block. Alternatively, the search order of the spatial merge candidate may be determined based on the information signaled through the bit stream.
- the temporal merge candidate may be derived from the temporally neighboring block of the current block (S3420).
- the temporal neighbor block may refer to a co-located block included in the collocated picture.
- a collocated picture has a picture order count (POC) different from the current picture including the current block.
- the collocated picture can be determined as a picture having a predefined index in the reference picture list or a picture having the smallest output order (POC) difference from the current picture.
- the collocated picture may be determined by the information signaled from the bitstream.
- the information signaled from the bitstream may include at least one of information indicating a reference picture list (for example, an L0 reference picture list or an L1 reference picture list) including the collocated picture and/or an index indicating the collocated picture in the reference picture list.
- the information for determining the collocated picture may be signaled in at least one of a picture parameter set, a slice header, or a block level.
- the temporal merge candidate motion information can be determined based on the motion information of the collocated block.
- the temporal merge candidate motion vector may be determined based on the motion vector of the collocated block.
- the temporal merge candidate motion vector may be set equal to the motion vector of the collocated block.
- the temporal merge candidate motion vector may be derived by scaling the motion vector of the collocated block based on the output order (POC) difference between the current picture and the reference picture of the current block and/or the output order (POC) difference between the collocated picture and the reference picture of the collocated picture.
- FIG. 36 is a diagram for explaining an example of deriving a motion vector of a temporal merging candidate.
- tb represents the POC difference between the current picture (curr_pic) and the reference picture (curr_ref) of the current picture
- td represents the POC difference between the collocated picture (col_pic) and the reference picture (col_ref) of the collocated block.
- the temporal merge candidate motion vector may be derived by scaling the motion vector of the collocated block (col_PU) based on tb and / or td.
- both the motion vector of the collocated block and a scaled motion vector of the collocated block may be used as motion vectors of temporal merge candidates.
- a motion vector of a collocated block may be set as a motion vector of a first temporal merge candidate, and a value obtained by scaling a motion vector of the collocated block may be set as a motion vector of a second temporal merge candidate.
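- A simple floating-point sketch of this POC-based scaling is given below; actual codecs use fixed-point arithmetic with clipping, which is omitted here.

```python
def scale_temporal_mv(col_mv, tb: int, td: int):
    """Scale the motion vector of the collocated block by the ratio of the two
    POC distances tb and td illustrated in FIG. 36."""
    if td == 0:
        return col_mv
    scale = tb / td
    return (round(col_mv[0] * scale), round(col_mv[1] * scale))
```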
- the inter prediction direction of the temporal merge candidate may be set equal to the inter prediction direction of the temporal neighbor block.
- the reference picture index of the temporal merge candidate may have a fixed value.
- the reference picture index of the temporal merge candidate may be set to '0'.
- the reference picture index of the temporal merging candidate may be adaptively determined based on at least one of the reference picture index of the spatial merge candidate and the reference picture index of the current picture.
- the collocated block may be determined to be any block within the block having the same position and size as the current block in the collocated picture, or a block adjacent to the block having the same position and size as the current block.
- FIG. 37 is a diagram showing the positions of candidate blocks that can be used as a collocated block.
- the candidate block may include at least one of a block adjacent to the upper left corner position of the current block in the collocated picture, a block adjacent to the center sample position of the current block, or a block adjacent to the lower left corner position of the current block.
- the candidate block may include a block TL including the top left sample position of the current block in the collocated picture, a block BR including the bottom right sample position of the current block, a block H adjacent to the bottom right corner of the current block, a block C3 including the center sample position of the current block, or a block C0 adjacent to the center sample of the current block (e.g., a block including a sample position spaced apart from the center sample of the current block by (-1, -1)).
- the block including the position of the neighboring block adjacent to the predetermined boundary of the current block in the collocated picture may be selected as the collocated block.
- the number of temporal merge candidates can be one or more. As an example, based on one or more collocated blocks, one or more temporal merge candidates may be derived.
- the maximum number of temporal merge candidates may be encoded and signaled by the encoder.
- the maximum number of temporal merge candidates may be derived based on the maximum number of merge candidates that can be included in the merge candidate list and / or the maximum number of spatial merge candidates.
- the maximum number of temporal merge candidates may be determined based on the number of collocated blocks available.
- any one of the C3 block and the H block may be determined as the collocated block. If the H block is available, the H block may be determined as the collocated block. On the other hand, when the H block is unavailable (for example, when the H block is coded by intra prediction or when the H block is located outside the largest coding unit (LCU)), the C3 block may be determined as the collocated block.
- if a candidate block is not available, the non-available block may be replaced with another available block. The replacement block may include at least one of a block adjacent to the center sample position of the current block in the collocated picture (e.g., C0 and/or C3) or a block adjacent to the top left corner position of the current block (e.g., TL).
- the merge candidate list including the spatial merge candidate and the temporal merge candidate may be generated (S3430).
- Information regarding the maximum number of merge candidates may be signaled through the bitstream.
- information indicating the maximum number of merge candidates may be signaled through a sequence parameter or a picture parameter. For example, if the maximum number of merge candidates is 5, a total of five spatial and temporal merge candidates can be selected. For example, four of the five spatial merge candidates may be selected, and one of the two temporal merge candidates may be selected. If the number of merge candidates included in the merge candidate list is smaller than the maximum number of merge candidates, a combined merge candidate obtained by combining two or more merge candidates or a merge candidate having a (0, 0) motion vector may be added to the merge candidate list.
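- A minimal sketch of assembling such a list is shown below; the dictionary representation of a candidate, the omitted duplicate removal, and the simple zero-motion filling are illustrative assumptions.

```python
def build_merge_candidate_list(spatial, temporal, max_num=5,
                               max_spatial=4, max_temporal=1):
    """Assemble a merge candidate list from spatial and temporal candidates
    and fill it up to max_num entries with zero-motion candidates."""
    candidates = list(spatial[:max_spatial]) + list(temporal[:max_temporal])
    candidates = candidates[:max_num]
    while len(candidates) < max_num:
        candidates.append({"mv": (0, 0), "ref_idx": 0})  # (0, 0) motion vector candidate
    return candidates
```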
- the merge candidate may be included in the merge candidate list according to the predefined priority. The higher the priority, the smaller the index assigned to the merge candidate.
- the spatial merge candidate may be added to the merge candidate list earlier than the temporal merge candidate.
- the spatial merge candidates may be added to the merge candidate list in the order of the spatial merge candidate of the left neighboring block, the spatial merge candidate of the top neighboring block, the spatial merge candidate of the block adjacent to the top right corner, the spatial merge candidate of the block adjacent to the bottom left corner, and the spatial merge candidate of the block adjacent to the top left corner.
- the priority among the merge candidates may be determined according to the size or type of the current block. For example, if the current block is of a rectangular shape with a width greater than the height, the spatial merge candidate of the left neighboring block may be added to the merge candidate list before the spatial merge candidate of the upper neighboring block. On the other hand, if the current block is of a rectangular shape having a height greater than the width, the spatial merge candidate of the upper neighboring block may be added to the merge candidate list before the spatial merge candidate of the left neighboring block.
- the priority among the merge candidates may be determined according to the motion information of each merge candidate. For example, a merge candidate with bi-directional motion information may have a higher priority than a merge candidate with unidirectional motion information. Accordingly, the merge candidate having bidirectional motion information can be added to the merge candidate list before merge candidate having unidirectional motion information.
- the merge candidates may be rearranged.
- Rearrangement can be performed based on motion information of merge candidates.
- the rearrangement may be performed based on at least one of whether the merge candidate has bidirectional motion information, the size of the motion vector, or the temporal order (POC) between the current picture and the merge candidate's reference picture.
- rearrangement can be performed so that a merge candidate having bidirectional motion information has a higher priority than a merge candidate having unidirectional motion information.
- the motion information of the current block may be set to be the same as the motion information of the merge candidate specified by the merge candidate index (S3450).
- the motion information of the current block can be set to be the same as the motion information of the spatial neighboring block.
- the motion information of the current block may be set to be the same as the motion information of the temporally neighboring block.
- FIG. 38 shows a process of deriving motion information of a current block when the AMVP mode is applied to the current block.
- At least one of the inter prediction direction of the current block or the reference picture index can be decoded from the bit stream (S3810). That is, when the AMVP mode is applied, at least one of the inter prediction direction of the current block or the reference picture index may be determined based on the information encoded through the bit stream.
- the spatial motion vector candidate can be determined based on the motion vector of the spatial neighboring block of the current block (S3820).
- the spatial motion vector candidate may include at least one of a first spatial motion vector candidate derived from the top neighboring block of the current block or a second spatial motion vector candidate derived from the left neighboring block of the current block.
- the upper neighbor block includes at least one of the blocks adjacent to the upper or upper right corner of the current block
- the left neighbor block of the current block includes at least one of blocks adjacent to the left or lower left corner of the current block .
- a block adjacent to the upper left corner of the current block may be treated as a top neighboring block, or it may be treated as a left neighboring block.
- a spatial motion vector candidate may be derived from a spatial non-neighboring block that is not adjacent to the current block. For example, a spatial motion vector candidate of the current block may be derived using at least one of a block located on the same vertical line as a block adjacent to the top, top right corner, or top left corner of the current block, a block located on the same horizontal line as a block adjacent to the left or bottom left corner of the current block, or a block located on the same diagonal line as a block adjacent to a corner of the current block. If the spatial neighboring block is not available, the spatial non-neighboring block can be used to derive the spatial motion vector candidate.
- two or more spatial motion vector candidates may be derived using spatial neighbor blocks and spatial non-neighbor blocks.
- for example, a first spatial motion vector candidate and a second spatial motion vector candidate may be derived based on blocks adjacent to the current block, while a third spatial motion vector candidate and/or a fourth spatial motion vector candidate may be derived based on blocks that are not adjacent to the current block.
- if the reference picture of the current block is different from that of the spatial neighboring block, the spatial motion vector candidate may be obtained by scaling the motion vector of the spatial neighboring block.
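- As an illustration of the scaling mentioned above, the following sketch scales a neighboring block's motion vector by the ratio of output order (POC) distances. The floating-point ratio and rounding are simplifications assumed here; an actual codec would typically use clipped integer arithmetic.

```python
def scale_motion_vector(mv, cur_poc, cur_ref_poc, neigh_poc, neigh_ref_poc):
    """Scale a motion vector from the neighboring block's POC distance
    to the current block's POC distance (illustrative sketch)."""
    td = neigh_poc - neigh_ref_poc   # POC distance used by the neighboring block
    tb = cur_poc - cur_ref_poc       # POC distance of the current block
    if td == 0:
        return mv                    # degenerate case: nothing to scale
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))
```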
- the temporal motion vector candidate can be determined based on the motion vector of the temporally neighboring block of the current block (S3830). If the reference pictures of the current block and the temporally neighboring block are different, the temporal motion vector may be obtained by scaling the motion vector of the temporally neighboring block. At this time, the temporal motion vector candidate can be derived only when the number of spatial motion vector candidates is equal to or less than a predetermined number.
- a motion vector candidate list including a spatial motion vector candidate and a temporal motion vector candidate may be generated (S3840).
- At least one of the motion vector candidates included in the motion vector candidate list can be specified based on information that specifies at least one of the motion vector candidates included in the list (S3850).
- a motion vector candidate specified by the information is set as a motion vector prediction value of the current block, a motion vector difference value is added to the motion vector prediction value, and a motion vector of the current block is obtained.
- the motion vector difference value can be parsed through the bit stream.
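- The AMVP derivation in steps S3820 through S3850 can be summarized with the hedged sketch below; the two-candidate limit, the duplicate check, and the function names are illustrative assumptions rather than part of the description.

```python
def build_amvp_candidate_list(spatial_cands, temporal_cands, max_cands=2):
    """Insert spatial motion vector candidates first; append a temporal
    candidate only while fewer than max_cands candidates are present."""
    cand_list = []
    for mv in spatial_cands:
        if mv not in cand_list:
            cand_list.append(mv)
    if len(cand_list) < max_cands:
        for mv in temporal_cands:
            if mv not in cand_list:
                cand_list.append(mv)
            if len(cand_list) >= max_cands:
                break
    return cand_list[:max_cands]

def derive_motion_vector(cand_list, mvp_index, mvd):
    """Select the predictor identified by the signalled index (S3850) and
    add the motion vector difference parsed from the bitstream."""
    mvp = cand_list[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```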
- the motion compensation for the current block can be performed based on the obtained motion information (S3320). More specifically, motion compensation for the current block can be performed based on the inter prediction direction of the current block, the reference picture index, and the motion vector.
- the current block can be reconstructed based on the generated prediction sample. Specifically, a reconstructed sample can be obtained by summing the predicted sample and the residual sample of the current block.
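- A minimal sketch of this reconstruction step, assuming unsigned samples and a configurable bit depth (the clipping range is an assumption, not stated above):

```python
import numpy as np

def reconstruct_block(pred_samples: np.ndarray,
                      residual_samples: np.ndarray,
                      bit_depth: int = 8) -> np.ndarray:
    """Reconstructed sample = prediction sample + residual sample,
    clipped to the valid range for the given bit depth."""
    recon = pred_samples.astype(np.int32) + residual_samples.astype(np.int32)
    return np.clip(recon, 0, (1 << bit_depth) - 1)
```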
- the motion vector of the current block may include an L0 motion vector for the L0 reference picture and an L1 motion vector for the L1 reference picture.
- a symmetric mode can be used. Under the symmetric mode, the L0 motion vector and the L1 motion vector are assumed to be symmetric, and an L1 motion vector may be derived from the L0 motion vector, or an L0 motion vector may be derived from the L1 motion vector.
- symmetric_flag indicates whether or not the symmetric mode is applied to the current block.
- symmetric mode may be allowed only in merge mode, or may be allowed only in AMVP mode.
- FIG. 39 shows an example in which the symmetric mode is applied under the merge mode.
- the first direction motion information of the current block can be derived from the merge candidate (S3950).
- the L0 motion vector of the current block may be set equal to the L0 motion vector of the merge candidate.
- the second direction motion information of the current block can be derived based on the derived first direction motion information (S3960). Specifically, based on the symmetry of the first direction motion vector and the second direction motion vector, a vector having the same absolute value as the first direction motion vector but opposite in direction can be set as the second direction motion vector. For example, if the L0 motion vector of the current block derived from the merge candidate is (mv_x0, mv_y0), the L1 motion vector of the current block may be determined as (-mv_x0, -mv_y0).
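- A one-line sketch of this mirroring (illustrative only):

```python
def mirror_l0_motion_vector(mv_l0):
    """Return the L1 motion vector with the same absolute value as the
    L0 motion vector but the opposite direction."""
    return (-mv_l0[0], -mv_l0[1])

# e.g. mirror_l0_motion_vector((3, -5)) -> (-3, 5)
```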
- the second direction reference picture index of the current block may be set to the same value as the first direction reference picture index.
- the L1 reference picture index may be set to the same value as the L0 reference picture index of the current block derived from the merge candidate.
- the second direction reference picture (or the second direction reference picture index) may be determined based on the output order (POC) difference between the current picture and the first direction reference picture. For example, when the output order difference between the current picture and the L0 reference picture is td, the reference picture in the L1 reference picture list whose output order difference from the current picture is td, or whose output order difference is closest to td, may be selected as the L1 reference picture.
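- A sketch of this reference picture selection, assuming the L1 reference picture list is given as a list of POC values (the function name and inputs are illustrative):

```python
def select_l1_reference_index(cur_poc, l0_ref_poc, l1_ref_poc_list):
    """Return the index of the L1 reference picture whose output order
    difference from the current picture equals, or is closest to,
    td = |POC(current) - POC(L0 reference)|."""
    td = abs(cur_poc - l0_ref_poc)
    return min(range(len(l1_ref_poc_list)),
               key=lambda i: abs(abs(cur_poc - l1_ref_poc_list[i]) - td))
```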
- the second direction reference picture index may be set so as to indicate a reference picture of a predetermined order in the second direction reference picture list.
- the L1 reference picture index may have a value indicating the first reference picture, the first long-term reference picture, or the last reference picture in the L1 reference picture list.
- the second direction motion vector may be derived based on the first direction motion vector, while the second direction reference picture index may be obtained from the bitstream.
- the second direction motion vector may be derived by scaling the first direction motion vector.
- that is, the second direction motion vector is set in the direction opposite to the first direction motion vector, and the magnitude of the second direction motion vector can be obtained by scaling the first direction motion vector.
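- One plausible reading of this scaling, sketched below, mirrors the L0 vector and scales its magnitude by the ratio of the L1 and L0 output order distances; the exact scaling factor is an assumption and is not specified above.

```python
def derive_scaled_l1_motion_vector(mv_l0, cur_poc, l0_ref_poc, l1_ref_poc):
    """Mirror the L0 motion vector and scale its magnitude by the ratio of
    POC distances to the L1 and L0 reference pictures (assumed factor)."""
    td0 = abs(cur_poc - l0_ref_poc)
    td1 = abs(cur_poc - l1_ref_poc)
    if td0 == 0:
        return (-mv_l0[0], -mv_l0[1])   # fall back to a plain sign flip
    scale = td1 / td0
    return (-round(mv_l0[0] * scale), -round(mv_l0[1] * scale))
```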
- FIG. 40 shows an example in which the symmetric mode is applied under the AMVP mode.
- the first direction motion information of the current block can be decoded from the bitstream (S4050).
- the first direction motion information may include information for specifying at least one of the motion vector candidates included in the motion vector candidate list and information on the motion vector difference value.
- the first direction motion vector of the current block may be obtained by setting a motion vector candidate specified by the information as a motion vector prediction value and adding a motion vector difference value to the motion vector prediction value.
- the second direction motion information of the current block may be derived based on the first direction motion information (S4060). Specifically, based on the symmetry of the first direction motion vector and the second direction motion vector, a vector having the same absolute value as the first direction motion vector but opposite in direction can be set as the second direction motion vector.
- for example, if the L0 motion vector of the current block obtained based on the motion vector candidate mvp and the motion vector difference mvd is (mvp_x0 + mvd_x0, mvp_y0 + mvd_y0),
- the L1 motion vector of the current block may be determined as (-mvp_x0 - mvd_x0, -mvp_y0 - mvd_y0).
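- Putting the two preceding bullets together, a hedged sketch of the AMVP symmetric derivation (function and parameter names are illustrative):

```python
def derive_amvp_symmetric_mvs(mvp_l0, mvd_l0):
    """L0 motion vector = predictor + signalled difference; under the
    symmetric mode the L1 motion vector is its negation."""
    mv_l0 = (mvp_l0[0] + mvd_l0[0], mvp_l0[1] + mvd_l0[1])
    mv_l1 = (-mv_l0[0], -mv_l0[1])
    return mv_l0, mv_l1
```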
- the first direction motion information to be decoded from the bitstream may include a first direction reference picture index for specifying a first direction reference picture.
- the second direction reference picture index of the current block may be set to the same value as the first direction reference picture index.
- the L1 reference picture index can be set to the same value as the L0 reference picture index of the current block obtained from the bit stream.
- the second direction reference picture (or the second direction reference picture index) may be determined based on the output order (POC) difference between the current picture and the first direction reference picture. For example, when the output order difference between the current picture and the L0 reference picture is td, the reference picture in the L1 reference picture list whose output order difference from the current picture is td, or whose output order difference is closest to td, may be selected as the L1 reference picture.
- the first direction reference picture index and the second direction reference picture index may be set to have predetermined values.
- the L0 reference picture index and the L1 reference picture index may be set to values indicating the first reference picture, the first long-term reference picture, or the last reference picture in the corresponding reference picture list.
- the second direction motion vector may be derived based on the first direction motion vector, while the second direction reference picture index may be obtained from the bitstream.
- the second direction motion vector may be derived by scaling the first direction motion vector.
- that is, the second direction motion vector is set in the direction opposite to the first direction motion vector, and the magnitude of the second direction motion vector can be obtained by scaling the first direction motion vector.
- each of the components (for example, units, modules, etc.) constituting the block diagram may be implemented by a hardware device or software, and a plurality of components may be combined into a single hardware device or piece of software.
- the above-described embodiments may be implemented in the form of program instructions that may be executed through various computer components and recorded in a computer-readable recording medium.
- the computer-readable recording medium may include program commands, data files, data structures, and the like, alone or in combination.
- Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like.
- the hardware device may be configured to operate as one or more software modules for performing the processing according to the present invention, and vice versa.
- the present invention can be applied to an electronic device capable of encoding / decoding an image.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2017-0121455 | 2017-09-20 | | |
| KR20170121455 | 2017-09-20 | | |
| KR20170122108 | 2017-09-21 | | |
| KR10-2017-0122108 | 2017-09-21 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019059646A1 (fr) | 2019-03-28 |
Family
ID=65811467
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2018/011069 Ceased WO2019059646A1 (fr) | 2017-09-20 | 2018-09-19 | Procédé et dispositif de traitement de signal vidéo |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20190033029A (fr) |
| WO (1) | WO2019059646A1 (fr) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114009033A (zh) | 2019-05-30 | 2022-02-01 | 北京达佳互联信息技术有限公司 | Method and apparatus for signaling symmetric motion vector difference mode |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014014276A1 (fr) * | 2012-07-17 | 2014-01-23 | 한국전자통신연구원 | Procédé de filtrage en boucle et appareil associé |
| US20170214937A1 (en) * | 2016-01-22 | 2017-07-27 | Mediatek Inc. | Apparatus of Inter Prediction for Spherical Images and Cubic Images |
2018
- 2018-09-19 WO PCT/KR2018/011069 patent/WO2019059646A1/fr not_active Ceased
- 2018-09-19 KR KR1020180112495A patent/KR20190033029A/ko not_active Withdrawn
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014014276A1 (fr) * | 2012-07-17 | 2014-01-23 | 한국전자통신연구원 | Procédé de filtrage en boucle et appareil associé |
| US20170214937A1 (en) * | 2016-01-22 | 2017-07-27 | Mediatek Inc. | Apparatus of Inter Prediction for Spherical Images and Cubic Images |
Non-Patent Citations (3)
| Title |
|---|
| ABBAS, ADEEL ET AL.: "AHG8: Rotated Sphere Projection for 360 Video", JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3, 7 April 2017 (2017-04-07), Hobart, AU, pages 1 - 7 * |
| DSOUZA, AMITH ET AL.: "AHG8: Padding Investigation in Compact ISP Format", JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3, 21 July 2017 (2017-07-21), Torino, IT, pages 1 - 7 * |
| ZHANG, CHUANYI ET AL.: "AHG8: Padding Method for Segmented Sphere Projection", JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3, 7 April 2017 (2017-04-07), Hobart, AU, pages 1 - 6 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20190033029A (ko) | 2019-03-28 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18859763; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18859763; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/01/2021) |