WO2014029086A1 - Methods to improve motion vector inheritance and inter-view motion prediction for depth map - Google Patents
Methods to improve motion vector inheritance and inter-view motion prediction for depth map
- Publication number
- WO2014029086A1 (PCT/CN2012/080463)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mvi
- mode
- flag
- signaled
- skip
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Methods to improve motion vector inheritance and inter-view motion prediction for depth map in the multi-view video coding and 3D video coding are provided. The motion vector inheritance mode is used to derive the motion vectors of a block in the depth map from the co-located region of the video/texture signal. The inter-view motion prediction utilizes a derived disparity vector to obtain the coded motion information of the reference view or to perform disparity-compensated prediction directly.
Description
METHODS TO IMPROVE MOTION VECTOR INHERITANCE AND INTER-VIEW MOTION PREDICTION FOR DEPTH MAP
BACKGROUND OF THE INVENTION
Field of the Invention
[0001] The invention relates generally to Three-Dimensional (3D) video processing. In particular, the present invention relates to methods for motion vector inheritance and inter-view motion prediction for depth map in 3D video coding.
Description of the Related Art
[0002] 3D video coding is developed for encoding/decoding multi-view videos simultaneously captured by several cameras. The 3D video is represented using the multi-view video/texture plus depth format, in which a small number of captured views as well as their depth maps are coded, and the resulting bitstream packages are multiplexed into a 3D video bitstream.
[0003] Motion vector inheritance
[0004] In the reference software of the HEVC-based 3D video coding version 4 (HTM v4.0), motion vector inheritance (MVI) mode is adopted to explore the correlation between the video signal and its associated depth map, since they are both the projection of the same scenery from the same viewpoint at the same time instant.
[0005] In HTM4.0, the basic unit for compression, termed coding unit (CU), is a 2Nx2N square block, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs). Therefore, in order to enable efficient encoding of the depth map data, MVI allows inheritance of the treeblock subdivision into CUs and PUs and their corresponding motion parameters from the video signal. Since the motion vectors of the video signal have quarter-sample accuracy, whereas only full-sample accuracy is used for the depth map signal, the motion vectors are quantized to their nearest full-sample position in the inheritance process. For each CU of the depth map, the encoder can adaptively decide whether the motion data are inherited from the co-located region of the video signal or whether new motion data are transmitted, as shown in Fig. 1. To signal the MVI coding mode, the syntax is integrated into the merge and skip modes: the merging candidate list for depth map coding has been extended by adding the MVI coding mode as its first candidate.
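As a hedged illustration of the full-sample quantization step described above (not the HTM4.0 source code; all names are assumptions), a texture motion vector stored in quarter-sample units could be rounded to the nearest full-sample position before being inherited by the depth map as follows:

```cpp
// Illustrative sketch only: quantize a quarter-sample texture MV to the nearest
// full-sample position for inheritance by the depth map. MV components are assumed
// to be stored in quarter-sample units (4 units = 1 full sample).
struct Mv { int x; int y; };

static int roundToFullSample(int v)
{
    // Round to the nearest multiple of 4 quarter-sample units (ties away from zero).
    return (v >= 0) ? ((v + 2) & ~3) : -(((-v) + 2) & ~3);
}

Mv inheritMvForDepth(const Mv& textureMv)
{
    return { roundToFullSample(textureMv.x), roundToFullSample(textureMv.y) };
}
```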
[0006] Independent of the partitioning of the video picture into its CUs, the MVI mode can be
applied at any level of the treeblock hierarchy for the depth map. If the MVI mode is applied at a higher level of the depth map coding tree (a CU size that is larger than the CU size coded in the corresponding video signal), the CU subdivision, together with the corresponding motion data, is inherited from the video signal. This makes it possible to specify, once for a whole treeblock typically corresponding to 64 x 64 depth samples, that the CU/PU partition and the motion information coded in the corresponding video signal are inherited into the depth map signal. On the other hand, if MVI is applied at a CU level that is the same as or smaller than the CU size coded in the corresponding video signal, only the motion information will be inherited from the video signal.
[0007] In MVI, not only the partitioning and the motion vectors, but also the reference picture indices are inherited from the video signal. It has to be ensured that the reference depth maps corresponding to the video reference pictures are also available in the reference picture buffer.
The MVI mode is only possible if the whole region of the video signal from which the motion data and partitioning are inherited is coded using inter prediction.
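A minimal sketch of the resulting availability rule, under the assumption that prediction modes are tracked per 4x4 texture unit (the struct and function names are illustrative, not the HTM API):

```cpp
#include <vector>

// Illustrative sketch: MVI is available only if every 4x4 unit of the co-located
// texture region is inter-coded.
struct TextureModeMap {
    int widthIn4x4 = 0;
    std::vector<bool> isInterUnit;                       // one flag per 4x4 unit, raster order
    bool isInter(int x4, int y4) const { return isInterUnit[y4 * widthIn4x4 + x4]; }
};

bool mviAvailable(const TextureModeMap& tex, int left4, int top4, int right4, int bottom4)
{
    for (int y4 = top4; y4 < bottom4; ++y4)
        for (int x4 = left4; x4 < right4; ++x4)
            if (!tex.isInter(x4, y4))                    // any intra unit disables MVI here
                return false;
    return true;
}
```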
[0008] The syntax table related to MVI mode in HTM4.0 is illustrated in Fig. 2.
[0009] Inter-view motion prediction
[0010] To share the previously encoded motion information of reference views, inter-view motion prediction is employed for texture coding. To derive candidate motion parameters for a current block in a dependent view, a disparity vector (DV) for the current block is first derived, and then the prediction block in the already coded picture in the reference view is located by adding the DV to the location of the current block. If the prediction block is coded using motion-compensated prediction (MCP), the associated motion parameters can be used as candidate motion parameters for the current block in the current view in AMVP and merge/skip modes. The derived DV can also be directly used as a candidate DV for disparity-compensated prediction (DCP) in AMVP and merge/skip modes.
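The following sketch illustrates this derivation under assumed types and helper names (it is not the HTM implementation): the current block position is shifted by the DV, and the reference-view motion is reused when that block is MCP-coded, while the DV itself remains usable as a DCP candidate otherwise.

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch of inter-view motion candidate derivation; all names are assumptions.
struct MotionInfo { bool isMcp = false; int mvX = 0, mvY = 0; int refIdx = -1; };

struct ReferenceViewPicture {
    int width = 0, height = 0;
    std::vector<MotionInfo> motion4x4;                    // one entry per 4x4 block, raster order
    const MotionInfo& motionAt(int x, int y) const {
        const int x4 = std::min(std::max(x, 0), width - 1) >> 2;
        const int y4 = std::min(std::max(y, 0), height - 1) >> 2;
        return motion4x4[y4 * (width >> 2) + x4];
    }
};

// Returns the MCP motion of the block the DV points to in the reference view when available;
// otherwise the derived DV itself is returned as a disparity-compensated (DCP) candidate.
MotionInfo deriveInterViewCandidate(const ReferenceViewPicture& refView,
                                    int curX, int curY, int dvX, int dvY)
{
    const MotionInfo& refMotion = refView.motionAt(curX + dvX, curY + dvY);
    if (refMotion.isMcp)
        return refMotion;                                 // reuse temporal motion across views
    return MotionInfo{ false, dvX, dvY, 0 };              // DCP candidate using the derived DV
}
```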
[0011] However, in HTM4.0, the inter-view motion prediction is only used for texture coding but not for depth map coding.
BRIEF SUMMARY OF THE INVENTION
[0012] In this invention, we propose several methods to improve the MVI mode and inter-view motion prediction for depth map in the 3D and multi-view video coding as follows:
[0013] In HTM4.0, MVI mode is signalled as a merge/skip candidate. In this invention, we propose to signal the MVI mode flag at the CU level before or after the skip_flag.
[0014] In HTM4.0, no matter whether the part_size is Size_2Nx2N or not, the first merge candidate refers to the MVI mode (i.e., merging with the corresponding block from the associated video signal) when MVI is enabled in the current slice. In this invention, we further restrict the MVI mode to be only enabled for the 2Nx2N part_size.
[0015] In HTM4.0, if MVI is selected, the CU will first be split into leaf CUs following the splitting of the co-located texture region, and then the leaf CU in the depth map will always use the Size_NxN partition to perform the motion compensation regardless of the partition of the co-located texture block. This implementation method follows the spirit of MVI only when the asymmetrical partition (AMP) is not used in texture coding, since the Size_NxN partition has the finest-granularity MV assignment for all the symmetrical partitions but not for AMP. In this invention, when MVI is used, several methods can be used for the depth map to perform the motion compensation.
[0016] In HTM4.0, MVI is applied for merge/skip mode. In this invention, MVI mode is limited to the skip mode only. Therefore, the MVI flag will not be included in the merging candidate list for the merge mode.
[0017] In HTM4.0, MVI is applied for all the CU sizes. In this invention, we restrict the MVI mode only to particular CU sizes.
[0018] In current HTM4.0, the inter-view motion prediction is only used for texture coding but not for depth map coding. In this invention, we propose to use the inter-view motion prediction for depth map.
[0019] Other aspects and features of the invention will become apparent to those with ordinary skill in the art upon review of the following descriptions of specific embodiments.
BRIEF DESCRIPTION OF DRAWINGS
The invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
[0020] Fig. 1 is a diagram illustrating the concept of motion parameter inheritance;
[0021] Fig. 2 is a syntax table in current HTM4.0;
[0022] Fig. 3 is a syntax table for the first embodiment of signaling the mvi_flag before the skip_flag;
[0023] Fig. 4 is a syntax table for the first embodiment of signaling the mvi_flag after the skip_flag.
DETAILED DESCRIPTION OF THE INVENTION
[0024] The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
[0025] In this invention, we propose several methods to improve the MVI mode and inter-view motion prediction for depth map in the 3D and multi-view video coding.
[0026] First embodiment
[0027] In HTM4.0, MVI mode is signalled as a merge/skip candidate. In an embodiment of this invention, we propose to signal the MVI mode flag at the CU level before or after the skip_flag. When the MVI mode flag is true, the current depth CU will merge with the corresponding block from the associated video signal. Furthermore, when MVI mode is selected, no residual data will be transmitted to the decoder; therefore, the syntax element indicating whether the current CU has residual data will not be signaled in the bitstream. Since the MVI mode is no longer signalled as a merge/skip candidate, the candidate number in merge/skip mode is reduced by 1.
[0028] The syntax table related to mvi_flag for a leaf CU signaled before the skip_flag in this proposed method is illustrated in Fig. 3.
[0029] The MVI mode can also be signalled after the skip_flag. Fig. 4 illustrates the syntax table related to the mvi_flag signalled after the skip_flag.
[0030] In Fig. 3 and Fig. 4, mvi_flag[ x0 ][ y0 ] equal to 1 specifies that for the current coding unit, when decoding a P or B slice, no more syntax elements are parsed after mvi_flag[ x0 ][ y0 ]. mvi_flag[ x0 ][ y0 ] equal to 0 specifies that the coding unit is not coded in MVI mode. The array indices x0, y0 specify the location ( x0, y0 ) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture. When mvi_flag[ x0 ][ y0 ] is not present, it is inferred to be equal to 0.
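A structural sketch of the resulting CU-level parsing order for the mvi_flag-before-skip_flag variant of Fig. 3 is given below; the reader type and helper functions are illustrative assumptions rather than the HTM4.0 syntax functions.

```cpp
// Illustrative decoder-side sketch of the first embodiment (mvi_flag parsed before skip_flag).
struct DepthCu { bool mviFlag = false; bool skipFlag = false; };
struct BitReader { bool decodeFlag(const char* syntaxName); };   // CABAC-coded flag (assumed)

void inheritMotionFromTexture(DepthCu& cu);                       // assumed helper functions
void parseMergeIndex(BitReader& br, DepthCu& cu);
void parsePredictionData(BitReader& br, DepthCu& cu);

void parseDepthCodingUnit(BitReader& br, DepthCu& cu)
{
    cu.mviFlag = br.decodeFlag("mvi_flag");        // inferred to be 0 when not present
    if (cu.mviFlag) {
        // Merge with the co-located texture block; no further syntax elements are parsed,
        // in particular no residual flag, since an MVI-coded CU carries no residual data.
        inheritMotionFromTexture(cu);
        return;
    }
    cu.skipFlag = br.decodeFlag("skip_flag");
    if (cu.skipFlag) {
        // Regular skip mode; MVI is no longer a merge candidate, so the list is one shorter.
        parseMergeIndex(br, cu);
        return;
    }
    parsePredictionData(br, cu);                   // pred_mode, part_mode, PUs, residual, ...
}
```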
[0031] To encode the mvi_flag, several methods can be used independently.
[0032] 1. Three contexts are used for CABAC, which depend on the mvi_flag of the neighboring left and above CUs. If both the mvi_flag of the left and above CU are true, the third context is used; else if both the mvi_flag of the left and above CU are false, the first context is used; else, the second context is used.
[0033] 2. Two contexts are used for CABAC, which depend on the mvi_flag of the neighboring left and above CU. If both the mvi_flag of the left and above CU are true, the first context is used; else, the second context is used.
[0034] 3. Two contexts are used for CABAC, which depend on the mvi_flag of the neighboring left and above CU. If both the mvi_flag of the left and above CU are false, the first context is used; else, the second context is used.
[0035] 4. One context is used for CABAC.
[0036] 5. No context is used and mvi_flag is coded using the bypass mode of CABAC.
[0037] The coding order of the mvi_flag and the skip_flag can also be adaptively switched depending on whether the reference or neighbor block is in skip or MVI mode. For example, if the majority of the spatial neighbor blocks are in skip mode, the skip_flag will be coded first. On the other hand, if the majority of the spatial neighbor blocks are in MVI mode, the mvi_flag will be coded first.
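A hedged sketch of the context selection in options 1 and 2 above (option 3 is symmetric); the neighbour-availability handling here is an assumption and may differ from an actual implementation:

```cpp
// Illustrative sketch: derive the CABAC context index for mvi_flag from the left and
// above CUs. Unavailable neighbors are treated as having mvi_flag equal to false.
int selectMviFlagContext(bool leftAvailable, bool leftMvi,
                         bool aboveAvailable, bool aboveMvi,
                         int numContexts /* 2 or 3 */)
{
    const bool l = leftAvailable && leftMvi;
    const bool a = aboveAvailable && aboveMvi;
    if (numContexts == 3)                          // option 1: both true / both false / mixed
        return (l && a) ? 2 : ((!l && !a) ? 0 : 1);
    return (l && a) ? 0 : 1;                       // option 2: first context only if both true
}
```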
[0038] Second embodiment
[0039] In HTM4.0, no matter whether the part_size is Size_2Nx2N or not, the first merge candidate refers to the MVI mode (i.e., merging with the corresponding block from the associated video signal) when MVI is enabled in the current slice.
[0040] In an embodiment of this invention, we further restrict the MVI mode to be only enabled for 2Nx2N part_size. Therefore, for other partitions, the MVI mode is not included in the merging candidate list.
[0041] Third embodiment
[0042] In HTM4.0, if MVI is selected, the CU will first be split into leaf CUs following the splitting of the co-located texture region, and then the leaf CU in the depth map will always use the Size_NxN partition to perform the motion compensation regardless of the partition of the co-located texture block. This implementation method follows the spirit of MVI only when the asymmetrical partition (AMP) is not used in texture coding, since the Size_NxN partition has the finest-granularity MV assignment for all the symmetrical partitions but not for AMP.
[0043] In an embodiment of this invention, when MVI is used, several methods can be used for depth map to perform the motion compensation.
[0044] 1. The leaf CU in depth map uses the same partition size as the texture to perform the motion compensation.
[0045] 2. Each 4x4 block of the leaf CU in depth map uses the same motion vector as the texture to perform the motion compensation.
[0046] 3. The leaf CU in depth map always uses Size_2Nx2N to perform the motion compensation for simplification. In this case, there are several methods for selecting the final motion vector of the Size_2Nx2N partition (see the sketch after this list):
a. The motion vector of the top-left 4x4 block will be selected as the motion vector of the Size_2Nx2N partition.
b. The average motion vector of all 4x4 blocks in the co-located block in texture will be selected as the motion vector of the Size_2Nx2N partition.
c. The average motion vector of different partitions in the co-located block in texture will be selected as the motion vector of the Size_2Nx2N partition.
[0047] 4. If the leaf CU size in depth map is equal to the associated texture CU size, the leaf CU in depth map uses the same partition size as the texture to perform the motion compensation; else if the leaf CU size in depth map is less than the associated texture CU size, Size_NxN is used to perform the motion compensation.
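The sketch below, with illustrative types, shows options 3a and 3b for picking the single motion vector of a Size_2Nx2N depth partition from the 4x4 motion field of the co-located texture block:

```cpp
#include <vector>

// Illustrative sketch only; the motion field of the co-located texture block is assumed
// to be given as one MV per 4x4 block in raster order.
struct Mv { int x = 0, y = 0; };

// Option 3a: take the MV of the top-left 4x4 block of the co-located texture region.
Mv mvFromTopLeft(const std::vector<Mv>& textureMvs4x4)
{
    return textureMvs4x4.empty() ? Mv{} : textureMvs4x4.front();
}

// Option 3b: average the MVs of all 4x4 blocks in the co-located texture region.
Mv mvFromAverage(const std::vector<Mv>& textureMvs4x4)
{
    if (textureMvs4x4.empty()) return Mv{};
    long long sumX = 0, sumY = 0;
    for (const Mv& mv : textureMvs4x4) { sumX += mv.x; sumY += mv.y; }
    const int n = static_cast<int>(textureMvs4x4.size());
    return { static_cast<int>(sumX / n), static_cast<int>(sumY / n) };
}
```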
[0048] Fourth embodiment
[0049] In HTM4.0, MVI is applied for merge/skip mode. In this embodiment, MVI mode is limited to the skip mode only. The MVI flag will not be included in the merging candidate list for the merge mode.
[0050] Fifth embodiment
[0051] In HTM4.0, MVI is applied for all the CU sizes. In this embodiment, we restrict the MVI mode only to particular CU sizes as follows.
[0052] 1. MVI mode is restricted to the CU whose size is larger than the smallest CU (SCU), which is 8x8 under the HTM common test conditions.
[0053] 2. MVI mode is restricted to the CU whose size is larger than or equal to the size of co- located block in the associated texture.
[0054] 3. MVI mode is restricted to the CU whose CU split level is less than or equal to the CU split level of co-located block in the associated texture.
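A minimal sketch of the three restrictions above, with assumed parameter names; sizes are in samples and the split level counts quadtree splits from the largest CU:

```cpp
// Illustrative sketch: check whether MVI is allowed for a depth CU under one of the
// three size restrictions listed above.
bool mviAllowedBySize(int depthCuSize, int scuSize,
                      int coLocatedTextureCuSize,
                      int depthCuSplitLevel, int textureCuSplitLevel,
                      int restriction /* 1, 2 or 3 */)
{
    switch (restriction) {
    case 1:  return depthCuSize > scuSize;                     // larger than the SCU (8x8)
    case 2:  return depthCuSize >= coLocatedTextureCuSize;     // >= co-located texture CU size
    case 3:  return depthCuSplitLevel <= textureCuSplitLevel;  // not split deeper than texture
    default: return true;                                      // no restriction applied
    }
}
```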
[0055] Sixth embodiment
[0056] In current HTM4.0, the inter-view motion prediction is only used for texture coding but not for depth map coding.
[0057] In this embodiment, we propose to use the inter-view motion prediction for depth map as follows:
[0058] 1. Similar to texture coding, for each CU, an inter-view candidate is added to the candidate lists of AMVP and merge/skip modes for depth map coding.
[0059] 2. The inter-view candidate derivation for depth map is also similar to that for texture. First, a DV is derived for the current block; this DV can be used directly as a candidate for DCP, or it can be used to locate the prediction block in the reference view. If the prediction block in the reference view uses MCP, the motion vectors of the prediction block can be used as the motion candidate for the current block.
[0060] 3. In item 2, the DV can be derived from the neighboring blocks as specified in JCT2-A0097 and JCT2-A0126.
[0061] 4. In item 2, the DV can also be derived by converting the estimated depth value to a disparity using the camera parameters.
[0062] 5. In item 4, the estimated depth value can be derived by the depth map estimation method as in current HTM for texture coding.
[0063] 6. In item 2, the DV can also be derived from the neighboring reconstructed depth map pixels. For example, first obtain the average value of the top and left reconstructed pixels, and then convert this average depth value to a disparity using the camera parameters, as in the sketch following this list.
[0064] 7. In item 1, the inter-view candidate in AMVP and merge/skip modes for depth map can be inserted in any position of the candidate list. For example, the inter-view candidate in AMVP can be in the first position or third position, and in merge/skip mode it can be in the position after the MVI mode candidate. Therefore, for depth map, the candidate number is 3 for AMVP mode, and 7 for merge/skip mode.
[0065] 8. For the method in item 1, a flag can be inserted in the SPS of the depth map to indicate whether the method in item 1 is on or off, similar to the case for texture.
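For items 4 to 6, the sketch below shows one way the depth-to-disparity conversion and the neighbor-based estimate could look; the linear scale/offset model and all names are assumptions, and the exact HTM formula derived from the camera parameters may differ:

```cpp
// Illustrative sketch: convert an estimated depth value to a (horizontal) disparity
// using camera-parameter-derived scale and offset, and derive a DV from the average
// of the reconstructed top and left neighboring depth pixels (item 6).
struct DepthToDisparityParams {
    int scale;    // assumed to be derived from focal length, baseline and Z_near/Z_far
    int offset;
    int shift;    // fixed-point precision of scale and offset
};

int depthToDisparity(int depthSample, const DepthToDisparityParams& p)
{
    return (depthSample * p.scale + p.offset) >> p.shift;
}

int deriveDvFromNeighbors(const int* topRow, int topLen,
                          const int* leftCol, int leftLen,
                          const DepthToDisparityParams& p)
{
    long long sum = 0;
    for (int i = 0; i < topLen; ++i)  sum += topRow[i];
    for (int i = 0; i < leftLen; ++i) sum += leftCol[i];
    const int count = topLen + leftLen;
    const int avgDepth = (count > 0) ? static_cast<int>(sum / count) : 0;
    return depthToDisparity(avgDepth, p);   // vertical DV component assumed to be zero
}
```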
[0066] Seventh embodiment
[0067] This method is a combination of the first and the sixth embodiments.
[0068] The MVI mode flag is signaled before the skip flag, and is not signaled as a candidate in merge/skip mode. Therefore, the candidate number for depth map will be 3 for AMVP mode and 6 for merge/skip mode, which is the same as for texture.
[0069] The inter-view candidate positions in the AMVP and merge/skip candidate lists are also the same as those for texture, i.e., the inter-view candidate is located at the third position in AMVP and at the first position in merge/skip.
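As an informal sketch of the resulting merge/skip candidate-list layout for depth in this embodiment (all helper names are assumptions; the spatial and temporal candidates are derived by the normal HEVC merge process):

```cpp
#include <vector>

// Illustrative sketch: build the depth-map merge/skip candidate list for the seventh
// embodiment. MVI is signaled by mvi_flag instead of occupying a merge candidate, and
// the inter-view candidate takes the first position, as for texture.
struct Candidate { int mvX = 0, mvY = 0; int refIdx = -1; };

struct DepthCuContext {                                   // assumed interface
    bool interViewCandidateAvailable() const;
    Candidate interViewCandidate() const;
    void appendSpatialCandidates(std::vector<Candidate>& list) const;  // A1, B1, B0, A0, B2
    void appendTemporalCandidate(std::vector<Candidate>& list) const;
};

std::vector<Candidate> buildDepthMergeList(const DepthCuContext& cu)
{
    std::vector<Candidate> list;
    if (cu.interViewCandidateAvailable())
        list.push_back(cu.interViewCandidate());          // first position, same as texture
    cu.appendSpatialCandidates(list);
    cu.appendTemporalCandidate(list);
    if (list.size() > 6)                                   // 6 merge/skip candidates, as for texture
        list.resize(6);
    return list;
}
```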
[0070] In summary, a depth map coding method related to motion vector inheritance (MVI) and inter-view motion prediction is proposed. An MVI mode flag, such as an ON/OFF flag, can be signaled at the CU level, PU level, or other levels. The MVI mode flag can be signaled as a merging candidate in the merge/skip mode. If the merging candidate representing the MVI mode is selected, MVI is on; otherwise, MVI is off. In an embodiment, the MVI mode flag can only be signaled as a merging candidate in the skip mode, and the MVI mode will not be signaled as a merging candidate in the merge mode.
[0071] The merging candidate representing the MVI mode can be in the first position of the candidate list, or in other positions of the candidate list. The MVI mode on/off flag can be signaled only for a PU with 2Nx2N partition, only for PUs with other particular partitions, or for all PUs regardless of the partition. If the MVI mode is signaled as a merging candidate in the merge or skip mode and can only be signaled for a PU with some particular partition such as 2Nx2N, then for PUs with other partitions the merging candidate list will not include the MVI mode, and the candidate number for merge/skip mode will be reduced by 1.
[0072] The MVI mode on/off flag can be signaled at the CU level before the skip flag signaling, i.e., in the first signaling position at the CU level, or immediately after the skip flag signaling, or in other positions at the CU level, or in a position adaptively switched depending on whether the reference or neighbor block is in skip or MVI mode. In an embodiment, the merging candidate list will not include the MVI mode. In another embodiment, the MVI mode on/off flag can be coded by CABAC. The contexts used to code the MVI mode on/off flag can depend on the MVI mode on/off flags of the neighboring blocks. The neighboring blocks can be the left CU, the top CU, or others. The number of contexts can be 2, 3, or another value. In an embodiment, one context can be used when all the neighboring blocks have MVI mode on, one context when all the neighboring blocks have MVI mode off, and one context when both MVI mode on and off exist among the neighboring blocks. The MVI mode on/off flag can also be coded by the bypass mode of CABAC, or only one context can be used to code the MVI mode on/off flag. In another embodiment, the MVI mode on/off flag can be coded by VLC.
[0073] The current block will merge with the co-located block from the associated video (texture) signal when the MVI mode is selected (i.e., the MVI mode is on). The current block will not have residual data when the MVI mode is selected; that is, the flag indicating whether the current block has residual data will not be signaled in the bitstream when the MVI mode is selected. When the MVI mode is selected, the PU partition of the current block can be set as the PU partition of the co-located block in the associated texture, rather than always the NxN partition as in current HTM4.0, and the motion vector of each 4x4 block can be set to the motion vector of the co-located block in the associated texture. Alternatively, the PU partition of the current block can always be set as 2Nx2N. The motion parameters of each 4x4 block in the current block can be set as the motion parameters of the corresponding co-located 4x4 block in the associated texture, or the motion parameters of all 4x4 blocks in the current block can be set to the same value. That common value can be the average of all the motion parameters of the co-located 4x4 blocks in the associated texture, the average of the motion parameters of the different partitions in the co-located block in the associated texture, or the motion parameter of one particular 4x4 block in the associated texture. That particular 4x4 block can be in the top-left position, the middle position, or another particular position of the co-located block in the associated texture.
[0074] The MVI mode on/off flag can be signaled only when the depth of current block is less than or equal to the depth of co-located block in the associated texture, and the MVI mode is off when the depth of current block is larger than the depth of co-located block in the associated texture.
[0075] The MVI mode flag can be signaled only when the depth of current block is larger than the smallest CU (SCU), and the MVI mode is off when the depth of current block is equal to the SCU.
[0076] The inter-view candidate can be in the AMVP and merge/skip candidate lists for depth map, and it can be in any position of those lists; for example, in the third position of the AMVP candidate list and in the second position of the merge/skip candidate list. The position of the inter-view candidate for depth map can also be the same as that for texture. The inter-view candidate derivation for depth map follows the same concept as that for texture and includes a step called disparity vector (DV) derivation for the current block. The DV can be derived from the neighboring blocks, for example, as specified in JCT2-A0097 and JCT2-A0126. The DV can also be derived by converting the estimated depth value to a disparity using the camera parameters, where the estimated depth value can be derived by the depth map estimation method used in current HTM for texture coding. The DV can also be derived from the neighboring reconstructed depth map pixels; for example, first obtain the average value of the top and left reconstructed pixels, and then convert this average depth value to a disparity using the camera parameters.
[0077] A flag can be inserted in the SPS of the depth map to indicate whether the inter-view candidate in the AMVP and merge/skip candidate lists for depth map is on or off, similar to the case for texture.
[0078] The MVI mode is not in the candidate list of the merge/skip mode, and the inter-view candidate is in the candidate lists of both AMVP and merge/skip modes. Therefore, the candidate numbers in AMVP and merge/skip modes for depth map are 3 and 6 respectively, which are the same as those for texture.
[0079] The MVI and inter-view motion prediction methods described above can be used in a video encoder as well as in a video decoder. Embodiments of MVI and inter-view motion prediction methods according to the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be a circuit integrated into a video compression chip or program codes integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program codes to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
[0080] The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. To the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Claims
1. A depth map coding method related to motion vector inheritance (MVI) and inter-view motion prediction, comprising:
signaling an MVI mode flag in a video bitstream representing a status of the MVI mode.
2. The method as claimed in claim 1, wherein the MVI mode flag is signaled as a merging candidate in the merge/skip mode, and the MVI is on if the merging candidate representing the MVI mode is selected, else the MVI is off.
3. The method as claimed in claim 1, wherein the MVI mode flag is only signaled as a merging candidate in the skip mode.
4. The method as claimed in claim 3, wherein the merging candidate representing the MVI mode is in a first position of the candidate list.
5. The method as claimed in claim 1, wherein the MVI mode flag is only signaled for a Prediction Unit (PU) with 2Nx2N partition, or is only signaled for PUs with other particular partitions, or is signaled for all PUs regardless of the partition.
6. The method as claimed in claim 1, wherein the MVI mode flag is signaled at the CU level before skip flag signaling, or is signaled immediately after the skip flag signaling, or is signaled at the CU level in other positions, or is signaled in a position adaptively switched depending on whether the reference or neighbor block is in skip or MVI mode.
7. The method as claimed in claim 1, wherein the MVI mode flag is coded by CABAC.
8. The method as claimed in claim 7, wherein contexts used to code the MVI mode flag depend on the MVI mode flag of a neighboring block.
9. The method as claimed in claim 8, wherein the neighboring block comprises one or a combination of a left CU and a top CU.
10. The method as claimed in claim 7, wherein the MVI mode flag is coded by the bypass mode of CABAC.
11. The method as claimed in claim 1, wherein the MVI mode flag is coded by variable length coding (VLC).
12. The method as claimed in claim 1, wherein the MVI mode flag is signaled only when the depth of a current block is less than or equal to the depth of a co-located block in the associated texture, and the MVI mode is off when the depth of the current block is larger than the depth of the co-located block in the associated texture.
13. The method as claimed in claim 1, wherein the MVI mode flag is signaled only when the depth of a current block is larger than a smallest CU (SCU), and the MVI mode is off when the depth of the current block is equal to the SCU.
14. The method as claimed in claim 1, wherein the inter-view candidate is in an AMVP and merge/skip candidate list for depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/080463 WO2014029086A1 (en) | 2012-08-22 | 2012-08-22 | Methods to improve motion vector inheritance and inter-view motion prediction for depth map |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/080463 WO2014029086A1 (en) | 2012-08-22 | 2012-08-22 | Methods to improve motion vector inheritance and inter-view motion prediction for depth map |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014029086A1 true WO2014029086A1 (en) | 2014-02-27 |
Family
ID=50149349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2012/080463 WO2014029086A1 (en) | 2012-08-22 | 2012-08-22 | Methods to improve motion vector inheritance and inter-view motion prediction for depth map |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014029086A1 (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1694535A (en) * | 2005-06-20 | 2005-11-09 | Zhejiang University | A Pattern Selection Method with Scalable Complexity |
WO2012071871A1 (en) * | 2010-11-29 | 2012-06-07 | Mediatek Inc. | Method and apparatus of extended motion vector predictor |
Non-Patent Citations (1)
Title |
---|
PING WU ET AL.: "Introduction to the High-Efficiency Video Coding Standard", ZTE COMMUNICATIONS, vol. 10, no. 2, June 2012 (2012-06-01), pages 2 - 8 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015133866A1 (en) * | 2014-03-06 | 2015-09-11 | Samsung Electronics Co., Ltd. | Inter-layer video decoding method and apparatus therefor performing sub-block-based prediction, and inter-layer video encoding method and apparatus therefor performing sub-block-based prediction |
WO2015139183A1 (en) * | 2014-03-17 | 2015-09-24 | Mediatek Inc. | Method of signaling of depth-based block partitioning mode for three-dimensional and multi-view video coding |
WO2015139206A1 (en) * | 2014-03-18 | 2015-09-24 | Mediatek Singapore Pte. Ltd. | Methods for 3d video coding |
WO2015141977A1 (en) * | 2014-03-20 | 2015-09-24 | LG Electronics Inc. | 3d video encoding/decoding method and device |
CN106068649A (en) * | 2014-03-20 | 2016-11-02 | Lg电子株式会社 | 3D video coding/decoding method and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12883183; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 12883183; Country of ref document: EP; Kind code of ref document: A1 |