WO2006041261A1

WO2006041261A1 - Method and system for encoding/decoding multi-view video based on layered-depth image

Info

Publication number: WO2006041261A1
Application number: PCT/KR2005/003418
Authority: WO
Inventors: Kug Jin Yun; Dae Hee Kim; Suk Hee Cho; Chung Hyun Ahn; Soo In Lee; Yo Sung Ho; Seung Uk Yoon; Sung Yeol Kim
Original assignee: Electronics and Telecommunications Research Institute ETRI; Gwangju Institute of Science and Technology
Current assignee: Electronics and Telecommunications Research Institute ETRI; Gwangju Institute of Science and Technology
Priority date: 2004-10-16
Filing date: 2005-10-13
Publication date: 2006-04-20
Anticipated expiration: 2007-04-16
Also published as: EP1800493A1; KR100714068B1; KR20060053268A; EP1800493A4

Abstract

Provided are a method and an apparatus for enoding/decoding a multi-view video using LDI. Specifically, provided are a method and an apparatus for enoding/decoding a multi-view video using LDI, which uses a linear decorrelation process to improve compression efficiency. The LDI encoding method according to present invention includes: (i) generating the LDI including multiple layers by using color and depth information of each viewpoint image of the multi-view video; (ii) performing linear decorrelation in each layer of the LDI; (iii) performing data aggregation in each linearly-decorrelated layer of the LDI; and (iv) encoding the aggregated data in each layer of the LDI to generate an encoded LDI bit stream.

Description

METHOD AND SYSTEM FOR ENCODING/DECODING MULTI- VIEW VIDEO BASED ON LAYERED-DEPTH IMAGE

Technical Field

[1] The present invention relates to a linear decorrelation method and apparatus that adjust a probability distribution of a layered depth image to improve coding efficiency in encoding and decoding the layered depth image. Background Art

[2] A multi-view video has been used in various applications to provide more realistic services, but a great amount of data is required so that an extremely wide bandwidth is needed to transmit the data. Therefore, to solve this problem, a layered depth image ("LDI") method requiring a relatively narrow bandwidth can be utilized.

[3] Unlike a typical 3-D modeling mechanism using a mesh, LDI represents a 3-D object with an array of pixels seen from a single camera position. Each LDI pixel is represented by its color, depth that is the distance of the pixel to the camera, and some other property information supporting LDI rendering. In other words, the LDI is composed of pixels similar to a typical 2-D image, but each pixel has color information as well as depth information and additional information that supports rendering. Therefore, any view image within a certain view angles can be easily rendered by using the LDI, which is constructed from a single view. Specifically, the LDI contains color information on Y, Cb, Cr, and Alpha, depth information representing a distance between a camera and an object, and a splat table index used to support various pixel sizes upon rendering. Each LDI pixel contains 63-bit information in total to include all the information, so that one sheet of LDI includes data from several megabytes to several tens of megabytes.

[4] LDI is divided into multiple layers, each of which contains a mask indicating the existence of pixel in the layer. LDI is characterized in that the distribution of pixels becomes sparser towards the back layer. Such a phenomenon becomes more noticeable, as the number of LDI layers becomes more.

[5] A paper entitled to "Compression of the layered depth image" (J. Duan and J. Li,

IEEE TRANSACTIONS OF IMAGE PROCESSING, VOL., 12, NO.3, 2003/3) discloses a data aggregation method as a preprocessing process prior to compression. Data aggregation, which uses a feature of LDI that the distribution of pixels becomes sparser towards the back layer, is performed to aggregates pixels distributed in each layer. However, such simple data aggregation does not consider correlation of LDI data. [6] Therefore, there is a need to improve the coding efficiency of LDI data by removing redundant (duplicated) information from a highly correlated LDI data, converting them into de-correlated data, and then encoding the non-correlated data. Disclosure of Invention Technical Problem

[7] The object of the present invention is to propose a linear decorrelation process, which is a new preprocessing process to remove redundant depth information prior to performing data aggregation and a method for encoding multiview video using LDI. Other object of the present invention is to improve coding efficiency in encoding process using the LDI, by making a distribution of depth information of the LDI data highly skewed around a median, through the linear decorrelation. Technical Solution

[8] In order to achieve the above objects, the present invention provides a method and an apparatus for encoding and decoding multi-view video using LDI.

[9] According to an aspect of the present invention, a method of encoding multiview video data using LDI is provided. The method includes: (i) generating the LDI including multiple layers by using color and depth information of each viewpoint image of the multi-view video; (ii) performing linear decorrelation in each layer of the LDI; (iii) performing data aggregation in each linearly-decorrelated layer of the LDI; and (iv) encoding the aggregated data in each layer of the LDI to generate an encoded LDI bit stream.

[10] The step (ii) may includes, for each of all pixels in each layer of the LDI, calculating a minimum distance between a line connecting two previous pixels and a depth value of a current pixel to replace the depth value of the current pixel with the minimum distance. Also, in the step (b), when a depth value of the current pixel does not exist, an average depth value of the two previous pixels may be used as the depth value of the current pixel.

[11] In addition, the information for compensating information loss occurred in the LDI generation step may be transmitted to a decoding apparatus, together with the encoded LDI bit stream, so that images close to the original ones may be reconstructed.

[12] According to another aspect of the present invention, a method of decoding a multi- view video comprising the steps of decoding an encoded LDI bit stream; decoding a bit stream of residual information between an original multi-view video and a multi-view video reconstructed from the encoded LDI bit stream; reconstructing the multi-view video based on the decoded LDI bit stream and residual information is provided.

[13] According to an embodiment of the present invention, when an instruction selecting a viewpoint to be reconstructed is received from a user; the image at the corresponding viewpoint may be reconstructed only.

Advantageous Effects

[14] According to the LDI-based multiview video encoding/decoding methods of the present invention, coding efficiency may be improved and high-quality images at the corresponding viewpoints, which are close to the original ones, can be reconstructed. Brief Description of the Drawings

[15] FIG. 1 shows a typical LDI structure.

[16] FIG. 2 is a schematic diagram of a multi-view video LDI-based encoding/decoding apparatus according to an embodiment of the present invention

[17] FIG. 3 is a diagram illustrating how a LDI is generated from multi-view video data;

[18] FIG. 4 shows how to perform linear decorrelation on the LDI layer in which all the pixels have depth values, according to the present invention;

[19] FIG. 5 shows how to perform linear decorrelation on the LDI layer in which some of the pixels do not have depth values, according to the present invention;

[20] FIG. 6 is a flowchart showing the linear decorrelation process according to a preferred embodiment of the present invention. Mode for the Invention

[21] The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein.

[22] FIG. 1 shows a typical LDI structure. The LDI includes an array of pixels seen from a single LDI camera position, together with multiple layers based on any viewpoints. As shown in FIG. 1 , when rays are shot from a LDI camera position P, the rays intersect with an object at a plurality of points, which are ordered from the front to the back. The first intersection points constitute the first LDI layer; the second intersection points constitute the second layer, and so on. Each LDI layer is separated into individual components: luminance, color, transparency and depth. Further, component image of each layer is compressed separately. In order to increase a compression rate, data aggregation is performed to aggregate data on the same layer, so that data are more compactly distributed.

[23] FIG. 2 is a schematic diagram of a multi-view video LDI-based encoding/decoding apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus 210 includes a LDI generation unit 201, a linear decorrelation unit 202, a data aggregation unit 203, an LDI encoding unit 204, an LDI decoding unit 205, a multi-view video generation unit 206 and residual information encoding unit 207.

[24] The LDI generation unit 201 generates a LDI, which is composed of multiple layers, by 3-D warping of multiview video images with depth information, which uses color and depth information of each image. As an example, as shown in Fig. 3, while the images with depth information at different camera viewpoints C and C are warped into one with depth information at a common viewpoint C , when the warped pixels are placed in the same pixel location, their depth values are compared. If the difference between depth values is less than predefined threshold, they are merged. Otherwise, a new layer having the average depth value of the two pixels is created. The former case is shown as 'c' and 'd' in Fig. 3. Since algorithms for generating a LDI are well known to those skilled in the art, a detailed description thereof will be omitted in this spec¬ ification.

[25] The linear decorrelation unit 202, which performs a pre-process before data ag¬ gregation to improve the coding efficiency, makes the depth values of pixels in each layer of the LDI to be gathered around the median, in order to reduce the variance of them. Specifically, the linear decorrelation is performed on each layer consisting of LDI (hereinafter, "LDI layer"). The details of the linear decorrelation will be explained referring to Figs. 3-5.

[26] Next, the data aggregation unit 203 performs the LDI data aggregation in each LDI layer, in order to reduce distribution of depth values. Since data aggregation process is disclosed in the above-article "Compression of the layered depth image" (J. Duan and J. Li, IEEE TRANSACTIONS OF IMAGE PROCESSING, VOL., 12, NO.3, 2003/3), it will be omitted herein.

[27] The LDI encoding 204 encodes the data aggregated toward a certain direction in a space. The encoded LDI bit stream will be transmitted through a communication channel or a storage medium to a multi-view video decoding apparatus 220.

[28] When the multi-view video is reconstructed from the LDI data generated by LDI generation unit 201, the reconstructed images may have a residual with the original images. This is due to information loss during the LDI generation. Accordingly, it is required to separately transmit information for compensating such information loss to a multi-view video decoding apparatus 220, in order to reconstruct high-quality images close to the original ones.

[29] In order to do this, according to one embodiment of the invention, the multi-view video encoding apparatus 210 may additionally include LDI decoding unit 205, multi- view image generation unit 206 and residual information encoding unit 207. LDI decoding unit 205 receives the encoded LDI bit stream from the LDI encoding unit 204 and decodes it. The multi-view image generation unit 206 generates each of multi-view images from the decoded LDI data. The residual information encoding unit 207 calculates residual information between multi-view images generated by the multi- view image generation unit 206 and original multi-view images, encodes and transmits it to the multi-view video decoding apparatus 220.

[30] The multi-view video decoding apparatus 220 includes an LDI decoding unit 221, multi-view image generation unit 222 and residual information decoding unit 223. The LDI decoding unit 221 receives the encoded LDI bit stream from the multi-view video encoding apparatus 220 and decodes it. The residual information decoding unit 223 receives the encoded residual information bit stream and from the multi-view video encoding apparatus 220 and decodes it. The multi-view image generation unit 222 generates each of the multi-view images close to the original images, using the LDI data decoded by the LDI decoding unit 221 and the residual information decoded by the residual information decoding unit 223. In another embodiment, a user can select which viewpoint will be reconstructed and, the multi-view image generation unit 222 can generate the image corresponding to the selected viewpoint, in response to the selection.

[31] FIG. 4 shows how to perform linear decorrelation on the LDI layer in which all the pixels have depth information, according to the present invention. As shown, the one- dimensional (1-D) depth value of a pixel may be considered as the two-dimensional (2-D) value point. As shown in FIG. 4, in case that all the pixels have depth values on the same LDI layer, the minimum distance between a line passing through previous two points, which represent the depth values of the previous two pixels, and the depth value of a current pixel is calculated; and then the depth value of the current pixel is replaced with the minimum distance.

[32] On the other hand, FIG. 5 shows how to perform linear decorrelation on the LDI layer in which some of the pixels do not have depth values, according to the present invention. As shown in FIG. 5, in case that there is a pixel that does not have a depth value, the average depth value of the previous two points is inserted into a depth value of the pixel, which does not have a depth value. In the same manner, the minimum distance between a line passing through the previous two points and the depth value of the current pixel is calculated and then the depth value of the current pixel can be replaced with the minimum distance.

[33] However, there may be a case where the some of the previous two pixels do not have depth values. For example, when the depth value of a first pixel does not exist, the depth value of the first pixel is filled with '0', and when the depth value of a second pixel does not exist, the depth value of the second pixel is filled with that of the first pixel. Accordingly, the depth value of the third pixel can be filled with the average depth value of the previous two pixels. Then, the minimum distance is calculated using this average value as a depth value of the current pixel. In other words, the depth values of all the pixels on each LDI layer are filled and then the minimum distance is calculated. The minimum distance, d, between a line passing through the previous two points, which represent the depth values of the two previous pixels (for example, A(x z ) and B(x , z )) and a current point, which represents the depth value of the current i i pixel (for example, C(x , z )) , can be computed by

[34]

(A-B) ^L ' (C-A)I

\A-B\

(i)

[35] where

A ¹- represents A(-z , x ). Since the depth value does not exist in the position of C, as described above, the average value of the previous two depth values is inserted into z . With this, the variance distribution of the depth values can be reduced.

[36] FIG. 6 is a flowchart of the linear decorrelation process according to an embodiment of the present invention. As shown in FIG. 6, in step 610, it is checked whether all the LDI pixels have depth values on the same LDI layer. When it is determined that there exists a pixel having no depth value, it is determined if the pixel having no depth value is the first pixel in step 620. If it is the first pixel, the value thereof is filled with '0' in step 630. Next, in step 640, it is determined if the pixel having no depth value is the second pixel. If it is, the value of the second pixel is filled with the depth value of the first pixel in step 650. When the other pixels do not have depth values, the depth value of the corresponding pixel is filled with the average depth value of the previous two points, which represent the depth values of the previous two pixels, in step 660. In case that some pixels do not have depth values, the steps 620 to 660 are performed to fill the depth values of the corresponding pixels.

[37] Next, in step 670, the minimum distance between a line passing through the previous two points and a depth value of a current pixel is calculated, and the depth value of the current pixel is replaced with the minimum distance.

[38] The above steps 610- 670 are repetitively performed on each layer of the LDI data, so that linear decorrelation is obtained on each layer.

[39] The present invention can be provided as one or more computer readable medium implemented on one or more products. The products may be a floppy disk, a hard disk, a CD ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape. In general, a computer readable program can be implemented in any programming language. Some examples of available languages include C, C++, or JAVA.

[40] Although exemplary embodiments of the present invention have been described with reference to the attached drawings, the present invention is not limited to these embodiments, and it should be appreciated to those skilled in the art that a variety of modifications and changes can be made without departing from the spirit and scope of the present invention.

Claims

[1] A method of encoding multi-view video using LDI, the method comprising the steps of:

(i) generating the LDI including multiple layers by using color and depth in¬ formation of each viewpoint image of the multi-view video; (ii) performing linear decorrelation in each layer of the LDI; (iii) performing data aggregation in each linearly-decorrelated layer of the LDI; and

(iv) encoding the aggregated data in each layer of the LDI to generate an encoded LDI bit stream..

[2] The method according to claim 1, wherein the step (ii) includes, for each of all pixels in each layer of the LDI, calculating a minimum distance between a line connecting two previous pixels and a depth value of a current pixel to replace the depth value of the current pixel with the minimum distance.

[3] The method according to claim 2, wherein the step (ii) includes, when a depth value of the current pixel does not exist, using an average depth value of the two previous pixels as the depth value of the current pixel.

[4] The method according to claim 3, further comprising, when the pixel not having the depth value is a first pixel of the LDI, filling a depth value of the first pixel with a value of ¹O¹.

[5] The method according to claim 3, further comprising, when the pixel not having the depth value is a second pixel of the LDI, copying the depth value of the first pixel.

[6] The method according to claim 1 , further comprising transmitting information for compensating information loss occurred in the step (i), together with the encoded LDI bit stream.

[7] The method according to claim 6, wherein the information for compensating in¬ formation loss is residual information between an original multi-view video and a multi-view video reconstructed from the encoded LDI bit stream and the residual information is encoded and transmitted to a decoding apparatus.

[8] A method of decoding a multi-view video, the method comprising: decoding an encoded LDI bit stream; decoding a bit stream of residual information between an original multi-view video and a multi-view video reconstructed from the encoded LDI bit stream; reconstructing the multi-view video based on the decoded LDI bit stream and residual information.

[9] The method according to claim 8, further comprising: receiving an instruction selecting a viewpoint to be reconstructed from a user; and in response to the received instruction, reconstructing image data at the cor¬ responding viewpoint. [10] An apparatus of encoding multi-view video using LDI, comprising:

(i) means for generating the LDI including multiple layers by using color and depth information of each viewpoint image of the multi-view video;

(ii) means for performing linear decorrelation in each layer of the LDI;

(iii) means for performing data aggregation in each linearly-decorrelated layer of the LDI; and

(iv) means for encoding the aggregated data in each layer of the LDI to generate an encoded LDI bit stream. [11] The apparatus according to claim 10, wherein the means for performing linear decorrelation calculates, for each of all pixels in each layer of the LDI, a minimum distance between a line connecting two previous pixels and a depth value of a current pixel to replace the depth value of the current pixel with the minimum distance. [12] The apparatus according to claim 10, further comprising means for calculating and encoding residual information between an original multi-view video and a multi-view video reconstructed from the encoded LDI bit stream. [13] The apparatus according to claim 12, comprising: means for decoding the encoded LDI bit stream; means for reconstructing the multi-view video data based on the decoded LDI data; and means for calculating and encoding the residual information between the original multi-view video data and the reconstructed multi-view video data. [14] An apparatus of decoding a multi-view video, comprising: means for decoding an encoded LDI bit stream; means for decoding a bit stream of residual information between an original multi-view video and a multi-view video reconstructed from the encoded LDI bit stream; means for reconstructing the multi-view video based on the decoded LDI bit stream and residual information. [15] A computer readable recording medium having a computer program thereon, which performs a method according to any one of claims 1 to 7. [16] A computer readable recording medium having a computer program thereon, which performs a method for decoding multi-view video according to claim 8 or