WO2009091563A1 - Depth-image-based rendering - Google Patents
- Publication number
- WO2009091563A1 (PCT application PCT/US2009/000245)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- information
- additional
- view
- particular time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
- G06T15/205—Image-based rendering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/579—Depth or shape recovery from multiple images from motion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/553—Motion estimation dealing with occlusions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Definitions
- In the encoder 100 of FIG. 1 (described in detail below), the image warper 155 takes the last encoded image (for a particular view) and creates warped images for one or more views other than the particular view.
- the warped images are stored in the decoder picture buffer 177 and will be used as reference for the encoding of future images.
- the decoder picture buffer 177 includes all the reference images available for encoding future images.
- the reference view portion 170 includes previously decoded images.
- the synthesized view portion 165 includes the set of warped images created from the previously decoded images.
- the mode selector 166 selects the best prediction mode to be used for the encoding. Besides the two modes available in standard video encoders (inter and intra modes), the modified mode selector 166 can also choose a synthesis mode which uses a synthesized image for the inter prediction.
- The remaining elements in FIG. 1 essentially operate as in any standard MPEG-4 AVC encoder.
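To make the three-way mode decision concrete, here is a minimal sketch; the function name and the sum-of-absolute-differences cost are our own illustration, since the text does not specify how the selector scores the modes:

```python
import numpy as np

def select_mode(block, intra_pred, inter_pred, synth_pred):
    """Pick the prediction source (intra, inter, or view synthesis)
    with the lowest sum-of-absolute-differences cost for one block.
    All arguments are same-shaped numpy arrays of pixel values."""
    candidates = {"intra": intra_pred, "inter": inter_pred,
                  "synthesis": synth_pred}
    costs = {name: float(np.abs(block.astype(np.int64)
                                - pred.astype(np.int64)).sum())
             for name, pred in candidates.items()}
    mode = min(costs, key=costs.get)  # lowest-cost mode wins
    return mode, costs[mode]
```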
- FIG. 2 shows a non-limiting block diagram of an implementation of a decoder 200 for decoding image data for a view obtained using depth-image-based rendering.
- the decoder 200 includes an entropy decoder 205 having an output connected in signal communication with an input of an inverse quantizer 210.
- An output of the inverse quantizer 210 is connected in signal communication with an input of an inverse transformer 215.
- An output of the inverse transformer 215 is connected in signal communication with a first non-inverting input of a combiner 220.
- An output of the combiner 220 is connected in signal communication with an input of a deblocking filter 225 and an input of an intra predictor 235.
- An output of the deblocking filter 225 is connected in signal communication with an input of a picture buffer 240.
- An output of the picture buffer 240 is connected in signal communication with a first input of a motion compensator 260 and a first input of an image warper 250.
- An output of the image warper 250 is also connected in signal communication with the first input of the motion compensator 260.
- An output of the motion compensator 260 is connected in signal communication with a first input of an intra/inter or synthesis mode selector 230.
- An output of the intra predictor 235 is connected in signal communication with a second input of the intra/inter or synthesis mode selector 230.
- An output of the intra/inter or synthesis mode selector 230 is connected in signal communication with a second non-inverting input of the combiner 220.
- An input of the entropy decoder 205 is available as an input of the decoder 200, for receiving a bitstream.
- a second input of the motion compensator 260 is available as an input of the decoder 200, for receiving motion vectors.
- a second input of the image warper 250 is available as an input of the decoder 200, for receiving camera parameters.
- a third input of the image warper 250 is available as an input of the decoder 200, for receiving depth values.
- An output of the deblocking filter 225 is available as an output of the decoder 200, for outputting pictures.
- the intra/inter or synthesis mode selector 230 selects the prediction mode to be used for the decoding based on the information present on the received bit stream. Besides the two modes available in standard video decoders (intra and inter modes), the modified mode selector 230 can also choose a synthesis mode which uses a synthesized image for the inter prediction.
- the image warper 250 creates a synthesized image from one of the decoded pictures stored in the decoded picture buffer 240 when such synthesized image is required by the mode selector 230.
- the parameters required to perform the image synthesis are obtained from the received bit stream.
- the remaining elements in FIG. 2 operate as in any standard MPEG-4 AVC decoder.
- FIG. 3 shows a non-limiting block diagram of an implementation of an apparatus 300 for encoding and transmitting image data for a view obtained using depth-image-based rendering.
- the apparatus 300 includes a rendering unit 305 having an output connected in signal communication with an input of an encoder 310.
- An output of the encoder 310 is connected in signal communication with an input of a transmitter 315.
- An input of the rendering unit 305 is available as an input of the apparatus 300, for receiving a reference image and a second image.
- An output of the transmitter 315 is available as an output of the apparatus 300, for outputting encoded images for transmission, for example, over one or more networks.
- the rendering unit 305 is configured to access information from a reference image and a second image.
- the reference image is for a reference view at a particular time, and the second image is for a different time than the particular time.
- the rendering unit 305 is also configured to create an additional image based on the information from the reference image and on the information from the second image.
- the additional image is for an additional view that is different from the reference view and being for the particular time.
- the encoder 310 is configured to encode the reference image, the second image, and the additional image.
- the transmitter 315 is configured to transmit the encoded reference image, the encoded second image, and the encoded additional image.
- the rendering unit 305 includes a memory interface 306 and a synthesizer 307.
- the memory interface 306 may be configured to access the information from the reference image and second image.
- the synthesizer 307 may be configured to create the additional image based on the information from the reference image and on the information from the second image.
- the transmitter 315 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers.
- the transmitter may include, or interface with, an antenna (not shown).
- FIG. 4 shows a non-limiting block diagram of an implementation of an apparatus 400 for demodulating and decoding image data for a view obtained using depth-image-based rendering.
- the apparatus 400 includes a demodulator 405 having an output connected in signal communication with an input of a decoder 410.
- An output of the decoder 410 is connected in signal communication with an input of a rendering unit 415.
- An output of the rendering unit is connected in signal communication with an input of a presentation device 420.
- An input of the demodulator 405 is available as an input to the apparatus 400, for receiving a signal including an encoded reference image and an encoded second image.
- An output of the presentation device 420 is available as an output of the apparatus 400, for displaying any of the reference image, the second image, and an additional image.
- the demodulator 405 is configured to receive and demodulate a signal.
- the signal includes an encoded reference image and an encoded second image.
- the reference image is for a reference view at a particular time.
- the second image is for a different time than the particular time.
- the decoder 410 is configured to decode the encoded reference image and the encoded second image.
- the rendering unit 415 is configured to access information from the decoded reference image, to access information from the decoded second image, and to create an additional image based on the information from the decoded reference image and on the information from the decoded second image.
- the additional image is for an additional view that is different from the reference view and is for the particular time.
- the presentation device 420 is configured to display the additional image.
- the demodulator 405 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures.
- Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal.
- Functions described as being performed by the decoder 410 may also be performed by the demodulator 405 in various implementations.
- the demodulator 405 may include, or interface with, an antenna (not shown). It is to be appreciated that apparatus 300, apparatus 400, and/or other implementations of the present principles may be implemented in a set top box, a transmitter, mobile phones, personal digital assistants (PDAs), mobile computers, and so forth.
- apparatus 300 may represent all or part of a video transmission system.
- the video transmission system may be, for example, a head-end or transmission system for transmitting a signal using one or more of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast.
- the transmission may be provided over the Internet or some other network.
- apparatus 400 may represent all or part of a video receiving system.
- the video receiving system may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage.
- the video receiving system may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
- the video receiving system may be configured, for example, to receive signals over one or more of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network.
- DIBR can be realized by three dimensional image warping. Each pixel of the reference image is first un-projected to a three dimensional point using its depth value; re-projecting the three dimensional point onto the synthesized image plane using Equation (2), we obtain the novel view's image P1.
- By re-projecting, we map the three dimensional point onto a two dimensional point in the image plane.
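The warping equations themselves are not reproduced in this text. The following sketch shows a standard pinhole-camera formulation of the two steps (un-project, then re-project); the camera model and parameter names are assumptions, not necessarily the patent's exact Equations (1) and (2):

```python
import numpy as np

def warp_pixel(u, v, depth, K_ref, R_ref, t_ref, K_syn, R_syn, t_syn):
    """Un-project pixel (u, v) of the reference image to a 3D point
    using its depth, then re-project that point into the synthesized
    view. Assumes a pinhole model where a world point X maps to image
    coordinates x ~ K (R X + t)."""
    # Un-project: back-project the pixel along its viewing ray.
    ray = np.linalg.inv(K_ref) @ np.array([u, v, 1.0])
    X = R_ref.T @ (depth * ray - t_ref)      # 3D point in world frame

    # Re-project into the synthesized camera (the Equation (2) step).
    x = K_syn @ (R_syn @ X + t_syn)
    return x[0] / x[2], x[1] / x[2]          # 2D location in novel view
```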
- Boundary Layer: In the reference view, depth discontinuities at the boundary between the foreground and background cause the holes in the synthesized view. Since the pixels along the boundary of the objects receive contributions from the foreground and the background colors, these mixed color pixels will result in visible artifacts in depth-image-based rendering. Boundary matting is a technique to reduce the artifacts caused by mixed pixels. Boundary matting and the generation of a boundary layer are well-known to one of ordinary skill in the art.
- The boundary layer is used mainly for filling in holes.
- First we locate the depth discontinuities by checking whether the disparity jump between each neighboring pixel pair is greater than a threshold γ, denoted with a boolean function dpbound(x,y).
- a disparity image is typically generated in three dimensional warping, and the disparity jumps can be determined based on the disparity image.
- the threshold γ can be selected based on the scene and intensity range of the disparity image. In some implementations, the range is 0-255 and the threshold is selected as 5 pixels.
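A sketch of the dpbound(x,y) test follows; marking only the left/upper pixel of each jumping pair, and the default threshold of 5 for a 0-255 disparity range, are simplifications based on the description above:

```python
import numpy as np

def dpbound(disp, gamma=5):
    """Boolean map of depth-discontinuity pixels: True where the
    disparity jump to the right or lower neighbor exceeds gamma.
    Cast to a signed type first so unsigned disparity maps do not
    wrap around when differenced."""
    d = disp.astype(np.int32)
    jump = np.zeros(d.shape, dtype=bool)
    jump[:, :-1] |= np.abs(np.diff(d, axis=1)) > gamma  # horizontal jumps
    jump[:-1, :] |= np.abs(np.diff(d, axis=0)) > gamma  # vertical jumps
    return jump
```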
- Depending on whether splatting or mesh warping is used, herein below we discuss the procedure to form the boundary layer.
Splatting
- FIG. 5 shows a non-limiting diagram of an implementation of a pixel-based boundary layer construction method 500, also interchangeably referred to herein as Algorithm 1. That is, Algorithm 1 is the pixel-based process used to label the boundary layer pixels and determine their color and disparity values based on the background extension. Algorithm 1 checks pixel d's disparity value disp(d) against its 8 pixel neighborhood.
- FIG. 6 illustrates pixel d and its 8 pixel neighborhood. That is, FIG. 6 shows a non-limiting diagram of an implementation of a splatting technique 600 with respect to boundary layer construction. The modification formula for the pair of pixels at the depth discontinuity is shown with respect to FIG. 6. If a depth jump is found between pixel d and pixel e4, where d is the foreground and e4 is the background, then we modify d and e4 as follows:
- val′(d) = α·val(e4) + (1 − α)·val(d), and similarly for e4 with the factor β, where α and β are constant factors (whose values are preferably, but not mandatorily, in the range 0.5 ≤ α, β ≤ 1) and val(·) denotes the color or depth information. Note that in general we are extending the background into the boundary layer. We erode the boundary layer obtained by Algorithm 1 by one pixel to prevent cracks from appearing in the rendering.
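A sketch of this background-extension update; the pairing of α and β with the two pixels is inferred from the pattern of the mesh-warping formulas below, and the default weights are arbitrary within the stated range:

```python
def extend_background(val_d, val_e4, alpha=0.75, beta=0.75):
    """Blend a foreground boundary pixel d toward its background
    neighbor e4, extending the background into the boundary layer.
    val_d and val_e4 may be color triples or depth values; alpha and
    beta are assumed to lie in [0.5, 1]."""
    new_d = alpha * val_e4 + (1 - alpha) * val_d
    new_e4 = beta * val_e4 + (1 - beta) * val_d
    return new_d, new_e4
```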
Mesh warping
- FIG. 7 shows a non-limiting diagram of an implementation of a triangle-based boundary layer construction method 700, also interchangeably referred to herein as Algorithm 2. That is, Algorithm 2 is the triangle-based process used to split each section into two triangles, label the boundary layer pixels, and determine their color and disparity values based on background extension. Thus, Algorithm 2 provides boundary layer pixels using mesh warping.
- FIG. 8 shows a non-limiting diagram of an implementation of a mesh warping technique 800 with respect to boundary layer construction. That is, FIG. 8 illustrates the modification formulas applied at a depth discontinuity: each boundary pixel di is blended toward a neighboring background pixel cj as val′(di) = α·val(cj) + (1 − α)·val(di); for example, val′(d2) = α·val(c8) + (1 − α)·val(d2), and d3 and d4 are modified analogously.
- the pixel corresponding to the abrupt disparity reduction is a background candidate pixel.
- the corresponding pixels in these frames can also each be considered a background candidate pixel.
- a simple method is median-filtering them on the disparity component.
- the color consistency measure is chosen to be an L2 distance in the RGB space, where an L2 distance is the square root of the sum of squared differences, i.e., sqrt(Δr² + Δg² + Δb²). If different background candidates are found in the forward and backward directions, then the color consistency metric is also used to determine the ultimate background candidate selected. If no background candidate pixels are found, then the existing information obtained from the background extension will be preserved.
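A sketch of this candidate selection; representing candidates as dictionaries and resolving ties through the median-disparity pick are our own choices, since the text describes the ingredients but not their exact composition:

```python
import numpy as np

def pick_background_candidate(candidates):
    """Choose one background candidate pixel from those gathered in
    other frames. Each candidate is a dict with 'disparity' (float)
    and 'rgb' (3 floats). Median-filter on the disparity component,
    then use L2 color distance to the median-disparity pick to settle
    between forward and backward candidates."""
    if not candidates:
        return None  # keep the background-extension value instead
    med = np.median([c["disparity"] for c in candidates])
    ref = min(candidates, key=lambda c: abs(c["disparity"] - med))
    def l2(c):
        return float(np.linalg.norm(np.asarray(c["rgb"], dtype=float)
                                    - np.asarray(ref["rgb"], dtype=float)))
    return min(candidates, key=l2)
```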
- such an abrupt disparity reduction suggests, for example, that the foreground moves away and the background appears.
- This varying disparity as a function of frame, for a given pixel location can be referred to as the temporal disparity curve.
- a temporal disparity curve may look like a stepped function having a relatively constant value for a first segment of time, and a second (different) relatively constant value for a second segment of time.
- the jump from the first segment to the second segment indicates, for example, that at the pixel being investigated an object has moved away from the pixel's location and revealed the background for that object. For example, a person may have moved, revealing a parked car behind.
- a third segment in time may be associated with the car moving away, revealing a building behind.
- Each segment is assumed to possess the same background, and that background is also assumed to possibly be the background for another segment. Note that if the disparity is smooth, then we assume that there is no substantial depth change and that the object does not move much.
- the object may be in the background or the foreground.
- Implementations may use an algorithm that utilizes the spatial-temporal (color) consistency to optimize the background model.
- Such an algorithm can be time-consuming and complicated. Accordingly, other implementations instead use simple median filtering to determine the disparity and the color for each segment.
- for odd pixels, an implementation analyzes odd temporal images (that is, images from the view under consideration for times t−1, t−3, ..., and t+1, t+3, ...), and for even pixels analyzes even temporal images.
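The segmentation of a temporal disparity curve into step-like segments, each summarized by median filtering, might look as follows; the jump threshold and the segment summary are assumptions consistent with the description:

```python
import numpy as np

def disparity_segments(disp_over_time, jump=5):
    """Split one pixel's temporal disparity curve into segments at
    abrupt jumps (the steps of the stepped function), and summarize
    each segment by its median disparity. Each segment is assumed to
    share a single background."""
    d = np.asarray(disp_over_time, dtype=np.int32)
    if d.size == 0:
        return []
    cuts = [0] + [i + 1 for i in range(len(d) - 1)
                  if abs(int(d[i + 1]) - int(d[i])) > jump] + [len(d)]
    return [(start, end, float(np.median(d[start:end])))
            for start, end in zip(cuts[:-1], cuts[1:])]
```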
- the compositing method combines the warped frames from different layers and different views.
- the emphasis of each reference view is defined by its angular distance to the synthesized view.
- the angular distance can be determined from the camera parameters of the reference and synthesized views, for example as sketched below.
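The text does not reproduce the formula, so the following is only a plausible stand-in: weight each reference view by the inverse of the angle between its viewing direction and that of the synthesized view:

```python
import numpy as np

def angular_weight(dir_ref, dir_syn, eps=1e-6):
    """Weight a reference view by the angle between its viewing
    direction and the synthesized view's direction; a smaller angular
    distance yields a larger emphasis in the blending."""
    a = np.asarray(dir_ref, dtype=float)
    b = np.asarray(dir_syn, dtype=float)
    cos = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return 1.0 / (np.arccos(cos) + eps)
```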
- FIG. 9 shows the compositing framework from two reference views.
- for reference view 1, we perform main layer rendering 910, background layer rendering 915, and boundary layer rendering 920; for reference view 2, the corresponding operations are performed by blocks 950, 955, and 960.
- Blending 980 is then performed to, for example, obtain a blended image.
- the splatting method will render the novel view pixel-by-pixel varying the reconstruction kernel size (which can be considered to be the window function in splatting) depending on the disparity and normal vector orientation of the reference pixel.
- the splatting kernel size for the background/main layer differs from the boundary layer, because the latter will be warped to the dis-occluded area in the synthesized view.
- the hole size can be estimated based on the depth discrepancy, which decides the reconstruction kernel size in splatting.
- the (triangular) mesh-based method converts each 2x2 section of the depth map into two triangles if the depth difference between either pair of diagonal vertexes is less than the given threshold.
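A sketch of that conversion; reading "either pair" as "at least one diagonal pair" and the triangle orientation are our own interpretation:

```python
import numpy as np

def split_into_triangles(depth, thresh):
    """Convert each 2x2 section of the depth map into two triangles,
    but only when at least one diagonal vertex pair has a depth
    difference below thresh; sections spanning a depth discontinuity
    are skipped and left to the boundary layer."""
    tris = []
    h, w = depth.shape
    for y in range(h - 1):
        for x in range(w - 1):
            q = depth[y:y + 2, x:x + 2].astype(np.float64)
            diag = min(abs(q[0, 0] - q[1, 1]), abs(q[0, 1] - q[1, 0]))
            if diag < thresh:
                tris.append(((y, x), (y, x + 1), (y + 1, x)))
                tris.append(((y + 1, x + 1), (y, x + 1), (y + 1, x)))
    return tris
```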
- the background layer is rendered to fill in the holes in the novel view.
- the boundary layer is rendered using those triangles that were removed from the main layer.
- for any remaining holes, we run the simplest approach, which examines all the pixels bordering the hole and copies the one that is farthest away. The one that is farthest away has the biggest depth and is most likely to be in the background.
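A minimal sketch of this fallback; it treats the whole mask as one hole, whereas a fuller implementation would process each connected hole separately:

```python
import numpy as np

def fill_hole_simple(color, depth, hole_mask):
    """Fill hole pixels with the color of the bordering pixel that is
    farthest away (largest depth), on the assumption that the deepest
    border pixel belongs to the background."""
    h, w = depth.shape
    border = []
    for y, x in zip(*np.nonzero(hole_mask)):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not hole_mask[ny, nx]:
                border.append((ny, nx))
    if border:
        farthest = max(border, key=lambda p: depth[p])
        color[hole_mask] = color[farthest]
    return color
```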
- a layer may have more than one value for a given pixel.
- a background layer (915) retains two values for a given pixel. One value represents a foreground value, and a second value represents a background value.
- the analysis for a given pixel, in a given warped view may not be able to accurately determine whether or not the given pixel (for example, located at the boundary of a hole) is in the foreground or the background. Accordingly, the analysis may retain the foreground value and the background value that is produced from, for example, the disparity-curve analysis.
- a second view (950, 955, 960) may provide additional information allowing the blending operation (980) to determine whether the given pixel is in the foreground or the background.
- the implementation of FIG. 9 need not produce two final synthesized images (a first from Reference view 1, and a second from Reference view 2) prior to performing the blending operation (980). Although this is possible in some implementations, the implementation of FIG. 9 performs the blending operation (980) using six "images". The six images are the output from blocks 910, 915, 920, 950, 955, and 960. Note that these "images" need not be full images. For example, in one implementation the background layers (915, 955) need only include the information for the pixels that are part of the hole boundaries.
- implementations may, for example, combine warped frames from only a single view, but from multiple layers.
- an implementation may produce only a single warped main layer, a single background layer, and a single boundary layer. These three layers from the same view may be combined to form a composite image.
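A sketch of such single-view compositing; the strict precedence (main, then background, then boundary) is an assumed blending order, since the text does not fix one:

```python
import numpy as np

def composite_layers(main, background, boundary,
                     main_mask, bg_mask, bd_mask):
    """Composite one view's three warped layers into a single image:
    start from the main layer, fill remaining holes from the
    background layer, then patch what is left from the boundary layer.
    Layers are HxWx3 arrays; masks mark where each layer has data."""
    out = np.zeros_like(main)
    out[main_mask] = main[main_mask]
    fill = ~main_mask & bg_mask                 # holes covered by background
    out[fill] = background[fill]
    edge = ~main_mask & ~bg_mask & bd_mask      # what only the boundary has
    out[edge] = boundary[edge]
    return out
```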
- FIG. 10 is a non-limiting flow diagram of an implementation of a method 1000 for encoding and transmitting image data for a view obtained using depth-image-based rendering.
- step 1005 information from a reference image is accessed.
- the reference image is for a reference view at a particular time.
- step 1010 information from a second image is accessed.
- the second image is for a different time than the particular time.
- an additional image is created based on the information from the reference image and on the information from the second image.
- the additional image is for an additional view that is different from the reference view and is for the particular time.
- the reference image, the second image, and the additional image are encoded.
- the encoded reference image, the encoded second image, and the encoded additional image are transmitted.
- implementations need only perform operations 1005, 1010, and 1015 of the method 1000. That is, these implementations are directed toward creating the additional image.
- These implementations may be performed at an encoder or at a decoder, for example.
- the additional image may be encoded and transmitted, and/or the additional image may be used as a reference for encoding another image.
- the additional image may be, for example, a synthesis of a view that is to be encoded, and the synthesized additional image may be used as a reference for encoding that view.
- the encoder may also signal to a decoder, using signaling information such as syntax values, which information was used to synthesize the additional image.
- the signaling information may indicate, for example, the view that was used to synthesize the additional image, the view location of the additional image, and any other information (for example, temporal information) that was used in the synthesis of the additional image.
- the decoder can then perform the synthesis of the additional view at the decoder and use that synthesized additional view to decode the encoded view.
- the additional image may be used as a reference for synthesizing yet another image.
- the additional image is warped, and a background layer and a boundary layer are generated.
- FIG. 11 is a non-limiting flow diagram of an implementation of step 1015 of method 1000 of FIG. 10.
- the additional image is synthesized based on the reference image and estimating a value for a pixel in a dis-occluded portion (occurring in, e.g., a background portion) of the additional image using the information from the second image.
- a pixel in the reference image that corresponds to the pixel in the dis-occluded portion of the additional image is identified.
- identification may involve, but is not limited to, for example, coherence and consistency of neighboring depth and color information.
- depth information for the pixel in the reference image is compared with depth information for a corresponding pixel in the second image.
- a size of the dis-occluded portion is refined using depth information, by comparing depth information for a pixel in the dis-occluded portion with depth information for a neighboring pixel outside of the dis-occluded portion, and determining whether to include the neighboring pixel in the dis-occluded portion based on the comparing.
- the value of the pixel in the dis-occluded portion is estimated based on a value of the corresponding background/foreground pixel in the second image.
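The three steps of FIG. 11 might be sketched as follows for a static camera, where the corresponding pixel is found at the same coordinates; the depth tolerance and this correspondence rule are assumptions:

```python
import numpy as np

def estimate_disoccluded_pixel(ref_depth, sec_depth, sec_color,
                               y, x, depth_tol=5):
    """For a dis-occluded pixel of the additional image: compare the
    reference depth at (y, x) with the temporally different second
    image's depth there; if the second image is significantly farther
    (background uncovered), take its color as the estimate."""
    if sec_depth[y, x] > ref_depth[y, x] + depth_tol:
        return sec_color[y, x]   # background revealed in the second image
    return None                  # keep the background-extension value
```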
- FIG. 12 is a non-limiting flow diagram of an implementation of a method 1200 for demodulating and decoding image data for a view obtained using depth-image-based rendering.
- a signal is received and demodulated.
- the signal includes an encoded reference image and an encoded second image.
- the reference image is for a reference view at a particular time.
- the second image is for a different time than the particular time.
- the encoded reference image and the encoded second image are decoded.
- step 1215 information from the decoded reference image is accessed.
- step 1220 information from the decoded second image is accessed.
- step 1225 an additional image is created based on the information from the decoded reference image and on the information from the decoded second image.
- the additional image is for an additional view that is different from the reference view and is for the particular time.
- At step 1230 at least the additional image is displayed on a presentation device.
- The use of "and/or" and "at least one of", for example in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
- the implementations described herein may be implemented in, for example, a method or a process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
- An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
- the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
- Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
- Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices.
- the equipment may be mobile and even installed in a mobile vehicle.
- the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory (“ROM").
- the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two.
- a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a computer readable medium having instructions for carrying out a process.
- implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
- the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
- a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment.
- Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
- the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
- the information that the signal carries may be, for example, analog or digital information.
- the signal may be transmitted over a variety of different wired or wireless links, as is known.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Various implementations are described. Several implementations relate to depth-image-based rendering. Many of these implementations use temporal information in synthesizing an image. For example, temporal information may be used to generate a background layer for a warped image, and then the background layer may be blended with the main layer. One method includes accessing information from a reference image (1005). The reference image is for a reference view at a particular time. Information from a second image is accessed (1010). The second image is for a different time than the particular time. An additional image is created based on the information from the reference image and on the information from the second image (1015). The additional image is for an additional view that is different from the reference view and is for the particular time.
Description
DEPTH-IMAGE-BASED RENDERING
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Serial No. 61/011,519, filed on January 18, 2008, titled "Depth-Image-Based Rendering", the contents of which are hereby incorporated by reference in their entirety for all purposes.
TECHNICAL FIELD Implementations are described that relate, for example, to coding and decoding systems and apparatus including the same. Particular implementations relate to depth-image-based rendering.
BACKGROUND Some three dimensional applications create an intermediary view by interpolating between two views, or simply extending a single view. However, background objects (holes) can be uncovered when creating the intermediary view and typically information is unavailable for such objects, thus presenting a problem relating to how such objects should be treated in order to obtain an accurate representation of the same. The creation of these holes is referred to as the dis-occlusion problem. Moreover, there are other problems. For example, the occlusion problem described herein below, as well as artifacts created at the boundary between objects at different depths during the warping process are other problems that may be addressed.
SUMMARY
According to a general aspect, information from a reference image is accessed. The reference image is for a reference view at a particular time. Information from a second image is accessed. The second image is for a different time than the particular time. An additional image is created based on the information from the reference image and on the information from the second image. The additional image is for an additional view that is different from the reference view and is for the particular time.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an implementation of an encoder for encoding image data for a view obtained using depth-image-based rendering.
FIG. 2 is a block diagram of an implementation of a decoder for decoding image data for a view obtained using depth-image-based rendering.
FIG. 3 is a block diagram of an implementation of an apparatus for encoding and transmitting image data for a view obtained using depth-image-based rendering.
FIG. 4 is a block diagram of an implementation of an apparatus for demodulating and decoding image data for a view obtained using depth-image-based rendering.
FIG. 5 is a diagram of an implementation of a pixel-based boundary layer construction method.
FIG. 6 is a diagram of an implementation of a splatting technique with respect to boundary layer construction. FIG. 7 is a diagram of an implementation of a triangle-based boundary layer construction method.
FIG. 8 is a diagram of an implementation of a mesh warping technique with respect to boundary layer construction.
FIG. 9 is a diagram of an implementation of multiple (two) reference views based rendering.
FIG. 10 is a flow diagram of an implementation of a method for encoding and transmitting image data for a view obtained using depth-image-based rendering.
FIG. 11 is a flow diagram further illustrating step 1015 of method 1000 of FIG. 10.
FIG. 12 is a flow diagram of an implementation of a method for demodulating and decoding image data for a view obtained using depth-image-based rendering.
DETAILED DESCRIPTION Image-based rendering (IBR) combines both computer vision and computer graphics technologies to generate a novel view using a collection of images from different viewpoints. In the past decade, IBR has received much attention as a powerful alternative to the traditional geometry-based rendering for view synthesis. Applications such as video games, virtual travel, multi-view video coding (MVC), three dimensional (3D) television, and free viewpoint video (FVV) stand to benefit from this technology.
Depth-image-based rendering (DIBR) is a technique of view synthesis that uses a number of images captured from multiple calibrated cameras as well as associated per-pixel depth information. The per-pixel depth information may be computed using, for example, stereo vision. In the rendering process, various methods may be used to deal with the occlusion and dis-occlusion problems. As used herein, the occlusion problem, also interchangeably referred to as the visibility problem, refers to the situation when multiple pixels are mapped to the same location in the synthesized view. With respect to the occlusion problem, an image portion (e.g., one or more pixels, one or more image blocks, and so forth) is not visible in the new view obtained by warping, although the image portion was visible prior to the warping. Moreover, as used herein, the dis-occlusion problem, also interchangeably referred to herein as the exposure problem, refers to the situation when previously invisible scene points are uncovered in the synthesized view, producing what are commonly referred to as holes. With respect to the dis-occlusion problem, an image portion (e.g., one or more pixels, one or more image blocks, and so forth) is visible (although likely represented as a "hole") in the new view obtained by warping, although the image portion was not visible prior to the warping.
One technique for dealing with the dis-occlusion problem includes creating a boundary layer around the hole. This technique determines which pixel in the boundary layer has the greatest depth, and copies this pixel to the hole based on the assumed rationale that, the pixel is in the background and odds are that the hole is in the background. However, the copied pixel might not be in the background. Also,
even if the copied pixel is in the background, the background might not be a solid color.
A second technique, discussed in at least one implementation in this application, proposes a layered method to resolve the visibility problem in depth-image-based rendering. In at least one such implementation, for each reference view, we use a novel three-layer representation, that is, the main layer, the background layer and the boundary layer. As used herein, the phrases "boundary" and "boundary layer" generally refer to an edge which results from depth discontinuities. Based on the rendering algorithm, which may be, for example, pixel-based (splatting) or triangular mesh-based, we design an associated method to generate the boundary layer in a spatio-temporal manner. We build a temporal background model for each frame by searching backward and forward for uncovered background information in other frames in the same reference video, based on depth variance.

Three dimensional image warping can be used to realize DIBR. Three dimensional image warping is well known to one of ordinary skill in the art. Three dimensional warping generates a novel image from any nearby viewpoints by un-projecting pixels of reference images from the proper three dimensional locations and re-projecting them onto the new image space. After three dimensional warping, the determination of colors per pixel in the synthesized view is typically the classical computer graphics problem of reconstruction and re-sampling. Generally, the rendering method can be pixel-based (splatting), or mesh-based (triangular). Either method is capable of dealing with the occlusion and dis-occlusion problem. As noted above, the occlusion (visibility) problem refers to the case when multiple pixels are mapped to the same location in the synthesized view. One solution to the occlusion problem is Z-buffering. An alternative method is mapping the pixels in a specific order referred to as back-to-front occlusion compatible. One shortcoming of the alternative method is that it cannot be used for rendering with multiple reference images, since a mapping order cannot be found for multiple reference views or sources.
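A minimal sketch of Z-buffering in this setting; unlike the back-to-front ordering, it extends naturally to samples warped from any number of reference views:

```python
import numpy as np

def zbuffer_splat(height, width, samples):
    """Resolve the occlusion (visibility) problem with a Z-buffer:
    among all warped reference pixels landing on the same target
    location, keep the one nearest the camera. `samples` is an
    iterable of (y, x, depth, rgb) tuples produced by warping."""
    color = np.zeros((height, width, 3), dtype=np.float64)
    zbuf = np.full((height, width), np.inf)
    for y, x, depth, rgb in samples:
        if 0 <= y < height and 0 <= x < width and depth < zbuf[y, x]:
            zbuf[y, x] = depth        # nearer sample wins
            color[y, x] = rgb
    return color, zbuf
```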
As noted above, dis-occlusion (exposure) occurs when previously invisible scene points are uncovered in the synthesized view, producing what are commonly referred to as holes. Since the reference view does not provide information about this portion, a view synthesis system may assume that the background extends into the hole. This simplistic approach would examine the depth of all the pixels bordering the hole, and copy the pixel that is the farthest away to each exposed pixel. This method is generally inefficient and not appropriate for textured backgrounds.
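For illustration only, the following is a minimal sketch of this simplistic hole-filling approach, assuming numpy arrays for color (HxWx3), depth (HxW), and a boolean mask marking a single hole; the function name and array layout are hypothetical, not from the patent.

```python
import numpy as np

def fill_hole_farthest_border(color, depth, hole_mask):
    """Fill every pixel of a hole with the color of the bordering
    pixel that is farthest away (largest depth), on the assumption
    that the farthest border pixel belongs to the background."""
    h, w = hole_mask.shape
    border = []
    ys, xs = np.nonzero(hole_mask)
    for y, x in zip(ys, xs):
        # Collect non-hole pixels that touch the hole.
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not hole_mask[ny, nx]:
                border.append((ny, nx))
    if not border:
        return color
    by, bx = max(border, key=lambda p: depth[p])  # farthest border pixel
    filled = color.copy()
    filled[hole_mask] = color[by, bx]
    return filled
```

As the surrounding text notes, copying a single border color across the hole fails whenever the background is textured, which motivates the layered approach described next.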
In the reference view, depth discontinuities at the boundary between the foreground and background can be considered to cause the holes. The depth discontinuities may be located, and a boundary strip may be created around these depth-discontinuity pixels. As used herein, a boundary strip refers to a narrow (one or more pixels wide) strip between the foreground and the background in a particular picture or portion of a picture. Bayesian matting may be used to determine the color and depth within these strips. While rendering the synthesized view, both the main layer and the boundary layer may be blended together to remove cracks and artifacts.
In video-based rendering, temporal artifacts could be visible when the hole-filling method in IBR is applied to each frame independently. Meanwhile, the region occluded in some frame could be uncovered at other frames of the video captured from the same view, since the foreground may disappear while the background appears in the same location. In view synthesis from a pair of rectified images for one-to-one conferencing, a method may be used for temporal maintenance of a background model that helps fill in holes and reduce temporal artifacts. The method first segments the unique foreground from the background by bi-modal histogram thresholding, then updates the background model with the newly discovered background pixels. Although a bi-modal histogram is characteristic of the relatively simple scene of a talking head, it faces more challenges when applied to a complicated background. Moreover, in general, range data segmentation is difficult. Background maintenance is a further challenging problem even though the camera is assumed to be fixed. Several issues in real scenarios include, for example, illuminance change (lighting conditions), small moving objects in the background (e.g., a moving curtain or shaking tree leaves), sleeping objects (moving into the background and then motionless), waking objects (moving away from the background), foreground objects' shadows, and so forth.
FIG. 1 shows a non-limiting block diagram of an implementation of an encoder 100 for encoding image data for a view obtained using depth-image-based rendering. The encoder 100 includes a view multiplexer 105 having an output connected in signal communication with a non-inverting input of a combiner 110 and a first input of a motion estimator 130. An output of the combiner 110 is connected in signal communication with an input of a transformer 115. An output of the transformer 115 is connected in signal communication with an input of a quantizer 120. An output of the quantizer 120 is connected in signal communication with an input of an entropy coder 125 and an input of an inverse quantizer 140. An output of the inverse quantizer 140 is connected in signal communication with an input of an inverse transformer 145. An output of the inverse transformer 145 is connected in signal communication with a first non-inverting input of a combiner 150. An output of the combiner 150 is connected in signal communication with an input of an intra predictor 164 and with an input of a deblocking filter 152. An output of the deblocking filter 152 is connected in signal communication with a first input of an image warper 155 and an input of a reference view portion 170 of a decoder picture buffer 177. An output of the image warper 155 is connected in signal communication with an input of a synthesized view portion 165 of the decoder picture buffer 177. An output of the reference view portion 170 and an output of the synthesized view portion 165 are connected in signal communication with a second input of the motion estimator 130 and a first input of a motion compensator 135. An output of the motion estimator 130 is connected in signal communication with a second input of the motion compensator 135. An output of the motion compensator 135 is connected in signal communication with a first input of an inter/intra or synthesis mode selector 166. An output of the inter/intra or synthesis mode selector 166 is connected in signal communication with an inverting input of the combiner 110 and a second non-inverting input of the combiner 150. An output of the intra predictor 164 is connected in signal communication with a second input of the inter/intra or synthesis mode selector 166. Inputs of the view multiplexer 105 are available as inputs to the encoder 100, for receiving picture data for views 0 through N. A second input and third input of the image warper 155 are available as inputs of the encoder 100, for receiving camera parameters and depth values. An output of the entropy coder 125 is available as an output of the encoder 100, for outputting a bitstream corresponding to the multi-view picture data.
The image warper 155 takes the last encoded image (for a particular view) and creates warped images for one or more views other than the particular view. The warped images are stored in the decoder picture buffer 177 and will be used as reference for the encoding of future images.
The decoder picture buffer 177 includes all the reference images available for encoding future images. The reference view portion 170 includes previously decoded images, and the synthesized view portion 165 includes the set of warped images created from the previously decoded images.
The mode selector 166 selects the best prediction mode to be used for the encoding. Besides the two modes available in standard video encoders (inter and intra modes), the modified mode selector 166 can also choose a synthesis mode which uses a synthesized image for the inter prediction.
The remaining elements in FIG. 1 essentially operate as in any standard MPEG-4 AVC encoder.
FIG. 2 shows a non-limiting block diagram of an implementation of a decoder 200 for decoding image data for a view obtained using depth-image-based rendering. The decoder 200 includes an entropy decoder 205 having an output connected in signal communication with an input of an inverse quantizer 210. An output of the inverse quantizer 210 is connected in signal communication with an input of an inverse transformer 215. An output of the inverse transformer 215 is connected in signal communication with a first non-inverting input of a combiner 220. An output of the combiner 220 is connected in signal communication with an input of a deblocking filter 225 and an input of an intra predictor 235. An output of the deblocking filter 225 is connected in signal communication with an input of a picture buffer 240. An output of the picture buffer 240 is connected in signal communication with a first input of a motion compensator 260 and a first input of an image warper 250. An output of the image warper 250 is also connected in signal communication with the first input of the motion compensator 260. An output of the motion compensator 260 is connected in signal communication with a first input of an intra/inter or synthesis mode selector 230. An output of the intra predictor 235 is connected in signal communication with a second input of the intra/inter or synthesis mode selector 230. An output of the intra/inter or synthesis mode selector 230 is connected in signal communication with a second non-inverting input of the combiner 220. An input of the entropy decoder 205 is available as an input of the decoder 200, for receiving a bitstream. A second input of the motion compensator 260 is available as an input of the decoder 200, for receiving motion vectors. A second input of the image warper 250 is available as an input of the decoder 200, for receiving camera parameters. A third input of the image warper 250 is available as an input of the decoder 200, for receiving depth values. An output of the deblocking filter 225 is available as an output of the decoder 200, for outputting pictures.
The intra/inter or synthesis mode selector 230 selects the prediction mode to be used for the decoding based on the information present on the received bit stream. Besides the two modes available in standard video decoders (intra and inter modes), the modified mode selector 230 can also choose a synthesis mode which uses a synthesized image for the inter prediction.
The image warper 250 creates a synthesized image from one of the decoded pictures stored in the decoded picture buffer 240 when such synthesized image is required by the mode selector 230. The parameters required to perform the image synthesis are obtained from the received bit stream. The remaining elements in FIG. 2 operate as in any standard MPEG-4 AVC decoder.
FIG. 3 shows a non-limiting block diagram of an implementation of an apparatus 300 for encoding and transmitting image data for a view obtained using depth-image-based rendering. The apparatus 300 includes a rendering unit 305 having an output connected in signal communication with an input of an encoder 310. An output of the encoder 310 is connected in signal communication with an input of a transmitter 315. An input of the rendering unit 305 is available as an input of the apparatus 300, for receiving a reference image and a second image. An output of the transmitter 315 is available as an output of the apparatus 300, for outputting encoded images for transmission, for example, over one or more networks.
In an embodiment, the rendering unit 305 is configured to access information from a reference image and a second image. The reference image is for a reference view at a particular time, and the second image is for a different time than the particular time. The rendering unit 305 is also configured to create an additional image based on the information from the reference image and on the information from the second image. The additional image is for an additional view that is different from the reference view and is for the particular time. The encoder 310 is configured to encode the reference image, the second image, and the additional image. The transmitter 315 is configured to transmit the encoded reference image, the encoded second image, and the encoded additional image.
In an embodiment, the rendering unit 305 includes a memory interface 306 and a synthesizer 307. In an embodiment, the memory interface 306 may be configured to access the information from the reference image and the second image. The synthesizer 307 may be configured to create the additional image based on the information from the reference image and on the information from the second image. The transmitter 315 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers. The transmitter may include, or interface with, an antenna (not shown).
FIG. 4 shows a non-limiting block diagram of an implementation of an apparatus 400 for demodulating and decoding image data for a view obtained using depth-image-based rendering. The apparatus 400 includes a demodulator 405 having an output connected in signal communication with an input of a decoder 410. An output of the decoder 410 is connected in signal communication with an input of a rendering unit 415. An output of the rendering unit 415 is connected in signal communication with an input of a presentation device 420. An input of the demodulator 405 is available as an input to the apparatus 400, for receiving a signal including an encoded reference image and an encoded second image. An output of the presentation device 420 is available as an output of the apparatus 400, for displaying any of the reference image, the second image, and an additional image. In an embodiment, the demodulator 405 is configured to receive and demodulate a signal. The signal includes an encoded reference image and an encoded second image. The reference image is for a reference view at a particular time. The second image is for a different time than the particular time. The decoder 410 is configured to decode the encoded reference image and the encoded second image. The rendering unit 415 is configured to access information from the decoded reference image, to access information from the decoded second image, and to create an additional image based on the information from the decoded reference image and on the information from the decoded second image. The additional image is for an additional view that is different from the reference view and is for the particular time. The presentation device 420 is configured to display the additional image.
The demodulator 405 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The functions of the decoder 410 may also be performed by the demodulator 405 in various implementations. The demodulator 405 may include, or interface with, an antenna (not shown). It is to be appreciated that apparatus 300, apparatus 400, and/or other implementations of the present principles may be implemented in a set-top box, a transmitter, mobile phones, personal digital assistants (PDAs), mobile computers, and so forth.
As another example, apparatus 300 may represent all or part of a video transmission system. The video transmission system may be, for example, a head-end or transmission system for transmitting a signal using one or more of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The transmission may be provided over the Internet or some other network. As yet another example, apparatus 400 may represent all or part of a video receiving system. The video receiving system may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video receiving system may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device. The video receiving system may be configured, for example, to receive signals over one or more of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network.
1. 3-D Warping
One technical route to realize DIBR is via the three dimensional image warping that is known in the computer graphics literature. If we define a three dimensional point by its homogeneous coordinates p = (x, y, z, 1)', and its perspective projection in the reference image plane by P_r = (u_r, v_r, 1)', then we have a general perspective projection defined as follows:

w_r P_r = PPM_r p ,    (1)
where w_r is the depth factor, and PPM_r is the 3x4 perspective projection matrix built from the extrinsic and intrinsic parameters of the calibrated reference camera. Correspondingly, we get the equation for the synthesized view as follows:
w_s P_s = PPM_s p ,    (2)
where P_s is also a homogeneous coordinate as defined above, and w_s is a depth scaling factor. We denote the twelve elements of PPM_r as q_ij, i = 1, 2, 3, j = 1, 2, 3, 4. From the image point P_r and its depth z, we can estimate the other two components of the three dimensional point p by a linear equation as follows:

[ a_11  a_12 ] [ x ]   [ b_1 ]
[ a_21  a_22 ] [ y ] = [ b_2 ] ,    (3)

using the following equalities:

b_1 = (q_14 - u_r q_34) + (q_13 - u_r q_33) z ,   a_11 = u_r q_31 - q_11 ,   a_12 = u_r q_32 - q_12 ,
b_2 = (q_24 - v_r q_34) + (q_23 - v_r q_33) z ,   a_21 = v_r q_31 - q_21 ,   a_22 = v_r q_32 - q_22 .
Re-projecting the three dimensional point onto the synthesized image plane using Equation (2), we obtain the novel view's image point P_s. In re-projecting, we map the three dimensional point onto a two dimensional point in the image plane.
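As a worked illustration of Equations (1)-(3), the following hedged sketch un-projects a reference pixel with known depth and re-projects it into the synthesized view; the 3x4 matrices, function name, and signature are assumptions made for the example, not part of the patent.

```python
import numpy as np

def warp_pixel(ppm_r, ppm_s, u_r, v_r, z):
    """Recover (x, y) from Equation (3) given the reference pixel
    (u_r, v_r) and its depth z, then re-project the 3-D point into
    the synthesized view with Equation (2)."""
    q = ppm_r  # 3x4; q[i-1, j-1] corresponds to q_ij in the text
    a = np.array([
        [u_r * q[2, 0] - q[0, 0], u_r * q[2, 1] - q[0, 1]],
        [v_r * q[2, 0] - q[1, 0], v_r * q[2, 1] - q[1, 1]],
    ])
    b = np.array([
        (q[0, 3] - u_r * q[2, 3]) + (q[0, 2] - u_r * q[2, 2]) * z,
        (q[1, 3] - v_r * q[2, 3]) + (q[1, 2] - v_r * q[2, 2]) * z,
    ])
    x, y = np.linalg.solve(a, b)          # Equation (3)
    p = np.array([x, y, z, 1.0])          # homogeneous 3-D point
    w_s_ps = ppm_s @ p                    # Equation (2)
    u_s, v_s = w_s_ps[0] / w_s_ps[2], w_s_ps[1] / w_s_ps[2]
    return u_s, v_s
```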
2. Boundary Layer

In the reference view, depth discontinuities at the boundary between the foreground and background cause the holes in the synthesized view. Since the pixels along the boundary of the objects receive contributions from the foreground and the background colors, these mixed color pixels will result in visible artifacts in depth-image-based rendering. Boundary matting is a technique to reduce the artifacts caused by mixed pixels. Boundary matting and the generation of a boundary layer are well-known to one of ordinary skill in the art.
In at least one implementation, we use the boundary layer mainly for filling in holes. First we locate the depth discontinuities by checking whether the disparity jump between each neighboring pixel pair is greater than ξ pixels, denoted by a boolean function dpbound(x, y). Note that a disparity image is typically generated in three dimensional warping, and the disparity jumps can be determined based on the disparity image. The threshold ξ can be selected based on the scene and the intensity range of the disparity image. In some implementations, the range is 0-255 and the threshold is selected as 5 pixels. Based on the associated rendering method, for example, splatting or mesh warping, we discuss herein below the procedure to form the boundary layer.
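A minimal sketch of the dpbound test follows, assuming an HxW disparity image held in a numpy array and checking horizontal and vertical neighbor pairs; the exact neighborhood used by an actual implementation may differ.

```python
import numpy as np

def dpbound(disp, xi=5):
    """Mark pixels whose disparity jump to a horizontal or vertical
    neighbor exceeds xi (e.g., 5 for a 0-255 disparity range)."""
    d = disp.astype(np.int32)
    mask = np.zeros(d.shape, dtype=bool)
    jump_h = np.abs(d[:, 1:] - d[:, :-1]) > xi   # horizontal pairs
    jump_v = np.abs(d[1:, :] - d[:-1, :]) > xi   # vertical pairs
    mask[:, 1:] |= jump_h
    mask[:, :-1] |= jump_h
    mask[1:, :] |= jump_v
    mask[:-1, :] |= jump_v
    return mask
```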
Splatting
Splatting is a well-known technique. FIG. 5 shows a non-limiting diagram of an implementation of a pixel-based boundary layer construction method 500, also interchangeably referred to herein as Algorithm 1. That is, Algorithm 1 is the pixel-based process used to label the boundary layer pixels and determine their color and disparity values based on the background extension. Algorithm 1 checks pixel d's disparity value disp(d) against its 8-pixel neighborhood. FIG. 6 illustrates pixel d and its 8-pixel neighborhood. That is, FIG. 6 shows a non-limiting diagram of an implementation of a splatting technique 600 with respect to boundary layer construction. The modification formula for the pair of pixels at the depth discontinuity is shown with respect to FIG. 6. If a depth jump is found between pixel d and pixel e4, where d is the foreground and e4 is the background, then we modify d and e4 based on pixel f4 of FIG. 6 as follows:
val*(e4) = α · val(f4) + (1 − α) · val(e4) ,
val*(d) = β · val*(e4) + (1 − β) · val(d) ,
where α and β are constant factors (whose values are preferably, but not mandatorily, such that 0.5 < α < β < 1) and val(·) denotes the color or depth information. Note that in general we are extending the background into the boundary layer. We erode the boundary layer obtained by Algorithm 1 by one pixel to prevent cracks from appearing in the rendering.
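For concreteness, a small sketch of the pair modification above; the particular values of α and β here (0.6 and 0.8) are illustrative choices satisfying 0.5 < α < β < 1, and the function name is hypothetical.

```python
def modify_boundary_pair(val_d, val_e4, val_f4, alpha=0.6, beta=0.8):
    """Blend background pixel e4 toward the further background pixel
    f4, then blend the foreground pixel d toward the modified e4.
    The same formula applies to color and to depth values."""
    new_e4 = alpha * val_f4 + (1 - alpha) * val_e4
    new_d = beta * new_e4 + (1 - beta) * val_d
    return new_d, new_e4
```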
Mesh warping
Mesh warping is well-known, particularly in the area of computer graphics. Since, in at least one implementation, we use the triangular mesh as the basic primitive, we check the disparity jump in another way. FIG. 7 shows a non-limiting diagram of an implementation of a triangle-based boundary layer construction method 700, also interchangeably referred to herein as Algorithm 2. That is, Algorithm 2 is the triangle-based process used to split each section into two triangles, label the boundary layer pixels, and determine their color and disparity values based on background extension. Thus, Algorithm 2 provides boundary layer pixels using mesh warping. FIG. 8 shows a non-limiting diagram of an implementation of a mesh warping technique 800 with respect to boundary layer construction. That is, FIG. 8 illustrates how we check each 2x2 section and its neighborhood with respect to the boundary layer construction. The modification of a triangle at depth discontinuities is similar to the pixel operation. As shown with respect to FIG. 8, if there is a depth jump in triangle d1d2d3, where d1 is the foreground and d2 and d3 are the background, then we modify triangle d1d2d3 based on triangle c1c2c3 as follows:
val*(d2) = α · val(c2) + (1 − α) · val(d2) ,
val*(d3) = α · val(c3) + (1 − α) · val(d3) ,
val*(d1) = β · val*(d2) + (1 − β) · val(d1) .
In the triangle-based boundary layer, we do not run erosion like the pixel-based one because the latter is in fact one pixel wider. It is shown in Algorithms 1 and 2 that we determine the pixels' color in the boundary layer by extending the background, either pixel-based or triangle-based. In Algorithms 1 and 2, we repeatedly compare disparity along varying directions. If the pixel of the boundary layer is touched by multiple extensions from different directions, then its ultimate color and depth are determined by the one with the largest depth. The largest depth is most likely to be the background.
Furthermore, for each pixel in the boundary region, we exploit its information in the temporal dimension. That is, we search forward and backward, starting from the closest (in time) frames in the same video, for uncovered background by checking whether an abrupt disparity reduction exceeds a given threshold. In particular, in one implementation, we search each direction (forward and backward) until we find an abrupt disparity reduction satisfying the threshold. The pixel corresponding to the abrupt disparity reduction is a background candidate pixel.
After the abrupt disparity reductions, we may find a smooth disparity change during a period of continuous frames. If so, then the corresponding pixels in these frames can also each be considered a background candidate pixel. A simple method is median-filtering them on the disparity component. Alternatively, for each direction, we can select the pixel whose color is consistent with the existing color determined by background extension as above. The color consistency measure is chosen to be an L2 distance in the RGB space, where the L2 distance is the square root of the sum of squared differences, i.e., sqrt(Δr^2 + Δg^2 + Δb^2). If different background candidates are found in the forward and backward directions, then the color consistency metric is also used to determine the ultimate background candidate selected. If no background candidate pixels are found, then the existing information obtained from the background extension will be preserved.
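The following sketch illustrates the temporal search just described, under several assumptions: per-frame disparity and color arrays, a hypothetical disparity-drop threshold, and color consistency measured by the L2 distance in RGB as above.

```python
import numpy as np

def temporal_background_candidate(disp_seq, color_seq, y, x, t,
                                  existing_color, drop=10):
    """Search backward and forward from frame t for an abrupt
    disparity reduction at (y, x); among the candidates found in the
    two directions, keep the one whose color is closest (L2 in RGB)
    to the color already set by background extension."""
    candidates = []
    for step in (-1, 1):  # backward, then forward
        s = t
        while 0 <= s + step < len(disp_seq):
            if disp_seq[s][y, x] - disp_seq[s + step][y, x] > drop:
                candidates.append(s + step)  # background appeared here
                break
            s += step
    if not candidates:
        return None  # preserve the background-extension value

    def l2(frame):
        diff = color_seq[frame][y, x].astype(float) - existing_color
        return float(np.sqrt(np.sum(diff ** 2)))

    best = min(candidates, key=l2)  # color-consistency tie-break
    return color_seq[best][y, x], disp_seq[best][y, x]
```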
In summary, our method to build the boundary layer works in a spatio-temporal way. Information in the temporal dimension is exploited, which is more reliable than simply extending the background in the spatial dimension. Note that Algorithms 1 and 2 can replace, for example, the original RGB and depth values using background extension. Various implementations may use temporal pictures, and/or the background layer described below, in generating the boundary layer.
3. Background Layer
Usually holes result from unknown information in the novel view. Multiple reference views provide spatial information in IBR, while temporal consistency offers additional information in video-based rendering.
We propose a temporal method that uses depth information, such as, for example, actual depth values or disparity values. Recall that disparity values are related to depth values. In particular, below we propose a disparity-based temporal method which forms the background for each frame as described herein below. The idea is similar to the temporal background extension in the boundary layer described herein above.
For each pixel location, we investigate its varying disparity over the whole sequence and detect abrupt disparity changes that are greater than a given threshold. Such a disparity change suggests, for example, that the foreground moves away and the background appears. This varying disparity as a function of frame, for a given pixel location, can be referred to as the temporal disparity curve.
Based on the preceding, we separate the temporal disparity curve into different segments, where disparity in each segment varies smoothly. Thus, we separate the curve at, for example, the abrupt disparity changes.
For example, a temporal disparity curve may look like a stepped function having a relatively constant value for a first segment of time, and a second (different) relatively constant value for a second segment of time. The jump from the first segment to the second segment indicates, for example, that at the pixel being investigated an object has moved away from the pixel's location and revealed the background behind that object. For example, a person may have moved, revealing a parked car behind. Note also that a third segment in time may be associated with the car moving away, revealing a building behind. Each segment is assumed to possess the same background, and that background is also assumed to possibly be the background for another segment. Note that if the disparity is smooth, then we assume that there is no substantial depth change and that the object does not move much. However, the object may be in the background or the foreground. Eventually, for each segment we search forward and backward to find its background from neighboring segments. Implementations may use an algorithm that utilizes the spatio-temporal (color) consistency to optimize the background model. Such an algorithm can be time-consuming and complicated. Accordingly, other implementations instead use simple median filtering to determine the disparity and the color for each segment.
One technique for determining the pixel value associated with the background for a given pixel is now presented. We examine the disparity curve at the current time and note the disparity value for the segment that includes the current time. Then we move forward and backward in time to find, for example, a neighboring segment that represents the background for that pixel (higher depth). Then we select a time that corresponds to the background segment and access the picture from the selected time (forward or backward in time, as needed). Then we copy the pixel values from the accessed picture to the background layer for the given pixel.
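A hedged sketch of this lookup for one pixel, assuming the full temporal disparity curve is available as an array and that a segment's disparity is summarized by its median; the jump threshold and the choice of a representative frame within a segment are illustrative.

```python
import numpy as np

def background_from_curve(disp_curve, color_seq, y, x, t, jump=10):
    """Split the temporal disparity curve of pixel (y, x) into smooth
    segments at abrupt changes, then look in the neighboring segments
    for one with smaller disparity (greater depth) and copy that
    frame's pixel as the background value."""
    d = np.asarray(disp_curve, dtype=float)
    cuts = np.nonzero(np.abs(np.diff(d)) > jump)[0] + 1
    bounds = [0, *cuts.tolist(), len(d)]
    seg = next(i for i in range(len(bounds) - 1)
               if bounds[i] <= t < bounds[i + 1])
    cur = np.median(d[bounds[seg]:bounds[seg + 1]])
    for nb in (seg - 1, seg + 1):  # backward, then forward
        if 0 <= nb < len(bounds) - 1:
            lo, hi = bounds[nb], bounds[nb + 1]
            if np.median(d[lo:hi]) < cur:  # smaller disparity = farther
                rep = (lo + hi) // 2       # a representative frame
                return color_seq[rep][y, x]
    return None  # no background segment found for this pixel
```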
Certain implementations form only partial disparity curves by analyzing selected temporal images from a given view. One such implementation analyzes every other temporal image to compute partial disparity curves. Other such implementations actually analyze different temporal images for different pixels. For example, for odd pixels an implementation analyzes odd temporal images (that is, images from the view under consideration for times t-1, t-3, ..., and t+1, t+3, ...), and for even pixels analyzes even temporal images.
These methods are not necessarily trying to maintain a background model, although such a model may be maintained in various implementations. Rather, the implementations above temporally discover the uncovered background information in neighboring frames based on depth change.
4. Compositing in Rendering
The compositing method combines the warped frames from different layers and different views. The emphasis of each reference view is defined by its angular distance as described herein below. The angular distance can be determined as follows.
For a pixel (u, v), we estimate its three dimensional location by Equation (3), i.e., p = (x, y, z)'. We know the optic focal center for reference view i as O_ri, i = 1, 2. Meanwhile, we are given the optic focal center for the synthesized view as O_s, where O_ri and O_s can be estimated from the camera parameters, i.e., O_i = -R_i' t_i, with R_i denoting the rotation matrix, t_i denoting the translation vector, and R_i' denoting the transpose of R_i. Then we calculate the angular distance of the three dimensional point p = (x, y, z)' for each reference view by cos(angle(O_ri-p-O_s))^(-q), q > 2, i = 1, 2.
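As an illustration, the following sketch computes O_i = -R_i' t_i and the angular distance cos(angle(O_ri-p-O_s))^(-q); the function names are hypothetical, the default exponent is one illustrative choice with q > 2, and no guard is included for degenerate geometry (e.g., a non-positive cosine).

```python
import numpy as np

def focal_center(r, t):
    """O_i = -R_i' t_i, with R_i the rotation and t_i the translation."""
    return -r.T @ t

def angular_distance(o_ri, o_s, p, q=3):
    """Cosine of the angle at p between the rays toward O_ri and O_s,
    raised to the power -q; a smaller angle yields a smaller value,
    so the closer reference view receives more emphasis."""
    v1, v2 = o_ri - p, o_s - p
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return cos_a ** (-q)
```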
The smaller the angular distance, the smaller the angle, and the closer the reference view is to the synthesized view, resulting in more emphasis on that reference view. In this implementation, we use a "winner-take-all" approach to composite different reference views. For each reference view, we have the background layer, the main layer, and its boundary layer. Pixel blending is realized by a Z-buffer, which is known. The pixel blending involves taking the pixel with the smallest depth because that pixel is most likely to be in the foreground.
FIG. 9 shows the compositing framework from two reference views. With respect to a reference view 1, we perform main layer rendering 910, background layer rendering 915, and boundary layer rendering 920. With respect to a reference view 2, we perform main layer rendering 950, background layer rendering 955, and boundary layer rendering 960. Blending 980 is then performed to, for example, obtain a blended image.
Herein below we discuss the procedure using different rendering methods such as, for example, splatting and mesh warping.
The splatting method renders the novel view pixel by pixel, varying the reconstruction kernel size (which can be considered to be the window function in splatting) depending on the disparity and normal vector orientation of the reference pixel. The splatting kernel size for the background/main layer differs from that for the boundary layer, because the latter will be warped to the dis-occluded area in the synthesized view. The hole size can be estimated based on the depth discrepancy, which determines the reconstruction kernel size in splatting.
The (triangular) mesh-based method converts each 2x2 section of the depth map into two triangles if the depth difference between either pair of diagonal vertices is less than the given threshold. When rendering the main layer, the depth discontinuities are handled by removing the corresponding triangles. The background layer is rendered to fill in the holes in the novel view. Finally, the boundary layer is rendered using the triangles that were removed from the main layer. In one implementation, to fill in the remaining holes after the three-layered rendering, we run the simplest approach, which examines all the pixels bordering the hole and copies the one that is the farthest away. The one that is farthest away has the biggest depth and is most likely to be in the background.
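A minimal sketch of this 2x2-section triangulation, assuming an HxW depth map in a numpy array and reading "either pair of diagonal vertices" as requiring both diagonal pairs to be below the threshold; it only collects triangle vertex indices and leaves rendering to the caller.

```python
import numpy as np

def triangulate_depth_map(depth, thresh):
    """Convert each 2x2 section of the depth map into two triangles,
    skipping sections where a diagonal pair of vertices differs in
    depth by the threshold or more (a depth discontinuity)."""
    tris = []
    h, w = depth.shape
    for y in range(h - 1):
        for x in range(w - 1):
            d00, d01 = depth[y, x], depth[y, x + 1]
            d10, d11 = depth[y + 1, x], depth[y + 1, x + 1]
            if abs(d00 - d11) < thresh and abs(d01 - d10) < thresh:
                # Split the section along one diagonal.
                tris.append(((y, x), (y, x + 1), (y + 1, x + 1)))
                tris.append(((y, x), (y + 1, x + 1), (y + 1, x)))
    return tris
```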
Note that a layer may have more than one value for a given pixel. In one implementation, for example, a background layer (915) retains two values for a given pixel. One value represents a foreground value, and a second value represents a background value. The analysis for a given pixel, in a given warped view, may not be able to accurately determine whether or not the given pixel (for example, located at the boundary of a hole) is in the foreground or the background. Accordingly, the analysis may retain the foreground value and the background value that is produced from, for example, the disparity-curve analysis. A second view (950, 955, 960) may provide additional information allowing the blending operation (980) to determine whether the given pixel is in the foreground or the background.
It should be clear that the implementation of FIG. 9 need not produce two final synthesized images (a first from reference view 1, and a second from reference view 2) prior to performing the blending operation (980). Although this is possible in some implementations, the implementation of FIG. 9 performs the blending operation (980) using six "images". The six images are the output from blocks 910, 915, 920, 950, 955, and 960. Note that these "images" need not be full images. For example, in one implementation the background layers (915, 955) need only include the information for the pixels that are part of the hole boundaries.
Other implementations may, for example, combine warped frames from only a single view, but from multiple layers. For example, an implementation may produce only a single warped main layer, a single background layer, and a single boundary layer. These three layers from the same view may be combined to form a composite image.
FIG. 10 is a non-limiting flow diagram of an implementation of a method 1000 for encoding and transmitting image data for a view obtained using depth-image-based rendering.
At step 1005, information from a reference image is accessed. The reference image is for a reference view at a particular time.
At step 1010, information from a second image is accessed. The second image is for a different time than the particular time.
At step 1015, an additional image is created based on the information from the reference image and on the information from the second image. The additional image is for an additional view that is different from the reference view and is for the particular time.
At step 1020, the reference image, the second image, and the additional image are encoded.
At step 1025, the encoded reference image, the encoded second image, and the encoded additional image are transmitted.
Note that many implementations need only perform operations 1005, 1010, and 1015 of the method 1000. That is, these implementations are directed toward creating the additional image. These implementations may be performed at an encoder or at a decoder, for example.
On the encoder side, there are several uses. For example, the additional image may be encoded and transmitted, and/or the additional image may be used as a reference for encoding another image. The additional image may be, for example, a synthesis of a view that is to be encoded, and the synthesized additional image may be used as a reference for encoding that view. The encoder may also signal to a decoder, using signaling information such as values for a syntax, which information was used to synthesize the additional image. The signaling information may indicate, for example, the view that was used to synthesize the additional image, the view location of the additional image, and any other information (for example, temporal information) that was used in the synthesis of the additional image. The decoder can then perform the synthesis of the additional view at the decoder and use that synthesized additional view to decode the encoded view.
On the encoder side, the additional image may be used as a reference for synthesizing yet another image. In such an implementation, the additional image is warped, and a background layer and a boundary layer are generated.
FIG. 11 is a non-limiting flow diagram of an implementation of step 1015 of method 1000 of FIG. 10.
At step 1105, the additional image is synthesized based on the reference image, and a value for a pixel in a dis-occluded portion (occurring in, e.g., a background portion) of the additional image is estimated using the information from the second image.
At step 1110, a pixel in the reference image that corresponds to the pixel in the dis-occluded portion of the additional image is identified. Such identification may involve, but is not limited to, for example, coherence and consistency of neighboring depth and color information.
At step 1115, depth information for the pixel in the reference image is compared with depth information for a corresponding pixel in the second image.
At step 1120, it is determined whether or not the pixel in the second image is a background/foreground pixel based on the comparing. At step 1125, a size of the dis-occluded portion is refined using depth information, by comparing depth information for a pixel in the dis-occluded portion with depth information for a neighboring pixel outside of the dis-occluded portion, and determining whether to include the neighboring pixel in the dis-occluded portion based on the comparing.
At step 1130, the value of the pixel in the dis-occluded portion is estimated based on a value of the corresponding background/foreground pixel in the second image.
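For illustration, a hedged sketch of the depth comparison in steps 1115-1130 for a single pixel whose correspondence (step 1110) is assumed given; the depth-jump threshold and the return convention are assumptions made for the example.

```python
def classify_and_estimate(ref_depth, second_depth, second_color,
                          y, x, depth_jump=10):
    """Compare depth at the corresponding location (y, x) in the
    reference image and in the second (different-time) image; a
    sufficiently larger depth in the second image suggests the
    foreground has moved away there, exposing a background pixel
    whose value can fill the dis-occluded pixel."""
    if second_depth[y, x] - ref_depth[y, x] > depth_jump:
        return second_color[y, x]  # background pixel value (step 1130)
    return None  # not classified as background; keep other estimates
```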
FIG. 12 is a non-limiting flow diagram of an implementation of a method 1200 for demodulating and decoding image data for a view obtained using depth-image-based rendering.
At step 1205, a signal is received and demodulated. The signal includes an encoded reference image and an encoded second image. The reference image is for a reference view at a particular time. The second image is for a different time than the particular time.
At step 1210, the encoded reference image and the encoded second image are decoded.
At step 1215, information from the decoded reference image is accessed.
At step 1220, information from the decoded second image is accessed. At step 1225, an additional image is created based on the information from the decoded reference image and on the information from the decoded second image. The additional image is for an additional view that is different from the reference view and is for the particular time.
At step 1230, at least the additional image is displayed on a presentation device.
Reference in the specification to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B" and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent to one of ordinary skill in this and related arts, for as many items as are listed.
We thus provide one or more implementations having particular features and aspects. Certain features and aspects relate to using temporal information in the synthesis of additional images, particularly using temporal information in producing a background layer that is used in the synthesis of an additional image. However, other features and aspects have been disclosed. Further, features and aspects of described implementations may also be adapted for other implementations. For example, additional layers may be used, and image-based rendering may be combined with model-based or geometry-based rendering. Although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, or a software program. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of equipment include video coders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cell phones, PDAs, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a computer readable medium having instructions for carrying out a process.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.
Claims
1. A method comprising: accessing (1005) information from a reference image, the reference image being for a reference view at a particular time; accessing (1010) information from a second image, the second image being for a different time than the particular time; and creating (1015) an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
2. The method of claim 1 wherein creating the additional image comprises: synthesizing (1105) the additional image based on the reference image, the additional image including a dis-occluded portion; and estimating (1130) a value for a pixel in the dis-occluded portion using the information from the second image.
3. The method of claim 2 wherein at least part of the dis-occluded portion represents a background portion of the additional image (1120).
4. The method of claim 2 wherein estimating the value comprises forming one or more of a background layer or a boundary layer.
5. The method of claim 2 further comprising refining (1125) a size of the dis-occluded portion using depth information.
6. The method of claim 5 wherein refining comprises: comparing (1125) depth information for a pixel in the dis-occluded portion with depth information for a neighboring pixel outside of the dis-occluded portion; and determining (1125) whether to include the neighboring pixel in the dis-occluded portion based on the comparing.
7. The method of claim 2 further comprising: accessing (1010) information from a third image, the third image being for the different time; and estimating (1130) a value for a second pixel in the dis-occluded portion using the information from the third image.
8. The method of claim 2 wherein the second image is from the reference view, and estimating the value of the pixel in the dis-occluded portion comprises: identifying (1110) a pixel in the reference image that corresponds to the pixel in the dis-occluded portion of the additional image; comparing (1115) depth information for the pixel in the reference image with depth information for a corresponding pixel in the second image; determining (1120) that the corresponding pixel in the second image is a background pixel based on the comparing; and estimating (1130) the value of the pixel in the dis-occluded portion based on a value of the corresponding background pixel in the second image.
9. The method of claim 8 wherein at least part of the dis-occluded portion of the additional image represents a background portion of the additional image.
10. The method of claim 8 wherein at least part of the dis-occluded portion of the additional image represents a boundary portion of the additional image.
11. The method of claim 2 wherein the second image is from the reference view, and estimating the value of the pixel in the dis-occluded portion comprises: identifying (1110) a pixel in the reference image that corresponds to the pixel in the dis-occluded portion of the additional image; comparing (1115) depth information for the pixel in the reference image with depth information for a corresponding pixel in the second image; determining (1120) that the corresponding pixel in the second image is a foreground pixel based on the comparing; and estimating (1130) the value of the pixel in the dis-occluded portion based on a value of the corresponding foreground pixel in the second image.
12. The method of claim 1 wherein the method is performed in at least one of an encoder, a decoder, a post-processor subsequent to the reference image being decoded, and a pre-processor prior to the reference image being encoded.
13. An apparatus comprising: means for accessing information from a reference image, the reference image being for a reference view at a particular time; means for accessing information from a second image, the second image being for a different time than the particular time; and means for creating an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
14. A processor-readable medium having stored thereon instructions for causing a processor to perform at least the following: accessing (1005) information from a reference image, the reference image being for a reference view at a particular time; accessing (1010) information from a second image, the second image being for a different time than the particular time; and creating (1015) an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
15. An apparatus comprising a processor configured to perform at least the following: accessing (1005) information from a reference image, the reference image being for a reference view at a particular time; accessing (1010) information from a second image, the second image being for a different time than the particular time; and creating (1015) an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
16. An apparatus comprising: a memory interface (306) for accessing information from a reference image, the reference image being for a reference view at a particular time, and accessing information from a second image, the second image being for a different time than the particular time; and a synthesizer (307) for creating an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
17. An apparatus comprising: a rendering unit (305) configured: to access information from a reference image, the reference image being for a reference view at a particular time, to access information from a second image, the second image being for a different time than the particular time, and to create an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time; an encoder (310) configured to encode the reference image, the second image, and the additional image; and a transmitter (315) configured to transmit the encoded reference image, the encoded second image, and the encoded additional image.
18. An apparatus comprising: means for accessing information from a reference image, the reference image being for a reference view at a particular time; means for accessing information from a second image, the second image being for a different time than the particular time; means for creating an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time; means for encoding the reference image, the second image, and the additional image; and means for transmitting the encoded reference image, the encoded second image, and the encoded additional image.
19. A method comprising: accessing (1005) information from a reference image, the reference image being for a reference view at a particular time; accessing (1010) information from a second image, the second image being for a different time than the particular time; creating (1015) an additional image based on the information from the reference image and on the information from the second image, the additional image being for an additional view that is different from the reference view and being for the particular time; encoding (1020) the reference image, the second image, and the additional image; and transmitting (1025) the encoded reference image, the encoded second image, and the encoded additional image.
20. An apparatus, comprising: a demodulator (405) configured to receive and demodulate a signal, the signal including an encoded reference image and an encoded second image, the reference image being for a reference view at a particular time, and the second image being for a different time than the particular time; a decoder (410) configured to decode the encoded reference image and the encoded second image; and a rendering unit (415) configured: to access information from the decoded reference image; to access information from the decoded second image; and to create an additional image based on the information from the decoded reference image and on the information from the decoded second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
21. The apparatus of claim 20, further comprising a presentation device (420) for displaying the additional image.
22. An apparatus, comprising: means for receiving and demodulating a signal, the signal including an encoded reference image and an encoded second image, the reference image being for a reference view at a particular time, and the second image being for a different time than the particular time; means for decoding the encoded reference image and the encoded second image; means for accessing information from the decoded reference image; means for accessing information from the decoded second image; and means for creating an additional image based on the information from the decoded reference image and on the information from the decoded second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
23. A method, comprising: receiving and demodulating (1205) a signal, the signal including an encoded reference image and an encoded second image, the reference image being for a reference view at a particular time, and the second image being for a different time than the particular time; decoding (1210) the encoded reference image and the encoded second image; accessing (1215) information from the decoded reference image; accessing (1220) information from the decoded second image; and creating (1225) an additional image based on the information from the decoded reference image and on the information from the decoded second image, the additional image being for an additional view that is different from the reference view and being for the particular time.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US1151908P | 2008-01-18 | 2008-01-18 | |
| US61/011,519 | 2008-01-18 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2009091563A1 (en) | 2009-07-23 |
Family
ID=40677748
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2009/000245 WO2009091563A1 (en), ceased | Depth-image-based rendering | 2008-01-18 | 2009-01-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2009091563A1 (en) |
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1612732A2 (en) * | 2004-06-28 | 2006-01-04 | Microsoft Corporation | Interactive viewpoint video system and process |
Non-Patent Citations (7)
| Title |
|---|
| CRIMINISI A ET AL: "Efficient Dense Stereo with Occlusions for New View-Synthesis by Four-State Dynamic Programming", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, BO, vol. 71, no. 1, 1 June 2006 (2006-06-01), pages 89 - 110, XP019410163, ISSN: 1573-1405 * |
| FEHN C: "Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 5291, 31 May 2004 (2004-05-31), pages 93 - 104, XP002444222, ISSN: 0277-786X * |
| SEBASTIAN KNORR ET AL: "Super-Resolution Stereo and Multi-View Synthesis from Monocular Video Sequences", 3-D DIGITAL IMAGING AND MODELING, 2007. 3DIM '07. SIXTH INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 1 August 2007 (2007-08-01), pages 55 - 64, XP031130980, ISBN: 978-0-7695-2939-4 * |
| SING BING KANG ET AL: "Handling occlusions in dense multi-view stereo", PROCEEDINGS 2001 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. CVPR 2001. KAUAI, HAWAII, DEC. 8 - 14, 2001; [PROCEEDINGS OF THE IEEE COMPUTER CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION], LOS ALAMITOS, CA, IEEE COMP. SOC, US, vol. 1, 8 December 2001 (2001-12-08), pages 103 - 110, XP010583734, ISBN: 978-0-7695-1272-3 * |
| VEDULA S.: "Multi-view Spatial and Temporal Interpolation for Dynamic Event Visualization", TECH. REPORT CMU-RI-TR-99-1, June 1999 (1999-06-01), Carnegie Mellon University, Pittsburgh, XP002531387, Retrieved from the Internet <URL:http://www.ri.cmu.edu/publication_view.html?pub_id=3107> [retrieved on 20090608] * |
| YU HUANG ET AL: "A layered method of visibility resolving in depth image-based rendering", 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, 2008: ICPR 2008; 8 - 11 DEC. 2008, TAMPA, FLORIDA, USA, IEEE, PISCATAWAY, NJ, 8 December 2008 (2008-12-08), pages 1 - 4, XP031412254, ISBN: 978-1-4244-2174-9 * |
| ZITNICK C L ET AL: "High-quality video view interpolation using a layered representation", ACM TRANSACTIONS ON GRAPHICS, ACM, US, vol. 23, no. 3, 8 August 2004 (2004-08-08), pages 600 - 608, XP002354522, ISSN: 0730-0301 * |
Cited By (47)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2011030298A1 (en) * | 2009-09-09 | 2011-03-17 | Nokia Corporation | Rendering multiview content in a 3d video system |
| US8284237B2 (en) | 2009-09-09 | 2012-10-09 | Nokia Corporation | Rendering multiview content in a 3D video system |
| CN101969564A (en) * | 2010-10-29 | 2011-02-09 | 清华大学 | Upsampling method for depth video compression of three-dimensional television |
| EP2458879A1 (en) * | 2010-11-26 | 2012-05-30 | Thomson Licensing | Occlusion layer extension |
| EP2458877A1 (en) * | 2010-11-26 | 2012-05-30 | Thomson Licensing | Occlusion layer extension |
| US20140198182A1 (en) * | 2011-09-29 | 2014-07-17 | Dolby Laboratories Licensing Corporation | Representation and Coding of Multi-View Images Using Tapestry Encoding |
| US9451232B2 (en) | 2011-09-29 | 2016-09-20 | Dolby Laboratories Licensing Corporation | Representation and coding of multi-view images using tapestry encoding |
| TWI493963B (en) * | 2011-11-01 | 2015-07-21 | Acer Inc | Image generating device and image adjusting method |
| WO2015164636A1 (en) * | 2014-04-25 | 2015-10-29 | Sony Computer Entertainment America Llc | Computer graphics with enhanced depth effect |
| US11619988B2 (en) | 2015-03-05 | 2023-04-04 | Magic Leap, Inc. | Systems and methods for augmented reality |
| US12386417B2 (en) | 2015-03-05 | 2025-08-12 | Magic Leap, Inc. | Systems and methods for augmented reality |
| US11429183B2 (en) | 2015-03-05 | 2022-08-30 | Magic Leap, Inc. | Systems and methods for augmented reality |
| US10838207B2 (en) | 2015-03-05 | 2020-11-17 | Magic Leap, Inc. | Systems and methods for augmented reality |
| US11256090B2 (en) | 2015-03-05 | 2022-02-22 | Magic Leap, Inc. | Systems and methods for augmented reality |
| US10909711B2 (en) | 2015-12-04 | 2021-02-02 | Magic Leap, Inc. | Relocalization systems and methods |
| US11288832B2 (en) | 2015-12-04 | 2022-03-29 | Magic Leap, Inc. | Relocalization systems and methods |
| US11536973B2 (en) | 2016-08-02 | 2022-12-27 | Magic Leap, Inc. | Fixed-distance virtual and augmented reality systems and methods |
| US11073699B2 (en) | 2016-08-02 | 2021-07-27 | Magic Leap, Inc. | Fixed-distance virtual and augmented reality systems and methods |
| WO2018029399A1 (en) * | 2016-08-11 | 2018-02-15 | Teknologian Tutkimuskeskus Vtt Oy | Apparatus, method, and computer program code for producing composite image |
| US10650488B2 (en) | 2016-08-11 | 2020-05-12 | Teknologian Tutkimuskeskus Vtt Oy | Apparatus, method, and computer program code for producing composite image |
| WO2018047033A1 (en) * | 2016-09-07 | 2018-03-15 | Nokia Technologies Oy | Method and apparatus for facilitating stereo vision through the use of multi-layer shifting |
| CN109983504A (en) * | 2016-09-07 | 2019-07-05 | 诺基亚技术有限公司 | Method and apparatus for promoting stereoscopic vision by using multiple layers of movement |
| WO2018063579A1 (en) * | 2016-09-29 | 2018-04-05 | Intel Corporation | Hybrid stereo rendering for depth extension in dynamic light field displays |
| US11483543B2 (en) | 2016-09-29 | 2022-10-25 | Intel Corporation | Hybrid stereo rendering for depth extension in dynamic light field displays |
| US10623723B2 (en) | 2016-09-29 | 2020-04-14 | Intel Corporation | Hybrid stereo rendering for depth extension in dynamic light field displays |
| US11711668B2 (en) | 2017-01-23 | 2023-07-25 | Magic Leap, Inc. | Localization determination for mixed reality systems |
| US11206507B2 (en) | 2017-01-23 | 2021-12-21 | Magic Leap, Inc. | Localization determination for mixed reality systems |
| US10812936B2 (en) | 2017-01-23 | 2020-10-20 | Magic Leap, Inc. | Localization determination for mixed reality systems |
| US10762598B2 (en) | 2017-03-17 | 2020-09-01 | Magic Leap, Inc. | Mixed reality system with color virtual content warping and method of generating virtual content using same |
| US10861237B2 (en) * | 2017-03-17 | 2020-12-08 | Magic Leap, Inc. | Mixed reality system with multi-source virtual content compositing and method of generating virtual content using same |
| US11315214B2 | 2017-03-17 | 2022-04-26 | Magic Leap, Inc. | Mixed reality system with color virtual content warping and method of generating virtual content using same |
| US10964119B2 (en) | 2017-03-17 | 2021-03-30 | Magic Leap, Inc. | Mixed reality system with multi-source virtual content compositing and method of generating virtual content using same |
| US11978175B2 (en) | 2017-03-17 | 2024-05-07 | Magic Leap, Inc. | Mixed reality system with color virtual content warping and method of generating virtual content using same |
| US11410269B2 (en) | 2017-03-17 | 2022-08-09 | Magic Leap, Inc. | Mixed reality system with virtual content warping and method of generating virtual content using same |
| US10861130B2 (en) | 2017-03-17 | 2020-12-08 | Magic Leap, Inc. | Mixed reality system with virtual content warping and method of generating virtual content using same |
| AU2018233733B2 (en) * | 2017-03-17 | 2021-11-11 | Magic Leap, Inc. | Mixed reality system with multi-source virtual content compositing and method of generating virtual content using same |
| EP3596702A4 (en) * | 2017-03-17 | 2020-07-22 | Magic Leap, Inc. | Mixed reality system with multi-source virtual content compositing and method of generating virtual content using same |
| US11423626B2 (en) | 2017-03-17 | 2022-08-23 | Magic Leap, Inc. | Mixed reality system with multi-source virtual content compositing and method of generating virtual content using same |
| US10939085B2 (en) | 2017-10-19 | 2021-03-02 | Intel Corporation | Three dimensional glasses free light field display using eye location |
| US11438566B2 (en) | 2017-10-19 | 2022-09-06 | Intel Corporation | Three dimensional glasses free light field display using eye location |
| US12028502B2 (en) | 2017-10-19 | 2024-07-02 | Intel Corporation | Three dimensional glasses free light field display using eye location |
| US11790482B2 (en) | 2018-07-23 | 2023-10-17 | Magic Leap, Inc. | Mixed reality system with virtual content warping and method of generating virtual content using same |
| US12190468B2 (en) | 2018-07-23 | 2025-01-07 | Magic Leap, Inc. | Mixed reality system with virtual content warping and method of generating virtual content using same |
| US11379948B2 (en) | 2018-07-23 | 2022-07-05 | Magic Leap, Inc. | Mixed reality system with virtual content warping and method of generating virtual content using same |
| EP3703003A1 (en) * | 2019-02-28 | 2020-09-02 | Dolby Laboratories Licensing Corp. | Hole filling for depth image based rendering |
| US11393113B2 (en) | 2019-02-28 | 2022-07-19 | Dolby Laboratories Licensing Corporation | Hole filling for depth image based rendering |
| US11670039B2 (en) | 2019-03-04 | 2023-06-06 | Dolby Laboratories Licensing Corporation | Temporal hole filling for depth image based video rendering |
Similar Documents
| Publication | Title |
|---|---|
| WO2009091563A1 (en) | Depth-image-based rendering |
| EP3669333B1 (en) | Sequential encoding and decoding of volumetric video |
| EP2150065B1 (en) | Method and system for video rendering, computer program product therefor |
| Zinger et al. | Free-viewpoint depth image based rendering |
| CN102598674B (en) | Depth map generation techniques for conversion of 2D video data to 3D video data |
| US8284237B2 (en) | Rendering multiview content in a 3D video system |
| US9525858B2 (en) | Depth or disparity map upscaling |
| JP7344988B2 (en) | Methods, apparatus, and computer program products for volumetric video encoding and decoding |
| US20130182184A1 (en) | Video background inpainting |
| US12356006B2 (en) | Method and apparatus for encoding volumetric video represented as a multiplane image |
| CN103828359A (en) | Representation and coding of multi-view images using tapestry encoding |
| WO2014037603A1 (en) | An apparatus, a method and a computer program for image processing |
| EP3939315A1 (en) | A method and apparatus for encoding and rendering a 3D scene with inpainting patches |
| EP2803041B1 (en) | Method for multi-view mesh texturing and corresponding device |
| EP4038884A1 (en) | A method and apparatus for encoding, transmitting and decoding volumetric video |
| Do et al. | Quality improving techniques for free-viewpoint DIBR |
| Mieloch et al. | Graph-based multiview depth estimation using segmentation |
| Muller et al. | Compressing time-varying visual content |
| Lai et al. | An efficient depth image-based rendering with depth reliability maps for view synthesis |
| Colleu et al. | A polygon soup representation for multiview coding |
| Sebai et al. | Piece-wise linear function estimation for platelet-based depth maps coding using edge detection |
| Maceira et al. | Region-based depth map coding using a 3D scene representation |
| Zhao et al. | Virtual view synthesis and artifact reduction techniques |
| WO2022219230A1 (en) | A method, an apparatus and a computer program product for video encoding and video decoding |
| EP3598749A1 (en) | A method and apparatus for generating an immersive image from images captured by a plurality of cameras |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09701747; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 09701747; Country of ref document: EP; Kind code of ref document: A1 |