
US20130321586A1 - Cloud based free viewpoint video streaming - Google Patents

Cloud based free viewpoint video streaming

Info

Publication number
US20130321586A1
Authority
US
United States
Prior art keywords
scene
fvv
computing device
viewpoint
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/588,917
Inventor
Adam Kirk
Patrick Sweeney
Don Gillett
Neil Fishman
Kanchan Mitra
Amit Mital
David Harnett
Yaron Eshet
Simon Winder
David Eraker
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US13/588,917
Assigned to MICROSOFT CORPORATION. Assignment of assignors interest (see document for details). Assignors: SWEENEY, PATRICK, ERAKER, DAVID, MITAL, AMIT, WINDER, SIMON, ESHET, Yaron, FISHMAN, NEIL, GILLETT, DON, HARNETT, DAVID, KIRK, ADAM, MITRA, KANCHAN
Publication of US20130321586A1
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignment of assignors interest (see document for details). Assignors: MICROSOFT CORPORATION
Current legal status: Abandoned

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/04 - Texture mapping
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/08 - Volume rendering
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/10 - Geometric effects
    • G06T15/20 - Perspective computation
    • G06T15/205 - Image-based rendering
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 - Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 - Processing image signals
    • H04N13/111 - Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N13/117 - Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 - Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194 - Transmission of image signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/204 - Image signal generators using stereoscopic image cameras
    • H04N13/239 - Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/204 - Image signal generators using stereoscopic image cameras
    • H04N13/243 - Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/204 - Image signal generators using stereoscopic image cameras
    • H04N13/246 - Calibration of cameras
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/257 - Colour aspects
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/141 - Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 - Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems
    • H04N7/157 - Conference systems defining a virtual conference space and using avatars or agents
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 - Indexing scheme for image generation or computer graphics
    • G06T2210/56 - Particle system, point based geometry or rendering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2227/00 - Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
    • H04R2227/005 - Audio distribution systems for home, i.e. multi-room use
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S2400/00 - Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 - Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • a given video generally includes one or more scenes, where each scene in the video can be either relatively static (e.g., the objects in the scene do not substantially change or move over time) or dynamic (e.g., the objects in the scene substantially change and/or move over time).
  • the viewpoint of each scene is chosen by the director when the video is recorded/captured and this viewpoint cannot be controlled or changed by an end user while they are viewing the video.
  • the viewpoint of each scene is fixed and cannot be modified when the video is being rendered and displayed.
  • in a free viewpoint video, an end user can interactively control and change their viewpoint of each scene at will while they are viewing the video.
  • each end user can interactively generate synthetic (i.e., virtual) viewpoints of each scene on-the-fly while the video is being rendered and displayed. This creates a feeling of immersion for any end user who is viewing a rendering of the captured scene, thus enhancing their viewing experience.
  • Cloud based free viewpoint video (FVV) streaming technique embodiments described herein generally involve generating a FVV that provides a consistent and manageable amount of data to a client despite the large amounts of data typically demanded to create and render the FVV.
  • this is accomplished by first capturing a scene using an arrangement of sensors.
  • This sensor arrangement includes a plurality of sensors that generate a plurality of streams of sensor data, where each stream represents the scene from a different geometric perspective.
  • scene proxies are generated from the calibrated streams of sensor data.
  • the scene proxies geometrically describe the scene as a function of time.
  • a current synthetic viewpoint of the scene is received from a client computing device via a data communication network.
  • This current synthetic viewpoint was selected by an end user of the client computing device. Once a current synthetic viewpoint is received, a sequence of frames is generated using the scene proxies. Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, and is transmitted to the client computing device via the data communication network for display to the end user of the client computing device.
  • a FVV produced as described above is played in one general embodiment as follows.
  • a request is received from an end user to display a FVV selection user interface screen that allows the end user to select a FVV available for playing.
  • This FVV selection user interface screen is displayed on a display device, and an end user FVV selection is input.
  • the end user FVV selection is then transmitted to a server via a data communication network.
  • the client computing device receives an instruction from the server via the data communication network to instantiate end user controls appropriate for the type of FVV selected.
  • an appropriate FVV control user interface is provided to the end user.
  • the client computing device then monitors end user inputs via the FVV control user interface, and whenever an end user viewpoint navigation input is received, it is transmitted to the server via the data communication network.
  • FVV frames are then received from the server.
  • Each FVV frame depicts at least a portion of the captured scene as it would be viewed from the last viewpoint the end user input, and is displayed on the aforementioned display device as it is received.
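The client-side playback steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the patent's implementation: `LoopbackServer`, its method names, the control-type string and the frame strings are all hypothetical stand-ins for the real server and the data communication network.

```python
class LoopbackServer:
    """Hypothetical in-memory stand-in for the cloud server (not from the source)."""

    def __init__(self):
        self.viewpoint = "default"

    def select_fvv(self, name):
        # The server answers a selection with the kind of controls to instantiate.
        return "viewpoint-navigation"

    def send_viewpoint(self, viewpoint):
        # Remember the last viewpoint the end user input.
        self.viewpoint = viewpoint

    def next_frame(self, index):
        # A real server would render a frame; here a string stands in for it.
        return f"frame {index} from {self.viewpoint}"


def play_fvv(server, selection, navigation_inputs, display):
    """Send the FVV selection, forward navigation inputs, display frames as received."""
    control_type = server.select_fvv(selection)
    for index, navigation in enumerate(navigation_inputs):
        if navigation is not None:
            # A viewpoint navigation input is transmitted as soon as it occurs.
            server.send_viewpoint(navigation)
        display(server.next_frame(index))
    return control_type


if __name__ == "__main__":
    play_fvv(LoopbackServer(), "concert", [None, "left", None], print)
```

Each entry in `navigation_inputs` models one monitoring step; `None` means the end user did not navigate during that step, so the next frame is still drawn from the previous viewpoint.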
  • FIG. 1 is a diagram illustrating an exemplary embodiment, in simplified form, of the various stages in a free viewpoint video (FVV) pipeline.
  • FIG. 2 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a FVV of a scene.
  • FIG. 3 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for capturing and inputting scene data.
  • FIG. 4 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by an arrangement of sensors that is being used to capture the scene.
  • FIG. 5 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for processing the calibrated streams of sensor data to generate scene proxies which geometrically describe the scene as a function of time.
  • FIG. 6 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using a point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene.
  • FIG. 7 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene.
  • FIG. 8 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene.
  • FIG. 9 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for processing the scene proxies to render a FVV.
  • FIG. 10 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for implementing the rendering of FIG. 9.
  • FIG. 11 is a diagram illustrating the various degrees of viewpoint navigation freedom that can be supported by the FVV pipeline.
  • FIG. 12 is a diagram illustrating an exemplary embodiment, in simplified form, of a continuum of the various image-based rendering methods which can be employed by the FVV pipeline.
  • FIGS. 13A-B are a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user.
  • FIG. 14 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user, where the end user can temporally control the playback of the FVV.
  • FIG. 15 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user, where the end user can temporally control the playback of the FVV to play it in reverse.
  • FIG. 16 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user, where the end user can temporally control the playback to pause and restart the FVV.
  • FIGS. 17A-B are a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for rendering a FVV that involves predicting the next new viewpoint to be requested by the end user.
  • FIG. 18 is a diagram illustrating a simplified example of a general-purpose computer system on which various embodiments and elements of the cloud based FVV streaming technique, as described herein, may be implemented.
  • In the following description of cloud based free viewpoint video (FVV) streaming technique embodiments (hereafter sometimes simply referred to as streaming technique embodiments), reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the streaming technique can be practiced. It is understood that other embodiments can be utilized and structural changes can be made without departing from the scope of the streaming technique.
  • the term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate a stream of sensor data that represents a given scene.
  • the streaming technique embodiments described herein employ a plurality of sensors which can be configured in various arrangements to capture a scene, thus allowing a plurality of streams of sensor data to be generated each of which represents the scene from a different geometric perspective.
  • Each of the sensors can be any type of video capture device (e.g., any type of video camera), or any type of audio capture device, or any combination thereof.
  • Each of the sensors can also be either static (i.e., the sensor has a fixed spatial location and a fixed rotational orientation which do not change over time), or moving (i.e., the spatial location and/or rotational orientation of the sensor change over time).
  • the streaming technique embodiments described herein can employ a combination of different types of sensors to capture a given scene.
  • the term “baseline” is used herein to refer to the ratio of the actual physical distance between a given pair of sensors to the average of the actual physical distances from each sensor in the pair to the viewpoint of the scene.
  • when this ratio is larger than a prescribed value, the pair of sensors is referred to herein as a “wide baseline stereo pair of sensors”.
  • when this ratio is smaller than the prescribed value, the pair of sensors is referred to herein as a “narrow baseline stereo pair of sensors”.
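The baseline definition above reduces to a short calculation. The sketch below is illustrative only: the function names, the use of 3D Cartesian coordinates and the threshold passed as `prescribed_value` are assumptions, since the source does not state a numeric prescribed value or say how a ratio exactly equal to it is treated.

```python
import math


def baseline_ratio(sensor_a, sensor_b, viewpoint):
    """Sensor-to-sensor distance divided by the mean sensor-to-viewpoint distance."""
    separation = math.dist(sensor_a, sensor_b)
    mean_range = (math.dist(sensor_a, viewpoint) + math.dist(sensor_b, viewpoint)) / 2.0
    if mean_range == 0.0:
        raise ValueError("both sensors coincide with the viewpoint")
    return separation / mean_range


def classify_stereo_pair(sensor_a, sensor_b, viewpoint, prescribed_value):
    """Label a sensor pair as wide or narrow baseline relative to a prescribed value."""
    ratio = baseline_ratio(sensor_a, sensor_b, viewpoint)
    if ratio > prescribed_value:
        return "wide baseline stereo pair"
    if ratio < prescribed_value:
        return "narrow baseline stereo pair"
    # Assumption: the source does not define the case of exact equality.
    return "unclassified"


if __name__ == "__main__":
    # Two sensors 6 units apart, each 5 units from the viewpoint.
    print(baseline_ratio((-3, 0, 0), (3, 0, 0), (0, 4, 0)))
    print(classify_stereo_pair((-3, 0, 0), (3, 0, 0), (0, 4, 0), prescribed_value=0.5))
```

Because the ratio is dimensionless, the same sensor pair can count as wide baseline for a nearby viewpoint and narrow baseline for a distant one.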
  • the term “server” is used herein to refer to one or more server computing devices operating in a cloud infrastructure so as to provide FVV services to a client computer over a data communication network.
  • a scene is simultaneously recorded from many different perspectives using sensors such as RGB cameras.
  • this data is processed to extract three dimensional (3D) geometric information in the form of scene proxies using, for example, 3D Reconstruction (3DR) algorithms.
  • the original data and geometric proxies are recombined during rendering, using Image Based Rendering (IBR) algorithms to generate synthetic viewpoints.
  • the amount of data may vary considerably from one FVV to another FVV due to differences in the number of sensors used to record the scene, the length of the FVV, the type of 3DR algorithms used to process the data, and the type of IBR algorithm used to generate synthetic views of the scene.
  • One way in which a FVV can be transferred from a server to a client over a data communication network is to combine the 3D geometry and other data for a specific viewpoint to produce a single image or video frame on the server, and then to transmit this frame from the server to the client. The frame is then displayed by the client in a normal manner.
  • This pre-computed frame transmission approach has the advantage of providing a consistent and manageable amount of data to a client despite the large amounts of data demanded to create and render a FVV, and the fact that the amount of data can be constantly changing.
  • the FVV data stays on a server (or servers) in the cloud, and even clients with limited processing power and/or limited available bandwidth can receive and display a FVV.
  • an advantage of a cloud based streaming FVV that represents what would be seen from a specific viewpoint is that FVVs can be commercially deployed to end users while using a level of bandwidth similar to that of a conventional streaming movie. This approach will be referred to herein as cloud based FVV streaming.
  • a new (typically user specified) viewpoint is sent from the client to the server, and a new stream of video data is initiated from the new viewpoint. Frames associated with that viewpoint are created, rendered and transmitted to the client until a new viewpoint request is received.
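The viewpoint-change behavior described above can be sketched as a generator that keeps rendering from the current viewpoint until a new request arrives. This is a minimal sketch under stated assumptions: the `Viewpoint` fields (three position and three orientation values, one common way to express six degrees of freedom), the per-tick request model and the `render` callback are hypothetical and not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Viewpoint:
    # Assumed representation: 3 translational plus 3 rotational values.
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float]


def stream_frames(
    initial: Viewpoint,
    requests: Iterable[Optional[Viewpoint]],
    render: Callable[[Viewpoint, int], object],
) -> Iterator[object]:
    """Yield one frame per tick, switching viewpoint whenever a request arrives.

    `requests` yields, per tick, either a newly requested viewpoint or None.
    """
    current = initial
    for tick, requested in enumerate(requests):
        if requested is not None:
            # A new viewpoint request starts a new stream from that viewpoint.
            current = requested
        yield render(current, tick)


if __name__ == "__main__":
    start = Viewpoint((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    moved = Viewpoint((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    for frame in stream_frames(start, [None, moved, None], lambda vp, t: (t, vp.position)):
        print(frame)
```

Only the rendered frames leave the server, so the client's download size per tick does not depend on how many sensors captured the scene.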
  • Cloud based FVV streaming technique embodiments described herein generally employ a cloud based FVV pipeline to create, render and transmit FVV frames depicting a captured scene as would be viewed from a current synthetic viewpoint received from a client.
  • An exemplary FVV pipeline will now be described. It is noted, however, that cloud based FVV streaming technique embodiments described herein are not limited to only the exemplary FVV pipeline to be described. Rather, other FVV pipelines can also be employed to create and render video frames in response to a viewpoint request, as desired.
  • the exemplary FVV pipeline described here involves generating an FVV of a given scene and presenting the FVV to one or more end users.
  • the exemplary FVV pipeline enables optimal viewpoint navigation for up to six degrees of viewpoint navigation freedom.
  • this exemplary FVV pipeline does not rely upon having to constrain the pipeline in order to produce a desired visual result.
  • the pipeline eliminates the need to place constraints in order to generate various synthetic viewpoints of the scene which are photo-realistic and thus are free of discernible artifacts.
  • the pipeline eliminates having to constrain the arrangement of the sensors that are used to capture the scene. Accordingly, the pipeline is operational with any arrangement of sensors.
  • the pipeline eliminates having to constrain the number or types of sensors that are used to capture the scene. Accordingly, the pipeline is operational with any number of sensors and all types of sensors.
  • the pipeline also eliminates having to constrain the number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene. Accordingly, the pipeline can produce visual results having as many as six degrees of viewpoint navigation freedom.
  • the pipeline eliminates having to constrain the complexity or composition of the scene that is being captured (e.g., neither the environment(s) in the scene, nor the types of objects in the scene, nor the number of people in the scene, among other things, has to be constrained). Accordingly, the pipeline is operational with any type of scene, including both relatively static and dynamic scenes.
  • the pipeline does not rely upon having to use a specific 3D reconstruction method to generate a 3D reconstruction of the captured scene. Accordingly, the pipeline supports the use of any one or more 3D reconstruction methods and therefore provides the freedom to use whatever 3D reconstruction method(s) produces the desired visual result (e.g., the highest degree of photo-realism for the particular scene being captured and the desired number of degrees of viewpoint navigation freedom) based on the particular characteristics of the streams of sensor data that are generated by the sensors (e.g., based on factors such as the particular number and types of sensors that are used to capture the scene, and the particular arrangement of these sensors that is used), along with other current pipeline conditions.
  • the exemplary pipeline also does not rely upon having to use a specific image-based rendering method during the rendering of a frame of the captured scene. Accordingly, the pipeline supports the use of any image-based rendering method and therefore provides the freedom to use whatever image-based rendering method(s) produces the desired visual result based on the particular characteristics of the streams of sensor data that are generated by the sensors, along with other current pipeline conditions.
  • an image-based rendering method that renders a lower fidelity 3D geometric proxy of the captured scene may produce an optimally photo-realistic visual result when the end user's viewpoint is close to the axis of one of the video capture devices (such as with billboards).
  • a conventional image warping/morphing image-based rendering method may produce an optimally photo-realistic visual result.
  • a conventional view interpolation image-based rendering method may produce an optimally photo-realistic visual result.
  • a conventional lumigraph or light field image-based rendering method may produce an optimally photo-realistic visual result.
  • the exemplary pipeline results in a flexible, robust and commercially viable next generation FVV processing pipeline that meets the needs of today's various creative video producers and editors.
  • the pipeline is applicable to various types of video-based media applications such as consumer entertainment (e.g., movies, television shows, and the like) and videoconference/telepresence, among others.
  • the pipeline can support a broad range of features that provide for the capture, processing, storage, rendering and distribution of any type of FVV that can be generated.
  • Various implementations of the pipeline are possible, where each different implementation supports a different type of FVV. Exemplary types of supported FVV are described in more detail hereafter.
  • the pipeline allows any one or more parameters to be freely modified without introducing artifacts into the FVV. This allows the photo-realism of the FVV that is presented to each end user to be maximized (i.e., the artifacts are minimized) regardless of the characteristics of the various sensors that are used to capture the scene, and the characteristics of the various streams of sensor data that are generated by the sensors.
  • Exemplary pipeline parameters which can be modified include, but are not limited to, the following.
  • the number and types of sensors that are used to capture the scene can be modified.
  • the arrangement of the sensors can also be modified. Which if any of the sensors is static and which is moving can also be modified.
  • the complexity and composition of the scene can also be modified. Whether the scene is relatively static or dynamic can also be modified.
  • the 3D reconstruction methods and image-based rendering methods that are used can also be modified.
  • the number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene can also be modified.
  • FIG. 1 illustrates an exemplary embodiment, in simplified form, of the various stages in a FVV processing pipeline.
  • the FVV processing pipeline 100 starts with a capture stage 102 during which, and generally speaking, the following actions take place.
  • An arrangement of sensors is used to capture a given scene, where the arrangement includes a plurality of sensors and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. These streams of sensor data are input from the sensors and calibrated in a manner which will be described in more detail hereafter.
  • the calibrated streams of sensor data are then output to a processing stage 104 .
  • the exemplary pipeline supports the use of various types, various numbers and various combinations of sensors which can be configured in various arrangements, including both 2D and 3D arrangements, where each of the sensors can be either static or moving.
  • the processing stage 104 of the FVV processing pipeline 100 inputs the calibrated streams of sensor data and processes them to generate scene proxies which geometrically describe the captured scene as a function of time.
  • the scene proxies are then output to a storage stage 106 where they are stored in a memory via conventional means.
  • the scene proxies are generated using one or more different 3D reconstruction methods which extract 3D geometric information from the calibrated streams of sensor data.
  • the particular 3D reconstruction methods that are used and the particular manner in which each scene proxy is generated are determined based on a periodic analysis of a set of current conditions.
  • a rendering stage 108 of the FVV processing pipeline 100 inputs scene proxies from the storage stage 106 and processes them to generate frames of the captured scene as would be viewed from a current synthetic viewpoint.
  • the rendering is designed to maximize the photo-realism of the frame produced.
  • the aforementioned frames are generated using one or more different image-based rendering methods.
  • the particular image-based rendering methods that are used and the particular manner in which each frame is generated are determined based on a periodic analysis of the set of current conditions.
  • the frames of the captured scene are then output to a user viewing experience stage 110 .
  • the user viewing experience stage 110 of the FVV processing pipeline 100 generally provides one or more end users with the ability to view the frames of the captured scene, as the scene would be viewed from a current synthetic viewpoint, on a display device and to navigate/control this viewpoint on-the-fly at will.
  • the user viewing experience stage 110 provides each end user with the ability to continuously and interactively navigate/control their viewpoint of the scene that is being displayed on their display device.
  • the user viewing experience stage 110 may also provide each end user with the ability to interact temporally with the FVV at will, as will be explained in more detail later.
  • each end user can interactively navigate their viewpoint of the scene via a client computing device 112 associated with that user.
  • this new viewpoint is provided to the user viewing experience stage 110 by the user's client computing device 112 via a data communication network 114 that the end user's client computing device is connected to (such as the Internet or a proprietary intranet).
  • This transfer of the new viewpoint is done in a conventional manner consistent with the network being employed.
  • the user viewing experience stage 110 receives and forwards the new viewpoint to the rendering stage 108 , which will modify the current synthetic viewpoint of the scene accordingly and produce frames of the captured scene as would be viewed from the new synthetic viewpoint. These frames are then provided in turn to the user viewing experience stage 110 .
  • the user viewing experience stage 110 receives each frame and provides it to the user's client computing device 112 via the aforementioned network 114. This transfer of the frames is also done in a conventional manner consistent with the network being employed.
  • each end user can also interact temporally to control the playback of the FVV, and based on this temporal control the rendering stage 108 will provide FVV frames starting with the frame that corresponds to the last user-specified temporal location in the FVV.
  • the foregoing FVV processing pipeline can be employed by the cloud based FVV streaming technique embodiments described herein to generate a free viewpoint video (FVV) of a scene.
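  • a computer (such as any of the computing devices described in the Exemplary Operating Environments to follow) is used to perform this process, which begins with capturing the scene using an arrangement of sensors.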
  • This sensor arrangement includes a plurality of sensors that generate a plurality of streams of sensor data, where each stream represents the scene from a different geometric perspective.
  • These streams of sensor data are input and calibrated (block 202 ), and then scene proxies are generated from the calibrated streams of sensor data (block 204 ).
  • the scene proxies geometrically describe the scene as a function of time.
  • a current synthetic viewpoint of the scene is received from a client computing device via a data communication network (block 206 ). It is noted that this current synthetic viewpoint was selected by an end user of the client computing device.
  • a sequence of frames is generated using the scene proxies (block 208 ). Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, and in block 210 is transmitted to the client computing device via the data communication network for display to the end user of the client computing device.
  • the scene proxies are stored as they are generated. This feature can allow complete FVVs to be recorded for playback at a future time.
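The process blocks described above can be sketched as one function that chains the stages. This is an illustrative outline only: every callable passed in (`calibrate`, `build_proxies`, `receive_viewpoint`, `render`, `transmit`) is a hypothetical placeholder, and modeling the scene proxies as one item per time step is an assumption made for the example.

```python
def generate_fvv(sensor_streams, calibrate, build_proxies,
                 receive_viewpoint, render, transmit, proxy_store=None):
    """Run calibration, proxy generation, viewpoint intake, rendering, transmission."""
    # Block 202: input and calibrate each stream of sensor data.
    calibrated = [calibrate(stream) for stream in sensor_streams]

    # Block 204: generate scene proxies (assumed here: one proxy per time step).
    proxies = build_proxies(calibrated)
    if proxy_store is not None:
        # Optionally keep proxies as they are generated, for later playback.
        proxy_store.extend(proxies)

    # Block 206: receive the current synthetic viewpoint chosen by the end user.
    viewpoint = receive_viewpoint()

    frames_sent = 0
    for proxy in proxies:
        # Block 208: generate a frame of the scene as seen from that viewpoint.
        frame = render(proxy, viewpoint)
        # Block 210: transmit the frame to the client computing device.
        transmit(frame)
        frames_sent += 1
    return frames_sent


if __name__ == "__main__":
    outbox, store = [], []
    count = generate_fvv(
        sensor_streams=[[1, 2], [3, 4]],
        calibrate=lambda stream: [value * 10 for value in stream],
        build_proxies=lambda streams: [sum(values) for values in zip(*streams)],
        receive_viewpoint=lambda: "front",
        render=lambda proxy, viewpoint: f"{viewpoint}:{proxy}",
        transmit=outbox.append,
        proxy_store=store,
    )
    print(count, outbox, store)
```

Passing the stages in as callables mirrors the source's point that no particular 3D reconstruction or image-based rendering method is required.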
  • each different implementation supports a different type of FVV and a different user viewing experience.
  • each of these different implementations differs in terms of the user viewing experience it provides, its latency characteristics (i.e., how rapidly the streams of sensor data have to be processed through the FVV processing pipeline), its storage characteristics, and the types of computing device hardware it necessitates.
  • one implementation of the pipeline supports non-live FVVs, which corresponds to a situation where the streams of sensor data that are generated by the sensors are captured and processed for future playback.
  • This allows an FVV producer to optionally manually “touch-up” the streams of sensor data that are input during the capture stage 102 , and also optionally manually remove any 3D reconstruction artifacts that are introduced in the processing stage 104 .
  • This particular implementation is referred to hereafter as the “recorded FVV implementation”.
  • Exemplary types of video-based media that work well in the recorded FVV implementation include movies, documentaries, sitcoms and other types of television shows, music videos, digital memories, and the like.
  • Another exemplary type of video-based media that works well in the recorded FVV implementation is the use of special effects technology where synthetic objects are realistically modeled, lit, shaded and added to a pre-captured scene.
  • another implementation of the pipeline supports unidirectional (i.e., one-way) live FVV, which corresponds to a situation where the streams of sensor data that are being generated by the sensors are concurrently captured 102 and processed 104 ; and the resulting 3D scene proxy of the captured scene is stored and rendered into frames, and then transmitted in a one-to-many manner on-the-fly (i.e., live) to one or more end users.
  • each end user can view the scene live (i.e., each user can view the scene at substantially the same time it is being captured 102 ).
  • This particular implementation is referred to hereafter as the “unidirectional live FVV implementation”.
  • Exemplary types of video-based media that work well in the unidirectional live FVV implementation include sporting events, news programs, live concerts, and the like.
  • yet another implementation of the pipeline supports bidirectional (i.e., two-way) live FVV such as that which is associated with various videoconferencing/telepresence applications.
  • This particular implementation is referred to hereafter as the “bidirectional live FVV implementation”.
  • This bidirectional live FVV implementation is generally the same as the unidirectional live FVV implementation with the following exception.
  • a computing device at each physical location that is participating in a given videoconferencing/telepresence session is able to concurrently capture streams of sensor data that are being generated by sensors which are capturing a local scene, process these locally captured streams of sensor data into scene proxies, store the proxies, render the proxies into frames, and transmit the resulting FVV frames of the local scene in a one-to-many manner on the fly to the other physical locations that are participating in the session.
  • this period of time is one thirtieth of a second per frame.
  • the exemplary pipeline generally employs a plurality of sensors which are configured in a prescribed arrangement to capture a given scene.
  • the pipeline is operable with any type of sensor, any number (two or greater) of sensors, any arrangement of sensors (where this arrangement can include a plurality of different geometries and different geometric relationships between the sensors), and any combination of different types of sensors.
  • the pipeline is also operable with both static and moving sensors.
  • a given sensor can be any type of video capture device (examples of which are described in more detail hereafter), or any type of audio capture device (such as a microphone, or the like), or any combination thereof.
  • Each video capture device generates a stream of video data which includes a stream of images (also known as and referred to herein as “frames”) of the scene from the specific geometric perspective of the video capture device.
  • each audio capture device generates a stream of audio data representing the audio emanating from the scene from the specific geometric perspective of the audio capture device.
  • a given video capture device can be a conventional visible light video camera which generates a stream of video data that includes a stream of color images of the scene.
  • a given video capture device can also be a conventional light-field camera (also known as a “plenoptic camera”) which generates a stream of video data that includes a stream of color light field images of the scene.
  • a given video capture device can also be a conventional infrared structured-light projector combined with a conventional infrared video camera that is matched to the projector, where this projector/camera combination generates a stream of video data that includes a stream of infrared images of the scene.
  • a given video capture device can also be a conventional monochromatic camera which generates a stream of video data that includes a stream of monochrome images of the scene.
  • a given video capture device can also be a conventional time-of-flight camera which generates a stream of video data that includes both a stream of depth map images of the scene and a stream of color images of the scene.
  • the term “color camera” is sometimes used herein to refer to any type of video capture device that generates color images of the scene.
  • the exemplary pipeline generally employs a minimum of one sensor which generates color image data for the scene, along with one or more other sensors that can be used in combination to generate 3D geometry data for the scene.
  • In situations where an outdoor scene is being captured or the sensors are located far from the scene, it is advantageous to capture the scene using both a wide baseline stereo pair of color cameras and a narrow baseline stereo pair of color cameras.
  • a narrow baseline stereo pair of sensors both of which generate video data that includes a stream of infrared images of the scene in order to eliminate the dependency on scene lighting variables.
  • the use of additional sensors serves to reduce the number of occluded areas within the scene. It may also be advantageous to capture the entire scene using a given arrangement of static sensors, and at the same time also capture a specific higher complexity region of the scene using one or more additional moving sensors. In a situation where a large number of sensors is used to capture a complex scene, different combinations of the sensors can be used during the processing stage of the FVV processing pipeline (e.g., a situation where a specific sensor is part of both a narrow baseline stereo pair and a different wide baseline stereo pair involving a third sensor).
  • FIG. 3 illustrates an exemplary embodiment, in simplified form, of a process for capturing and inputting scene data.
  • the process starts in block 300 with using an arrangement of sensors to capture the scene, where the arrangement includes a plurality of sensors and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective.
  • the streams of sensor data are then input (block 302 ).
  • a given stream of sensor data will include video data whenever the sensor that generated the stream is a video capture device.
  • a given stream of sensor data will include audio data whenever the sensor that generated the stream is an audio capture device.
  • a given stream of sensor data will include both video and audio data whenever the sensor that generated the stream is a combined video and audio capture device.
  • FIG. 4 illustrates an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by the arrangement of sensors.
  • the process starts in block 400 with determining the number of sensors in the arrangement of sensors that is being used to capture the scene. Intrinsic characteristics of each of the sensors are then determined (block 402 ).
  • Exemplary intrinsic characteristics which can be determined for a given sensor include the sensor type, the sensor's frame rate, the sensor's shutter speed, the sensor's mosaic pattern, the sensor's white balance, the bit depth and pixel resolution of the images that are generated by the sensor, and, where the sensor is a video capture device (VCD), the focal length of the VCD's lens, the principal point of the VCD's lens, the VCD's skew coefficient, the distortions of the VCD's lens, and the VCD's field of view, among others. It will be appreciated that knowing such intrinsic characteristics for each of the sensors allows the FVV processing pipeline to understand the governing physics and optics of each of the sensors. Extrinsic characteristics of each of the sensors at each point in time during the capture of the scene are also determined (block 404 ).
  • Exemplary extrinsic characteristics which can be determined for a given sensor include the sensor's current rotational orientation (i.e., the direction that the sensor is currently pointing), the sensor's current spatial location (i.e., the sensor's current location within the arrangement), whether the sensor is static or moving, the current geometric relationship between the sensor and each of the other sensors in the arrangement (i.e., the sensor's current position relative to each of the other sensors), the position of the sensor relative to the scene, and whether or not the sensor is genlocked (i.e., temporally synchronized) with the other sensors in the arrangement, among others.
  • the determination of the intrinsic and extrinsic characteristics of each of the sensors can be made using various conventional methods, examples of which will be described in more detail hereafter.
  • the knowledge of the number of sensors in the arrangement, and the intrinsic and extrinsic characteristics of each of the sensors is then used to temporally and spatially calibrate the streams of sensor data (block 406 ).
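Several of the intrinsic characteristics listed above (focal length, principal point, skew coefficient) are conventionally collected into a 3x3 intrinsic matrix. The sketch below assumes an ideal pinhole camera with no lens distortion, which is a simplification not stated in the source; the function names and numeric values are illustrative only.

```python
def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """Build a pinhole intrinsic matrix K.

    fx, fy -- focal length in pixels along each image axis
    cx, cy -- principal point in pixels
    skew   -- skew coefficient (0 for most modern sensors)
    """
    return [[fx, skew, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def project(K, point_cam):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    u = (K[0][0] * X + K[0][1] * Y + K[0][2] * Z) / Z
    v = (K[1][1] * Y + K[1][2] * Z) / Z
    return (u, v)


if __name__ == "__main__":
    K = intrinsic_matrix(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
    print(project(K, (0.2, -0.1, 2.0)))
```

The extrinsic characteristics (rotational orientation and spatial location) would supply the rotation and translation that bring a scene point into camera coordinates before this projection is applied.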
  • the intrinsic and extrinsic characteristics of each of the sensors in the arrangement are commonly determined by performing one or more calibration procedures which calibrate the sensors, where these procedures are specific to the particular types of sensors that are being used to capture the scene, and the particular number and arrangement of the sensors.
  • the calibration procedures are performed and the streams of sensor data which are generated thereby are input before the scene capture.
  • the calibration procedures can be performed and the streams of sensor data which are generated thereby can be input either before or after the scene capture. Exemplary calibration procedures will now be described.
  • the sensors that are being used to capture the scene are genlocked and include a combination of color cameras, sensors which generate a stream of infrared images of the scene, and one or more time-of-flight cameras, and this combination of cameras is arranged in a static array
  • the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner.
  • a stream of calibration data can be input from each of the cameras in the array while a common physical feature (such as a ball, or the like) is internally illuminated with an incandescent light (which is visible to all of the cameras) and moved throughout the scene.
  • These streams of calibration data can then be analyzed using conventional methods to determine both an intrinsic and extrinsic calibration matrix for each of the cameras.
  • the sensors that are being used to capture the scene include a plurality of color cameras which are arranged in a static array
  • the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner.
  • a stream of calibration data can be input from each camera in the array while it is moved around the scene but in close proximity to its static location (thus allowing each camera in the array to view overlapping parts of the static background of the scene).
  • the streams of sensor data can be analyzed using conventional methods to identify features in the scene, and these features can then be used to calibrate the cameras in the array and determine the intrinsic and extrinsic characteristics of each of the cameras by employing a conventional method (e.g., extrinsic characteristics can be determined using a structure-from-motion method).
  • each of these moving sensors can be calibrated and its intrinsic and extrinsic characteristics can be determined at each point in time during the scene capture by using a conventional background model to register and calibrate relevant individual images that were generated by the sensor.
  • the sensors that are being used to capture the scene include a combination of static and moving sensors, the sensors can be calibrated and the intrinsic and extrinsic characteristics of each of the sensors can be determined by employing conventional multistep calibration procedures.
  • the exemplary pipeline will both spatially and temporally calibrate the streams of sensor data generated by the sensors at all points in time during the scene capture before the streams are processed in the processing stage.
  • this spatial and temporal calibration can be performed as follows. After the scene is captured and the streams of sensor data representing the scene are input, the streams of sensor data can be analyzed using conventional methods to separate the static and moving elements of the scene. The static elements of the scene can then be used to generate a background model.
  • the moving elements of the scene can be used to generate a global timeline that encompasses all of the sensors, and each image in each stream of sensor data is assigned a relative time.
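Assigning each image a relative time on a global timeline can be sketched as follows. The source does not say how the per-sensor offsets are recovered from the moving elements of the scene, so this example simply assumes a start offset (in seconds) has already been estimated for each sensor; `assign_relative_times` and `nearest_frame` are hypothetical helper names.

```python
def assign_relative_times(frame_counts, frame_rates, start_offsets):
    """Map every image of every sensor onto one global timeline.

    frame_counts  -- dict: sensor id -> number of images in its stream
    frame_rates   -- dict: sensor id -> frames per second
    start_offsets -- dict: sensor id -> assumed start time on the global
                     timeline, in seconds (estimation method not shown)
    """
    timeline = {}
    for sensor_id, count in frame_counts.items():
        period = 1.0 / frame_rates[sensor_id]
        start = start_offsets[sensor_id]
        timeline[sensor_id] = [start + i * period for i in range(count)]
    return timeline


def nearest_frame(times, t):
    """Index of the image whose relative time is closest to global time t."""
    return min(range(len(times)), key=lambda i: abs(times[i] - t))


if __name__ == "__main__":
    timeline = assign_relative_times(
        {"cam_a": 3, "cam_b": 2},
        {"cam_a": 4.0, "cam_b": 2.0},
        {"cam_a": 0.0, "cam_b": 0.25},
    )
    print(timeline)
    print(nearest_frame(timeline["cam_b"], 0.7))
```

With such a timeline, images from unsynchronized sensors can be paired by looking up the nearest frame in each stream for a given global time.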
  • the intrinsic characteristics of each of the sensors can be determined by using conventional methods to analyze each of the streams of sensor data.
  • the intrinsic characteristics of each of the sensors can also be determined by reading appropriate hardware parameters directly from each of the sensors.
  • the number of sensors and various intrinsic properties of each of the sensors can be determined by analyzing the streams of sensor data using conventional methods.
  • FIG. 5 illustrates an exemplary embodiment, in simplified form, of a process for processing the calibrated streams of sensor data to generate the scene proxies.
  • the process starts in block 500 with monitoring and periodically analyzing a set of current pipeline conditions.
  • the set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the particular number of sensors that is being (or was) used to capture the scene, or the particular arrangement of these sensors that is being (or was) used, or one or more particular intrinsic characteristics of each of the sensors (e.g., the sensor type, among others), or one or more particular extrinsic characteristics of each of the sensors (e.g., the positioning of the sensor relative to the scene, and whether the sensor is static or moving, among others), or the like.
  • the set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as whether the scene proxies are being generated on-the-fly, or being generated and stored for future playback (i.e., the particular type of FVV that is being processed in the pipeline and the speed at which the streams of sensor data have to be processed through the pipeline), or the like.
  • the set of current pipeline conditions can also include one or more conditions in the storage stage of the FVV processing pipeline such as the amount of storage space that is currently available to store the scene proxy.
  • the set of current pipeline conditions can also include one or more conditions in the rendering stage of the pipeline such as the current viewpoint navigation information and temporal navigation information. In addition to including this viewpoint and temporal navigation information, the second set of current conditions is also generally associated with the specific implementation of the pipeline technique embodiments that is being used.
  • the set of current pipeline conditions can also further include one or more conditions in the user viewing experience stage of the pipeline such as the particular type of display device the rendered frames are being displayed on, and the particular characteristics of the display device (e.g., its aspect ratio, its pixel resolution, and its form factor, among others).
  • the results of this analysis are then used to select one or more different 3D reconstruction methods which are matched to the current pipeline conditions (block 502 ).
  • the selected 3D reconstruction methods are then used to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data (block 504 ).
  • the 3D reconstructions of the scene and the results of the periodic analysis are then used to generate the scene proxy (block 506 ).
  • the actions of blocks 500 , 502 , 504 and 506 are repeated for the duration of the scene (block 508 , No).
  • the 3D reconstruction methods which are used, the types of 3D reconstructions of the scene which are generated, and thus the types of geometric proxy data in a scene proxy can change over time based upon changes in the current pipeline conditions.
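The selection step of block 502 can be illustrated with a simple rule table. The rules and method names below are assumptions made for this sketch, loosely based on the stream types and FVV types discussed in this description; the source does not prescribe a specific mapping from conditions to methods.

```python
def select_reconstruction_methods(conditions):
    """Pick reconstruction steps matched to the current pipeline conditions.

    conditions -- hypothetical dict with keys:
        "stream_types": set of "depth", "infrared", "color"
        "fvv_type": "recorded", "unidirectional_live" or "bidirectional_live"
    Returns an ordered list of (illustrative) method names.
    """
    methods = []
    streams = conditions.get("stream_types", set())
    if "depth" in streams:
        methods.append("merge_depth_maps_to_point_cloud")
    if "infrared" in streams or "color" in streams:
        methods.append("stereo_depth_then_point_cloud")
    # Assumed rule: only spend time on mesh models when latency is not critical.
    if conditions.get("fvv_type") == "recorded" and methods:
        methods.append("mesh_from_point_cloud")
    return methods


if __name__ == "__main__":
    live = {"stream_types": {"infrared"}, "fvv_type": "unidirectional_live"}
    recorded = {"stream_types": {"depth", "color"}, "fvv_type": "recorded"}
    print(select_reconstruction_methods(live))
    print(select_reconstruction_methods(recorded))
```

Calling this function once per analysis period lets the selected methods change over time as the conditions change.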
  • the current pipeline conditions can be analyzed using different periodicities.
  • the current pipeline conditions can be analyzed on a frame-by-frame basis (i.e., for each image in the streams of sensor data).
  • the current pipeline conditions can be analyzed using a periodicity of a prescribed number of sequential frames, where this number is greater than one.
  • the current pipeline conditions can be analyzed using a periodicity of a prescribed period of time.
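The three periodicities just listed (every frame, every N frames, every fixed period of time) can be captured by one small scheduler. `AnalysisSchedule` is a hypothetical class written for this illustration; a frame-by-frame analysis is simply the case `every_n_frames=1`.

```python
class AnalysisSchedule:
    """Decide when the pipeline conditions should be re-analyzed."""

    def __init__(self, every_n_frames=None, every_seconds=None):
        # Exactly one of the two periodicities must be chosen.
        if (every_n_frames is None) == (every_seconds is None):
            raise ValueError("specify exactly one of every_n_frames or every_seconds")
        self.every_n_frames = every_n_frames
        self.every_seconds = every_seconds
        self._last_time = None

    def due(self, frame_index, timestamp):
        """Return True if an analysis should run for this frame."""
        if self.every_n_frames is not None:
            return frame_index % self.every_n_frames == 0
        # Time-based periodicity: run on the first frame, then once the
        # prescribed period has elapsed since the last analysis.
        if self._last_time is None or timestamp - self._last_time >= self.every_seconds:
            self._last_time = timestamp
            return True
        return False


if __name__ == "__main__":
    schedule = AnalysisSchedule(every_n_frames=5)
    print([i for i in range(12) if schedule.due(i, i / 30.0)])
```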
  • the exemplary pipeline can use a wide variety of 3D reconstruction methods in various combinations, where the particular types of 3D reconstruction methods that are being used depend upon various current conditions in the FVV processing pipeline.
  • the scene proxies represent one or more types of geometric proxy data examples of which include, but are not limited to, the following.
  • a scene proxy can include a stream of depth map images of the scene.
  • a scene proxy can also include a stream of calibrated point cloud reconstructions of the scene. As is appreciated in the art of 3D reconstruction, these point cloud reconstructions are a low order geometric representation of the scene.
  • a scene proxy can also include one or more types of high order geometric models such as planes, billboards, and existing (i.e., previously created) generic object models (e.g., human body models) which can be either modified, or animated, or both.
  • a scene proxy can also include other high fidelity proxies such as a stream of mesh models of the scene, and the like. It will further be appreciated that since the particular 3D reconstruction methods that are used and the related manner in which a scene proxy is generated are based upon a periodic analysis (i.e., monitoring) of the various current conditions in the FVV processing pipeline, the 3D reconstruction methods that are used and the resulting types of data in the scene proxy can change over time based on changes in the pipeline conditions.
  • a scene proxy that is generated will include a stream of calibrated point cloud reconstructions of the scene, and may also include one or more types of higher order geometric models which can be either modified, or animated, or both.
  • 3D reconstruction methods which can be implemented in hardware are also favored in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments.
  • the use of sensors which generate infrared images of the scene is also favored in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments.
  • a scene proxy that is generated can include both a stream of calibrated point cloud reconstructions of the scene, as well as one or more higher fidelity geometric proxies of the scene (such as when the point cloud reconstructions are used to generate a stream of mesh models of the scene, among other possibilities).
  • the recorded FVV implementation of the pipeline also allows a plurality of 3D reconstruction steps to be used in sequence when generating the scene proxy.
  • FIG. 6 illustrates an exemplary embodiment, in simplified form, of a process for using a point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene (hereafter simply referred to as different depth map image streams).
  • the calibrated streams of sensor data include a plurality of different depth map image streams (block 600 , Yes)
  • these different depth map image streams are merged into a stream of calibrated point cloud reconstructions of the scene (block 602 ). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein.
  • the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more types of high fidelity geometric proxies of the scene (block 604 ).
  • the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene, where this mesh model generation can be performed using conventional methods such as Poisson, among others.
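The merge of block 602 can be sketched for a single time instant as follows. This is an illustration under stated assumptions, not the patented method: each view is assumed to carry a metric depth map, pinhole intrinsics `(fx, fy, cx, cy)`, and a camera-to-world rotation `R` and translation `t` obtained from calibration. The resulting list is unordered and overlapping points are not removed, in line with the note above.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (rows of depths, 0 = no data) into camera-space points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points, R, t):
    """Apply the camera-to-world rotation R (3x3) and translation t (3,)."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def merge_depth_maps(views):
    """Merge calibrated depth maps from several sensors into one point cloud."""
    cloud = []
    for view in views:
        cam_points = backproject(view["depth"], *view["intrinsics"])
        cloud.extend(to_world(cam_points, view["R"], view["t"]))
    return cloud


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    views = [
        {"depth": [[0.0, 2.0]], "intrinsics": (1.0, 1.0, 0.0, 0.0),
         "R": identity, "t": (0.0, 0.0, 0.0)},
        {"depth": [[0.0, 2.0]], "intrinsics": (1.0, 1.0, 0.0, 0.0),
         "R": identity, "t": (1.0, 0.0, 0.0)},
    ]
    print(merge_depth_maps(views))
```

Running this per frame over the depth map image streams would produce the stream of point cloud reconstructions; mesh generation (for example Poisson surface reconstruction) would be a separate step on top of the merged cloud.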
  • FIG. 7 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene (hereafter simply referred to as different infrared image streams).
  • the calibrated streams of sensor data include a plurality of different infrared image streams (block 700 , Yes)
  • the following actions occur. Any narrow baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of infrared image streams are identified (block 702 ).
  • a first set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified narrow baseline stereo pairs of sensors (block 704 ). Any wide baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of infrared image streams are then identified (block 706 ).
  • a second set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified wide baseline stereo pairs of sensors (block 708 ).
  • the different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 710 ). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein.
  • the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more types of high fidelity geometric proxies of the scene (block 712 ).
  • the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene, although many applications will not need this level of fidelity.
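The source does not detail how depth maps are created from stereo pairs in blocks 704 and 708. For a rectified stereo pair, the standard relation is depth = focal length x baseline / disparity, and the depth uncertainty grows roughly with the square of the depth divided by the baseline. The sketch below applies those textbook relations; the function names and numbers are illustrative assumptions.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (meters) of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty caused by a given disparity error."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


if __name__ == "__main__":
    # Same distant point (20 m), narrow versus wide baseline.
    print(depth_error(1000.0, 0.1, 20.0))
    print(depth_error(1000.0, 1.0, 20.0))
```

The second function shows why a wide baseline pair helps when sensors are far from the scene: for the same disparity error, a ten times longer baseline gives a ten times smaller depth error. Narrow baseline pairs, in turn, are generally easier to match reliably, which is one reason to combine both sets.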
  • FIG. 8 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene (hereafter simply referred to as different color image streams).
  • the calibrated streams of sensor data include a plurality of different color image streams (block 800 , Yes)
  • the following actions occur. Any narrow baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of color image streams are identified (block 802 ).
  • a first set of different depth map image streams is then created from the pairs of color image streams generated by the identified narrow baseline stereo pairs of sensors (block 804 ).
  • any wide baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of color image streams are then identified (block 806 ).
  • a second set of different depth map image streams is then created from the pairs of color image streams generated by the identified wide baseline stereo pairs of sensors (block 808 ).
  • the different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 810 ). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein.
  • the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more types of high fidelity geometric proxies of the scene (block 812 ).
  • the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene.
  • a given sensor can be in a plurality of narrow baseline stereo pairs of sensors, and can also be in a plurality of wide baseline stereo pairs of sensors. This serves to maximize the number of different depth map image streams that are created, which in turn serves to maximize the precision of the scene proxy.
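Identifying narrow and wide baseline stereo pairs (blocks 702/706 and 802/806) can be sketched by classifying every pair of sensors by the distance between them. The thresholds `narrow_max` and `wide_min` are assumptions introduced for this example, since the source gives no numeric definition of narrow or wide; a fuller version would also check that both sensors of a pair produce the same type of image stream.

```python
import math
from itertools import combinations


def classify_stereo_pairs(positions, narrow_max, wide_min):
    """Split all sensor pairs into narrow and wide baseline stereo pairs.

    positions  -- dict: sensor id -> (x, y, z) location in meters
    narrow_max -- assumed largest baseline still treated as narrow
    wide_min   -- assumed smallest baseline treated as wide
    """
    narrow, wide = [], []
    for a, b in combinations(sorted(positions), 2):
        baseline = math.dist(positions[a], positions[b])
        if baseline <= narrow_max:
            narrow.append((a, b))
        elif baseline >= wide_min:
            wide.append((a, b))
    return narrow, wide


if __name__ == "__main__":
    sensors = {"A": (0.0, 0.0, 0.0), "B": (0.1, 0.0, 0.0), "C": (2.0, 0.0, 0.0)}
    print(classify_stereo_pairs(sensors, narrow_max=0.2, wide_min=1.0))
```

In this example sensor B belongs to one narrow pair (with A) and one wide pair (with C), which matches the observation that a single sensor can take part in several pairs.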
  • this section provides an overview description, in simplified form, of several additional implementations of the capture and processing stages 102 and 104 of the FVV processing pipeline 100 . It will be appreciated that the implementations described in this section are merely exemplary. Many other implementations of the capture and processing stages 102 and 104 are also possible which use other types of sensor arrangements and generate other types of scene proxies.
  • a circular arrangement of eight genlocked sensors is used to capture a scene which includes one or more human beings, where each of the sensors includes a combination of one infrared structured-light projector, two infrared video cameras, and one color camera. Accordingly, the sensors each generate a different stream of video data which includes both a stereo pair of infrared image streams and a color image stream.
  • the pair of infrared image streams and the color image stream generated by each sensor are first used to generate different depth map image streams.
  • the different depth map image streams are then merged into a stream of calibrated point cloud reconstructions of the scene.
  • These point cloud reconstructions are then used to generate a stream of mesh models of the scene.
  • a conventional view-dependent texture mapping method which accurately represents specular textures such as skin is then used to extract texture data from the color image stream generated by each sensor and map this texture data to the stream of mesh models of the scene.
  • four genlocked visible light video cameras are used to capture a scene which includes one or more human beings, where the cameras are evenly placed around the scene. Accordingly, the cameras each generate a different stream of video data which includes a color image stream.
  • An existing 3D geometric model of a human body can be used in the scene proxy as follows. Conventional methods can be used to kinematically articulate the model over time in order to fit (i.e., match) the model to the streams of video data generated by the cameras. The kinematically articulated model can then be colored as follows.
  • a conventional view-dependent texture mapping method can be used to extract texture data from the color image stream generated by each camera and map this texture data to the kinematically articulated model.
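The source only refers to a "conventional view-dependent texture mapping method". One common formulation blends the textures from the different cameras with weights that favor cameras whose viewing direction is closest to the current synthetic viewpoint. The sketch below shows that weighting only; the cosine-based weight and the `sharpness` parameter are assumptions of this example, not details taken from the source.

```python
import math


def _normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("direction vector must be non-zero")
    return tuple(c / norm for c in v)


def blend_weights(view_dir, camera_dirs, sharpness=1.0):
    """Per-camera texture weights for the current synthetic viewpoint.

    view_dir    -- viewing direction of the synthetic viewpoint
    camera_dirs -- viewing directions of the capturing cameras
    sharpness   -- assumed exponent; larger values favor the closest camera
    """
    if not camera_dirs:
        return []
    v = _normalize(view_dir)
    raw = []
    for d in camera_dirs:
        c = _normalize(d)
        cosine = sum(a * b for a, b in zip(v, c))
        # Cameras looking away from the viewpoint direction get zero weight.
        raw.append(max(cosine, 0.0) ** sharpness)
    total = sum(raw)
    if total == 0:
        # No camera faces the same way; fall back to a uniform blend.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    print(blend_weights((0.0, 0.0, 1.0), [(1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]))
```

The color of each surface point on the articulated model would then be the weighted sum of the colors sampled from each camera's image.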
  • three unsynchronized visible light video cameras are used to capture a soccer game, where each of the cameras is moving and is located far from the game (e.g., rather than the spatial location of each of the cameras being fixed to a specified arrangement, each of the cameras is hand held by a different user who is capturing the game while they freely move about). Accordingly, the cameras each generate a different stream of video data which includes a stream of color images of the game.
  • Articulated billboards can be used to represent the moving players in the scene proxy of the game as follows. For each stream of video data, conventional methods can be used to generate a segmentation mask for each body part of each player in the stream. Conventional methods can then be used to generate an articulated billboard model of each of the moving players in the game from the appropriate segmentation masks. The articulated billboard model can then be colored as just described.
  • FIG. 9 illustrates an exemplary embodiment, in simplified form, of a process for rendering a FVV.
  • the process starts in block 900 with inputting scene proxies which geometrically describe the scene as a function of time.
  • the scene proxies are then processed to generate a frame reflecting the current synthetic viewpoint of the scene, which maximizes the photo-realism thereof based upon a set of current pipeline conditions (block 902 ).
  • These conditions can be in any one or more of the aforementioned stages of the FVV processing pipeline.
  • FIG. 10 illustrates an exemplary embodiment, in simplified form, of a process for implementing the above-described rendering.
  • the process starts in block 1000 with monitoring and periodically analyzing the set of current pipeline conditions.
  • the set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the particular number of sensors that is being (or was) used to capture the scene, or the particular arrangement of these sensors that is being (or was) used, or one or more particular intrinsic characteristics of each of the sensors (e.g., the sensor type, among others), or one or more particular extrinsic characteristics of each of the sensors (e.g., the positioning of the sensor relative to the scene, and whether the sensor is static or moving, among others), or the complexity and composition of the scene, or whether the scene is relatively static or dynamic, or the like.
  • the set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as the particular 3D reconstruction methods that are being (or were) used to generate the scene proxy,
  • the set of current pipeline conditions can also include one or more conditions in the user viewing experience stage of the FVV processing pipeline such as the particular graphics processing capabilities [features] that are available in the computing device hardware which is being used, or the particular type of display device the rendered FVV frames are being displayed on, or the particular characteristics of the display device (described heretofore), or the particular number of degrees of viewpoint navigation freedom that are being provided to the end user, or whether or not the end user's client computing device includes a natural user interface (and if so, the particular natural user interface modalities that are anticipated to be used by the end user), or the like.
  • the set of current pipeline conditions can also include information which is generated by the end user and provided to the user viewing experience stage that specifies desired changes to (i.e., controls) the current synthetic viewpoint of the scene. Such information can include viewpoint navigation information which is being input by this stage based upon the FVV navigation that is being performed by the end user, or temporal navigation information which may also be input to this stage based upon this FVV navigation.
  • the set of current pipeline conditions can also include the particular type of FVV that is being processed in the pipeline.
  • the results of this analysis are then used to select one or more different image-based rendering methods which are matched to the current pipeline conditions (block 1002 ).
  • the selected image-based rendering methods and the results of the periodic analysis are then used to generate a frame reflecting the current synthetic viewpoint of the scene (block 1004 ).
  • the actions of blocks 1000 , 1002 and 1004 are repeated for the duration of the scene (block 1006 , No).
  • the one or more image-based rendering methods which are used and frames that are generated change over time based upon changes in the current pipeline conditions.
  • the current pipeline conditions can be analyzed using different periodicities.
  • the exemplary pipeline described herein can use a wide variety of image-based rendering methods in various combinations, where the particular types of image-based rendering methods that are being used depend upon various current conditions in the FVV processing pipeline.
  • the image-based rendering methods that are employed by the pipeline techniques described herein can render novel views (i.e., synthetic viewpoints) of the scene directly from a collection of images in the scene proxy without having to know the scene geometry.
  • An overview of exemplary image-based rendering methods which can be employed by the pipeline is provided hereafter.
  • the pipeline supports using any type of display device to view the FVV including, but not limited to, the very small form factor display devices used on conventional smart phones and other types of mobile devices, the small form factor display devices used on conventional tablet computers and netbook computers, the display devices used on conventional laptop computers and personal computers, conventional televisions and 3D televisions, conventional autostereoscopic 3D display devices, conventional head-mounted transparent display devices, and conventional wearable heads-up display devices such as those that are used in virtual reality applications.
  • the rendering stage of the FVV processing pipeline will simultaneously generate both left and right current synthetic viewpoints of the scene at an appropriate aspect ratio and resolution in order to create a stereoscopic effect for the end user.
  • the rendering stage will generate just a single current synthetic viewpoint.
  • the rendering stage may generate a current synthetic viewpoint having just the foreground elements of the captured scene, thus enabling objects to be embedded in a natural environment.
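  • As a minimal sketch of how left and right synthetic viewpoints might be derived from a single current viewpoint for a stereoscopic display, the following Python function offsets the viewpoint position by half an eye separation in each direction along the viewer's right-pointing axis. The function name, the tuple representation and the default separation of 0.064 (in scene units) are assumptions made for illustration and are not taken from the pipeline described herein.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def stereo_viewpoints(center: Vec3, right_axis: Vec3,
                      eye_separation: float = 0.064) -> Tuple[Vec3, Vec3]:
    """Return (left, right) viewpoint positions around a center position.

    right_axis is assumed to be a unit vector pointing to the viewer's right.
    eye_separation is a hypothetical default, expressed in scene units.
    """
    half = eye_separation / 2.0
    left = tuple(c - half * a for c, a in zip(center, right_axis))
    right = tuple(c + half * a for c, a in zip(center, right_axis))
    return left, right
```

  • A display device that is not stereoscopic would simply use the center viewpoint and skip this step.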
  • the pipeline also supports using any type of user interface modality to control the current viewpoint while viewing the FVV including, but not limited to, conventional keyboards, conventional pointing devices (such as a mouse, or a graphics tablet, or the like), and conventional natural user interface modalities (such as a touch-sensitive display screen, or the head tracking functionality that is integrated into wearable heads-up display devices, or a motion and location sensing device such as the Microsoft Kinect™, among others).
  • the FVV processing pipeline can process the streams of sensor data differently in order to enable different end user viewing experiences based on the particular type(s) of user interface modality that is anticipated to be used by the end user.
  • all six degrees of viewpoint navigation freedom could be provided to the end user.
  • if the end user at each physical location that is participating in a given videoconference/telepresence session is using a wearable heads-up display device to view and navigate the FVV, then parallax functionality can be implemented in order to provide each end user with an optimally realistic viewing experience when they control/change their viewpoint of the FVV using head movements; the pipeline can also provide for corrected conversational geometry between two end users, thus providing the appearance that both end users are looking directly at each other.
  • the rendering stage can optimize the current synthetic viewpoint that is being displayed based on the end user's current spatial location in front of their display device. In this way, the end user's current spatial location can be mapped to the 3D geometry within the FVV.
  • FIG. 11 illustrates the various degrees of viewpoint navigation freedom that can be supported by the pipeline techniques described herein.
  • the pipeline generally supports spatiotemporal (i.e., space-time) navigation of the FVV.
  • the recorded FVV, unidirectional live FVV, and bidirectional live FVV implementations described herein can each support spatial viewpoint navigation of the FVV having as many as six degrees of freedom, which can be appropriate when the end user is viewing and navigating an FVV that includes high fidelity geometric information.
  • these six degrees of freedom include viewpoint navigation along the x axis, viewpoint navigation rotationally about the x axis (θx), viewpoint navigation along the y axis, viewpoint navigation rotationally about the y axis (θy), viewpoint navigation along the z axis, and viewpoint navigation rotationally about the z axis (θz).
  • the recorded FVV, unidirectional live FVV, and bidirectional live FVV implementations can also each support spatial viewpoint navigation of the FVV having just one degree of viewpoint navigation freedom.
  • the recorded FVV implementation can also support temporal navigation of the FVV.
  • a producer or editor of the FVV may want to specify the particular types of viewpoint navigation that are possible at different times during the FVV.
  • in one scene a movie director may want to confine the end user's viewpoint navigation to a limited area of the scene or a specific axis, but in another scene the director may want to allow the end user to freely navigate their viewpoint throughout the entire area of the scene.
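  • The six degrees of viewpoint navigation freedom, and a producer's restriction of them to particular axes, can be sketched as follows. This is an illustrative Python sketch only: the `Viewpoint` class, its field names, and the `navigate` function are hypothetical and are not defined by the pipeline described herein.

```python
from dataclasses import dataclass, replace
from typing import Iterable

# Three translational and three rotational degrees of freedom.
AXES = ("x", "y", "z", "theta_x", "theta_y", "theta_z")


@dataclass(frozen=True)
class Viewpoint:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    theta_x: float = 0.0
    theta_y: float = 0.0
    theta_z: float = 0.0


def navigate(current: Viewpoint, requested: Viewpoint,
             allowed_axes: Iterable[str]) -> Viewpoint:
    """Apply the requested viewpoint, but only on the axes that are allowed.

    allowed_axes models a (hypothetical) producer-specified restriction that
    is in force at the current time in the FVV.
    """
    allowed = set(allowed_axes)
    unknown = allowed - set(AXES)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    changes = {axis: getattr(requested, axis) for axis in AXES if axis in allowed}
    return replace(current, **changes)
```

  • Passing all six axis names gives unrestricted navigation, while passing a single axis name gives the one-degree-of-freedom case.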
  • the current synthetic viewpoint of the scene is generated using one or more image-based rendering methods which are selected based upon a periodic analysis of the aforementioned set of current pipeline conditions. Accordingly, the particular image-based rendering methods that are used can change over time based upon changes in the current pipeline conditions. It will thus be appreciated that in one situation where the scene has a low degree of complexity and the arrangement of sensors which is being (or was) used to capture the scene is located close to the scene, just a single image-based rendering method may be used to generate the current synthetic viewpoint of the scene.
  • a plurality of image-based rendering methods may be used to generate the current synthetic viewpoint of the scene depending on the location of the current viewpoint relative to the scene and the particular types of geometric proxy data that are in the scene proxy.
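  • The periodic analysis and method selection of FIG. 10 can be sketched as a simple loop. The following Python sketch assumes that the pipeline conditions are summarized in a dictionary with the hypothetical keys `scene_complexity` and `sensors_close`, and uses the placeholder method names `method_a` and `method_b`; the selection rule itself is only an illustration of the single-method versus plural-method situations, not the actual selection logic of the pipeline.

```python
from typing import Dict, List, Sequence, Tuple

Conditions = Dict[str, object]


def select_methods(conditions: Conditions) -> Tuple[str, ...]:
    """Pick rendering methods matched to the conditions (block 1002).

    Hypothetical rule: a simple scene with nearby sensors gets one method,
    anything else gets a combination of methods.
    """
    simple = (conditions["scene_complexity"] == "low"
              and bool(conditions["sensors_close"]))
    return ("method_a",) if simple else ("method_a", "method_b")


def render_schedule(conditions_per_frame: Sequence[Conditions],
                    analysis_period: int) -> List[Tuple[int, Tuple[str, ...]]]:
    """Return (frame index, methods used) for each frame of the scene."""
    if analysis_period < 1:
        raise ValueError("analysis_period must be at least 1")
    schedule = []
    methods: Tuple[str, ...] = ()
    for index, conditions in enumerate(conditions_per_frame):
        # Block 1000: the conditions are only analyzed every analysis_period frames.
        if index % analysis_period == 0:
            methods = select_methods(conditions)
        # Block 1004: a real renderer would generate the frame here.
        schedule.append((index, methods))
    return schedule
```

  • Changing `analysis_period` corresponds to analyzing the pipeline conditions with a different periodicity; a shorter period reacts to changed conditions sooner at the cost of more frequent analysis.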
  • FIG. 12 illustrates an exemplary embodiment, in simplified form, of a continuum of the various image-based rendering methods which can be employed by the pipeline technique embodiments described herein.
  • these various image-based rendering methods can be classified into three categories according to the amount and type of scene geometry information that is included in the scene proxy and thus is available to be used in the rendering stage, namely rendering without scene geometry 1206 (i.e., the scene geometry is unknown), rendering with implicit scene geometry 1204 (i.e., correspondence), and rendering with explicit scene geometry 1202 (which can be either approximate or accurate).
  • These categories 1202 , 1204 and 1206 are to be viewed as a continuum 1200 rather than strict and discrete categories since it will be appreciated that certain of the image-based rendering methods defy strict categorization.
  • having less scene geometry information in the scene proxy will generally decrease the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints will generally be limited to positions between sensors or near sensors).
  • the lower the sensor density (i.e., the smaller the number of sensors that is used in the arrangement), the smaller the number of images that is available in the scene proxy, and thus the more scene geometry information that needs to be available in the scene proxy in order to generate synthetic viewpoints of the scene which are photo-realistic.
  • having more scene geometry information in the scene proxy will generally increase the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints can be navigated to positions which are far away from the real sensor viewpoints).
  • the scene proxy includes a large number of images but does not include any scene geometry or correspondence information.
  • a conventional light field method or a conventional lumigraph method, or a conventional concentric mosaics method, among others, can be used to process the scene proxy in order to generate the current synthetic viewpoint of the scene.
  • each of these methods relies on the characterization of the conventional plenoptic function, and constructs a continuous representation of the plenoptic function from the images in the scene proxy.
  • the light field method is generally applicable when the images of the scene are uniformly captured. The light field method generates new images of the scene by appropriately filtering and interpolating the images in the scene proxy.
  • the lumigraph method is similar to the light field method except that the lumigraph method is generally applicable when the images of the scene are not uniformly captured.
  • the lumigraph method enhances the rendering performance by applying approximated geometry to compensate for this non-uniform capture.
  • the concentric mosaics method is applicable when the arrangement of sensors is circular.
  • Conventional image mosaicing methods can also be used to construct a complete plenoptic function at a fixed viewpoint from an incomplete set of images of the scene.
  • the scene proxy does not include explicit scene geometry information, but rather it includes implicit scene geometry information in the form of feature (e.g., point) correspondences between images, where these correspondences can be computed using conventional computer vision methods.
  • various conventional transfer methods can be used to process the scene proxy in order to generate the current synthetic viewpoint of the scene.
  • transfer methods are characterized by the use of a relatively small number of images with the application of geometric constraints (which are either recovered or known a priori) to project image pixels appropriately at a given synthetic viewpoint.
  • the view interpolation method generates synthetic viewpoints of the scene by interpolating optical flow between corresponding points.
  • the view morphing method generates synthetic viewpoints that reside on a line which links the optical centers of two different sensors based on point correspondences.
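  • The idea of placing a synthetic viewpoint between two sensors by interpolating between corresponding points can be illustrated with a deliberately simplified Python sketch. It linearly interpolates the image positions of sparse point correspondences; conventional view interpolation and view morphing methods operate on dense flow and handle occlusions and prewarping, which this sketch does not attempt. The function name and the parameter `t` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interpolate_correspondences(points_a: Sequence[Point],
                                points_b: Sequence[Point],
                                t: float) -> List[Point]:
    """Interpolate corresponding image points between view A and view B.

    t = 0.0 reproduces the positions in view A, t = 1.0 those in view B,
    and values in between give positions for an in-between viewpoint.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(points_a) != len(points_b):
        raise ValueError("each point needs exactly one correspondence")
    return [((1.0 - t) * xa + t * xb, (1.0 - t) * ya + t * yb)
            for (xa, ya), (xb, yb) in zip(points_a, points_b)]
```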
  • the scene proxy includes explicit and accurate scene geometry information and a small number of images, where this geometry information can be in the form of either depth along known lines-of-sight, or 3D coordinates, among other things.
  • conventional 3D warping methods or a conventional layered depth images method, or a conventional layered depth images tree method, or a conventional view-dependent texture mapping method, or a conventional view-dependent geometry method, among others, can be used to process the scene proxy in order to generate the current synthetic viewpoint of the scene.
  • the 3D warping methods can be used when the scene proxy includes both depth map images and color (or monochrome) images of the scene.
  • the 3D warping methods can be used to render the image from any nearby point of view by projecting the pixels of the image to their proper 3D locations and then re-projecting them onto a new picture.
  • the rendering speed of such 3D warping methods can be increased by using conventional relief texture methods which factor the warping process into a relatively simple pre-warping operation and a conventional texture mapping operation (which may be performed by conventional graphics processing hardware).
  • the 3D warping methods can be applied to both traditional perspective images as well as multi-perspective images.
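  • The projection and re-projection step of 3D warping can be sketched for a single pixel as follows. This Python sketch assumes a pinhole camera model with focal lengths `fx`, `fy` and principal point `cx`, `cy`, and assumes that the new viewpoint differs from the original one by a pure translation `cam_shift` with no rotation; these names and simplifications are assumptions made for illustration, and a full method would also handle rotation, holes and overlapping pixels.

```python
from typing import Optional, Tuple


def warp_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float,
               cam_shift: Tuple[float, float, float]
               ) -> Optional[Tuple[float, float]]:
    """Warp one pixel with known depth into a translated camera.

    Returns the pixel position in the new view, or None if the 3D point
    ends up behind the new camera.
    """
    # Project the pixel to its 3D location in the original camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    # Express the point relative to the translated camera (no rotation).
    tx, ty, tz = cam_shift
    xn, yn, zn = x - tx, y - ty, z - tz
    if zn <= 0.0:
        return None
    # Re-project the point onto the new image plane.
    return (fx * xn / zn + cx, fy * yn / zn + cy)
```

  • Applying this to every pixel of a color image using its depth map yields the warped picture; nearer pixels shift more than distant ones, which is what produces parallax.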
  • the view-dependent geometry method was first used in the context of 3D cartoons and trades off geometry and images, and may be used to represent the current synthetic viewpoint of the scene more compactly.
  • a conventional texture-mapped models method can also be used to generate the current synthetic viewpoint of the scene.
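  • The three categories of the continuum and the methods named for each can be summarized in a small lookup table. The Python sketch below is a simplification made for illustration: it treats the categories as discrete even though they are to be viewed as a continuum, and the enum, table and function names are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class GeometryCategory(Enum):
    NONE = "no scene geometry"
    IMPLICIT = "implicit scene geometry"
    EXPLICIT = "explicit scene geometry"


# Methods named in the text for each category (not an exhaustive list).
CANDIDATE_METHODS: Dict[GeometryCategory, Tuple[str, ...]] = {
    GeometryCategory.NONE: ("light field", "lumigraph", "concentric mosaics"),
    GeometryCategory.IMPLICIT: ("view interpolation", "view morphing"),
    GeometryCategory.EXPLICIT: (
        "3D warping",
        "layered depth images",
        "layered depth images tree",
        "view-dependent texture mapping",
        "view-dependent geometry",
    ),
}


def classify_proxy(has_explicit_geometry: bool,
                   has_correspondences: bool) -> GeometryCategory:
    """Classify a scene proxy by the geometry information it carries."""
    if has_explicit_geometry:
        return GeometryCategory.EXPLICIT
    if has_correspondences:
        return GeometryCategory.IMPLICIT
    return GeometryCategory.NONE
```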
  • each end user interactively navigates their viewpoint of the scene via their client computing device, and each time an end user chooses a different viewpoint, this new viewpoint is provided to the user viewing experience stage by the user's client computing device.
  • each end user has a FVV player operating on their client computing device.
  • the FVV player facilitates the display of FVV related items (e.g., FVV frames or user interface screens), accepts end user inputs, and causes the client computing device to communicate with the FVV user experience stage.
  • the player can be initiated by the end user and requested to display a user interface screen that allows the user to select a FVV available for playing (block 1300 ). As noted previously, this can be a live FVV or a previously recorded FVV.
  • the end user FVV selection is then input (block 1302 ), and transmitted via the client computing device to the server upon which the user experience stage is operating (block 1304 ).
  • the user experience stage receives the user selection (block 1306 ), and in response, initiates the FVV pipeline (more particularly, the rendering stage) to produce the selected FVV (block 1308 ), and in block 1310 instructs the client device to instantiate the end user controls in the FVV player appropriate to the FVV type.
  • the client computing device receives the instruction (block 1312 ), and causes a user interface to be displayed that allows the user to interactively control the viewpoint of the FVV scene (block 1314 ). As indicated previously, this viewpoint control can be spatial, temporal, or both, depending on the FVV selected.
  • the rendering stage of the FVV pipeline begins rendering frames of the selected FVV (block 1316 ) and providing them for display via the user experience stage and the end user's client device (block 1318 ).
  • the initial viewpoint of the FVV scene depicted in the initial frames rendered can be a default viewpoint assigned to the FVV if the end user has not yet specified a viewpoint.
  • the end user's client computing device receives them (block 1320 ) and in block 1322 displays them on a resident display device (such as one described previously).
  • This procedure of rendering, providing and displaying of FVV frames at the current viewpoint is repeated at an appropriate frame rate for the FVV, while at the same time the client computing device monitors end user inputs to the FVV player to determine if an end user viewpoint navigation input has been received (block 1324 ). If such an input is detected, the client computing device transmits it to the user experience stage of the FVV pipeline (block 1326 ), which receives it and forwards the new viewpoint to the rendering stage (block 1328 ). The rendering stage then renders FVV frames depicting the scene from the new (now current) viewpoint (block 1330 ) and provides them for display as described previously (block 1332 ). As the FVV frames are provided, the end user's client computing device receives them (block 1334 ) and in block 1336 displays them on a resident display device. This monitoring and rendering procedure is then repeated for the duration of the FVV.
  • the aforementioned scene is first captured by a server using an arrangement of sensors (block 1400 ). These streams of sensor data are input and calibrated (block 1402 ), and then scene proxies are generated from the calibrated streams of sensor data (block 1404 ). The scene proxies are stored as they are generated (block 1406 ).
  • a client computing device monitors navigational inputs from an end user (block 1408 ).
  • a synthetic viewpoint of the scene is input by the end user and the client computing device transmits the current viewpoint input to the server via the data communication network (block 1410 ).
  • a temporal navigation instruction is input by the end user and the client computing device transmits it to the server as well (block 1412 ).
  • This temporal navigation input represents an instruction to provide FVV frames from a user-specified temporal location in the FVV.
  • the current synthetic viewpoint of the scene and the temporal navigation instruction are received from the client computing device (block 1414 ).
  • a sequence of frames is then generated using the scene proxies (block 1416 ).
  • Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, the first of which corresponds to the last user-specified temporal location in the FVV.
  • the generated frames are then transmitted to the client computing device via the data communication network (block 1418 ).
  • the client computing device receives the frames (block 1420 ), and displays them in a conventional manner to the end user (block 1422 ).
  • the FVV can be played in reverse, thus rewinding the FVV while still allowing the end user to watch.
  • the aforementioned scene is captured by a server using an arrangement of sensors (block 1500 ). These streams of sensor data are input and calibrated (block 1502 ), and then scene proxies are generated from the calibrated streams of sensor data (block 1504 ). The scene proxies are stored as they are generated (block 1506 ).
  • a client computing device monitors navigational inputs from an end user (block 1508 ). A synthetic viewpoint of the scene is input by the end user and the client computing device transmits the current viewpoint input to the server via the data communication network (block 1510 ).
  • a reverse-action temporal navigation instruction is input by the end user and the client computing device transmits it to the server as well (block 1512 ).
  • This reverse-action temporal navigation input represents an instruction to provide FVV frames in reverse order from a specified temporal location in the FVV, thereby rewinding the FVV.
  • the current synthetic viewpoint of the scene and the reverse-action temporal navigation instruction are received from the client computing device (block 1514 ).
  • a sequence of frames is then generated using the scene proxies (block 1516 ). Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene. The frames are generated in reverse order, the first of which corresponds to the last user-specified temporal location in the FVV.
  • the generated frames are then transmitted to the client computing device via the data communication network (block 1518 ).
  • the client computing device receives the frames (block 1520 ), and displays them in a conventional manner to the end user (block 1522 ).
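  • The order in which frames are generated for forward and reverse-action temporal navigation can be sketched as follows. This Python sketch assumes that temporal locations are integer frame indices into the stored scene proxies; the function name and parameters are hypothetical.

```python
from typing import List


def frame_indices(start: int, total_frames: int, count: int,
                  reverse: bool = False) -> List[int]:
    """Return the temporal indices of the next frames to generate.

    start is the user-specified temporal location; the first returned index
    is always start. With reverse=True the indices run backward (rewinding).
    The sequence is cut short at either end of the FVV.
    """
    if not 0 <= start < total_frames:
        raise ValueError("start lies outside the FVV")
    if count < 0:
        raise ValueError("count must be non-negative")
    step = -1 if reverse else 1
    stop = start + step * count
    # Clamp so the sequence never runs past the first or last frame.
    stop = max(stop, -1) if reverse else min(stop, total_frames)
    return list(range(start, stop, step))
```

  • Each returned index would then be rendered from the current synthetic viewpoint using the scene proxy stored for that temporal location.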
  • the FVV can be paused and restarted by the end user. More particularly, referring to FIG. 16 , in one embodiment, as the aforementioned FVV frames are received by the client computing device and displayed to the end user, the client computing device monitors inputs from the end user (block 1600 ). A pause instruction is input by the end user and the client computing device transmits it to the server via the data communication network (block 1602 ). The pause instruction is received from the client computing device (block 1604 ). The server then suspends the generation and transmission of FVV frames to the client computing device (block 1606 ). While the FVV is paused, the client computing device continues to monitor inputs from the end user (block 1608 ).
  • a restart instruction is input by the end user and the client computing device transmits it to the server via the data communication network (block 1610 ).
  • the restart instruction is received from the client computing device (block 1612 ).
  • the server then restarts the generation and transmission of FVV frames to the client computing device (block 1614 ).
  • the client computing device receives the frames (block 1616 ), and displays them in a conventional manner to the end user (block 1618 ).
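  • The server-side handling of pause and restart instructions can be sketched as a small state holder. In this Python sketch the class name, the instruction strings `"pause"` and `"restart"`, and the `tick` method are assumptions made for illustration; a frame is represented only by its index.

```python
from typing import Optional


class FrameStreamer:
    """Tracks whether frame generation and transmission are suspended."""

    def __init__(self) -> None:
        self.paused = False
        self.next_frame = 0

    def handle(self, instruction: str) -> None:
        """Process an instruction received from the client computing device."""
        if instruction == "pause":
            self.paused = True       # suspend generation (block 1606)
        elif instruction == "restart":
            self.paused = False      # resume generation (block 1614)
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")

    def tick(self) -> Optional[int]:
        """Return the index of the next frame to send, or None while paused."""
        if self.paused:
            return None
        index = self.next_frame
        self.next_frame += 1
        return index
```

  • Because the frame counter does not advance while paused, playback resumes at the frame following the last one that was sent.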
  • each frame transmitted to a client computer is also accompanied with at least some of the scene proxies used by the renderer to generate the frame. This allows the client device to locally generate a new frame of the depicted scene from a different viewpoint in the same manner the renderer produces frames when a new viewpoint is requested by the end user (as described previously).
  • whenever a same-frame end user viewpoint navigation input is received via the aforementioned FVV control user interface which represents an instruction to view a scene depicted in the last-displayed FVV frame from a different viewpoint, the client computing device generates a new FVV frame using the scene proxy or proxies received with the last-displayed frame, and displays the new FVV frame on the aforementioned display device.
  • This new FVV frame depicts the scene depicted in the last-displayed FVV frame from a viewpoint specified in the same-frame end user viewpoint navigation input.
  • the frame transmitted to a client computing device would depict all of the captured scene, or a larger portion of it than the display device associated with the client computing device is capable of displaying.
  • when a “same-frame” end user viewpoint navigation input is received via the FVV control user interface which represents an instruction to view a portion of the scene depicted in the last-received FVV frame that was not shown in the last-displayed portion of the frame, at least the portion of the scene depicted in the last-received FVV frame specified in the same-frame end user viewpoint navigation input is displayed on the display device.
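  • Displaying a different portion of an oversized frame that has already been received can be sketched as a clamped crop. In this Python sketch a frame is modeled as a list of pixel rows, and the function name and parameters are hypothetical; it illustrates only that no new frame needs to be requested from the server for this kind of navigation.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def visible_portion(frame: Sequence[Sequence[Pixel]], left: int, top: int,
                    width: int, height: int) -> List[List[Pixel]]:
    """Return the width x height window of the frame to be displayed.

    The requested window position is clamped so that it always stays
    inside the received frame.
    """
    rows, cols = len(frame), len(frame[0])
    if not (0 < width <= cols and 0 < height <= rows):
        raise ValueError("window must fit inside the received frame")
    left = max(0, min(left, cols - width))
    top = max(0, min(top, rows - height))
    return [list(row[left:left + width]) for row in frame[top:top + height]]
```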
  • Still another additional embodiment involves the FVV pipeline, and more particularly, the rendering stage predicting the next new viewpoint to be requested. For example, this can be accomplished based on past viewpoint change requests received from an end user.
  • the rendering stage then renders and stores a new frame (or a sequence of frames) from the predicted viewpoint, and provides it to the client computing device of the end user if that end user requests the predicted viewpoint. It is further noted that the rendering stage could render multiple frames based on multiple predictions of what viewpoint the end user might request next. Then, if the end user's next viewpoint request matches one of the rendered frames, that frame is sent to the end user's client computing device. More particularly, referring to FIGS. 17A-B , in one embodiment, the foregoing procedure is accomplished as follows.
  • the server predicts one or more synthetic viewpoints of the scene that may be received from a client computing device in the future (block 1700 ).
  • a previously unselected one of the predicted synthetic viewpoints of the scene is then selected (block 1702 ), and one or more frames are generated using the aforementioned stored scene proxies which depict at least a portion of the scene as viewed from the selected predicted synthetic viewpoint of the scene (block 1704 ).
  • the generated frame or frames are then stored (block 1706 ). It is then determined if all the predicted synthetic viewpoints have been selected (block 1708 ). If not, blocks 1702 through 1708 are repeated.
  • the incoming messages from the client computing device are monitored (block 1710 ), and when a message is received it is determined if the message includes a current synthetic viewpoint of the scene that matches one of the predicted synthetic viewpoints (block 1712 ). If the message includes a current synthetic viewpoint of the scene that matches one of the predicted synthetic viewpoints, then each frame generated based on the matched predicted synthetic viewpoint of the scene is transmitted to the client computing device via the data communication network for display to the end user of the client computing device (block 1714 ). Then, the previously described process (as described in connection with FIG. 2 ) resumes. This is also the case if the message does not include a current synthetic viewpoint that matches one of the predicted synthetic viewpoints.
  • a sequence of frames is generated using the scene proxies where each frame depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene (block 1716 ), and each frame is transmitted to the client computing device via the data communication network for display to the end user (block 1718 ).
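  • The predict, pre-render, store and match procedure of FIGS. 17A-B can be sketched as a small cache. This Python sketch makes several assumptions that are not taken from the description: viewpoints are modeled as (x, y, z) tuples, the prediction is a linear extrapolation of the last two requested viewpoints, a match requires exact equality, and the names `predict_next` and `PredictiveRenderer` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Viewpoint = Tuple[float, float, float]
Frames = List[str]


def predict_next(history: Sequence[Viewpoint]) -> List[Viewpoint]:
    """Predict the next viewpoint by extrapolating the last two requests."""
    if len(history) < 2:
        return list(history[-1:])
    (x0, y0, z0), (x1, y1, z1) = history[-2], history[-1]
    return [(2 * x1 - x0, 2 * y1 - y0, 2 * z1 - z0)]


class PredictiveRenderer:
    def __init__(self, render: Callable[[Viewpoint], Frames]) -> None:
        self._render = render
        self._stored: Dict[Viewpoint, Frames] = {}

    def prerender(self, predicted: Sequence[Viewpoint]) -> None:
        """Generate and store frames for each prediction (blocks 1702-1708)."""
        for viewpoint in predicted:
            self._stored[viewpoint] = self._render(viewpoint)

    def frames_for(self, requested: Viewpoint) -> Tuple[Frames, bool]:
        """Return (frames, matched) for a received viewpoint (block 1712)."""
        stored = self._stored.pop(requested, None)
        if stored is not None:
            return stored, True                    # send stored frames (block 1714)
        return self._render(requested), False      # render on demand (block 1716)
```

  • A practical matcher would likely accept viewpoints within some tolerance of a prediction rather than requiring exact equality, since navigation inputs rarely land on a predicted viewpoint exactly.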
  • FIG. 18 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the pipeline technique, as described herein, may be implemented. It is noted that any boxes that are represented by broken or dashed lines in FIG. 18 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.
  • FIG. 18 shows a general system diagram showing a simplified computing device 10 .
  • Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers (PCs), server computers, handheld computing devices, laptop or mobile computers, communications devices such as cell phones and personal digital assistants (PDAs), multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and audio or video media players.
  • the device should have a sufficient computational capability and system memory to enable basic computational operations.
  • the computational capability is generally illustrated by one or more processing unit(s) 12 , and may also include one or more graphics processing units (GPUs) 14 , either or both in communication with system memory 16 .
  • the processing unit(s) 12 may be specialized microprocessors (such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, an FPGA or other micro-controller) or can be conventional central processing units (CPUs) having one or more processing cores including, but not limited to, specialized GPU-based cores in a multi-core CPU.
  • the simplified computing device 10 of FIG. 18 may also include other components, such as, for example, a communications interface 18 .
  • the simplified computing device 10 of FIG. 18 may also include one or more conventional computer input devices 20 (e.g., pointing devices, keyboards, audio input/capture devices, video input/capture devices, haptic input devices, devices for receiving wired or wireless data transmissions, and the like).
  • the simplified computing device 10 of FIG. 18 may also include other components, such as, for example, display device(s) 24 , and one or more conventional computer output devices 22 (e.g., audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, and the like).
  • Exemplary types of input devices (herein also referred to as user interface modalities) and display devices that are operable with the pipeline technique embodiments described herein have been described heretofore.
  • typical communications interfaces 18 , additional types of input and output devices 20 and 22 , and storage devices 26 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.
  • the simplified computing device 10 of FIG. 18 may also include a variety of computer readable media.
  • Computer readable media can be any available media that can be accessed by the computer 10 via storage devices 26 , and includes both volatile and nonvolatile media that is either removable 28 and/or non-removable 30 , for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data.
  • Computer readable media may include computer storage media and communication media.
  • Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as digital versatile disks (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.
  • the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves.
  • software, programs, and/or computer program products embodying some or all of the various embodiments of the pipeline technique described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
  • cloud based FVV streaming technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device.
  • program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types.
  • the cloud based FVV streaming technique embodiments may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks.
  • program modules may be located in both local and remote computer storage media including media storage devices.
  • the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Generation (AREA)
  • Studio Devices (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Cloud based FVV streaming technique embodiments presented herein generally employ a cloud based FVV pipeline to create, render and transmit FVV frames depicting a captured scene as would be viewed from a current synthetic viewpoint selected by an end user and received from a client computing device. The FVV frames use a similar level of bandwidth as a conventional streaming movie would consume. To change viewpoints, a new viewpoint is sent from the client to the cloud, and a new streaming movie is initiated from the new viewpoint. Frames associated with that viewpoint are created, rendered and transmitted to the client until a new viewpoint request is received.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This application claims the benefit of and priority to provisional U.S. patent application Ser. No. 61/653,983 filed May 31, 2012.
  • BACKGROUND
  • A given video generally includes one or more scenes, where each scene in the video can be either relatively static (e.g., the objects in the scene do not substantially change or move over time) or dynamic (e.g., the objects in the scene substantially change and/or move over time). In a traditional video the viewpoint of each scene is chosen by the director when the video is recorded/captured and this viewpoint cannot be controlled or changed by an end user while they are viewing the video. In other words, in a traditional video the viewpoint of each scene is fixed and cannot be modified when the video is being rendered and displayed. In a free viewpoint video (FVV) an end user can interactively control and change their viewpoint of each scene at will while they are viewing the video. In other words, in a FVV each end user can interactively generate synthetic (i.e., virtual) viewpoints of each scene on-the-fly while the video is being rendered and displayed. This creates a feeling of immersion for any end user who is viewing a rendering of the captured scene, thus enhancing their viewing experience.
  • SUMMARY
  • Cloud based free viewpoint video (FVV) streaming technique embodiments described herein generally involve generating a FVV that provides a consistent and manageable amount of data to a client despite the large amounts of data typically demanded to create and render the FVV. In one general embodiment, this is accomplished by first capturing a scene using an arrangement of sensors. This sensor arrangement includes a plurality of sensors that generate a plurality of streams of sensor data, where each stream represents the scene from a different geometric perspective. These streams of sensor data are input and calibrated, and then scene proxies are generated from the calibrated streams of sensor data. The scene proxies geometrically describe the scene as a function of time. Next, a current synthetic viewpoint of the scene is received from a client computing device via a data communication network. This current synthetic viewpoint was selected by an end user of the client computing device. Once a current synthetic viewpoint is received, a sequence of frames is generated using the scene proxies. Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, and is transmitted to the client computing device via the data communication network for display to the end user of the client computing device.
  • From the perspective of a client computing device, a FVV produced as described above is played in one general embodiment as follows. A request is received from an end user to display a FVV selection user interface screen that allows the end user to select a FVV available for playing. This FVV selection user interface screen is displayed on a display device, and an end user FVV selection is input. The end user FVV selection is then transmitted to a server via a data communication network. The client computing device then receives an instruction from the server via the data communication network to instantiate end user controls appropriate for the type of FVV selected. In response, an appropriate FVV control user interface is provided to the end user. The client computing device then monitors end user inputs via the FVV control user interface, and whenever an end user viewpoint navigation input is received, it is transmitted to the server via the data communication network. FVV frames are then received from the server. Each FVV frame depicts at least a portion of the captured scene as it would be viewed from the last viewpoint the end user input, and is displayed on the aforementioned display device as it is received.
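  • As a minimal sketch of the client-side sequence above, the following Python fragment walks through FVV selection, control instantiation, viewpoint forwarding, and frame display. The `FakeServer` class, its method names, the `"orbit"` control type, and the string frames are all hypothetical stand-ins for illustration; the description does not specify a wire protocol or an API.

```python
class FakeServer:
    """Hypothetical in-process stand-in for the cloud server."""

    def __init__(self):
        self.viewpoint = "default"  # assumed default until the user navigates

    def select(self, fvv_id):
        # Returns an instruction telling the client which controls to instantiate.
        return {"fvv": fvv_id, "controls": "orbit"}

    def set_viewpoint(self, viewpoint):
        self.viewpoint = viewpoint

    def next_frame(self):
        # A real server would render and encode an image here.
        return f"frame from {self.viewpoint}"


def play(server, fvv_id, user_inputs):
    """Play an FVV; user_inputs holds one entry per frame (None = no navigation)."""
    instruction = server.select(fvv_id)       # transmit the end user's selection
    controls = instruction["controls"]        # instantiate the instructed controls
    displayed = []
    for user_input in user_inputs:
        if user_input is not None:            # viewpoint navigation input received
            server.set_viewpoint(user_input)  # forward it to the server
        displayed.append(server.next_frame()) # display each frame as it arrives
    return controls, displayed


controls, frames = play(FakeServer(), "demo", [None, "left", None])
```

  • Each displayed frame reflects the last viewpoint the user entered, which is why the third frame in this usage example is still rendered from `"left"`.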
  • It is noted that this Summary is provided to introduce a selection of concepts, in a simplified form, that are further described hereafter in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
  • DESCRIPTION OF THE DRAWINGS
  • The specific features, aspects, and advantages of the free viewpoint video processing pipeline technique embodiments described herein will become better understood with regard to the following description, appended claims, and accompanying drawings where:
  • FIG. 1 is a diagram illustrating an exemplary embodiment, in simplified form, of the various stages in a free viewpoint video (FVV) pipeline.
  • FIG. 2 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for generating a FVV of a scene.
  • FIG. 3 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for capturing and inputting scene data.
  • FIG. 4 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by an arrangement of sensors that is being used to capture the scene.
  • FIG. 5 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for processing the calibrated streams of sensor data to generate scene proxies which geometrically describe the scene as a function of time.
  • FIG. 6 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using a point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene.
  • FIG. 7 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene.
  • FIG. 8 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene.
  • FIG. 9 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for processing the scene proxies to render a FVV.
  • FIG. 10 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for implementing the rendering of FIG. 9.
  • FIG. 11 is a diagram illustrating the various degrees of viewpoint navigation freedom that can be supported by the FVV pipeline.
  • FIG. 12 is a diagram illustrating an exemplary embodiment, in simplified form, of a continuum of the various image-based rendering methods which can be employed by the FVV pipeline.
  • FIGS. 13A-B are a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user.
  • FIG. 14 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user, where the end user can temporally control the playback of the FVV.
  • FIG. 15 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user, where the end user can temporally control the playback of the FVV to play it in reverse.
  • FIG. 16 is a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for presenting a FVV to an end user, where the end user can temporally control the playback to pause and restart the FVV.
  • FIGS. 17A-B are a flow diagram illustrating an exemplary embodiment, in simplified form, of a process for rendering a FVV that involves predicting the next new viewpoint to be requested by the end user.
  • FIG. 18 is a diagram illustrating a simplified example of a general-purpose computer system on which various embodiments and elements of the cloud based FVV streaming technique, as described herein, may be implemented.
  • DETAILED DESCRIPTION
  • In the following description of cloud based free viewpoint video (FVV) streaming technique embodiments (hereafter sometimes simply referred to as streaming technique embodiments) reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the streaming technique can be practiced. It is understood that other embodiments can be utilized and structural changes can be made without departing from the scope of the streaming technique.
  • It is also noted that for the sake of clarity specific terminology will be resorted to in describing the streaming technique embodiments described herein and it is not intended for these embodiments to be limited to the specific terms so chosen. Furthermore, it is to be understood that each specific term includes all its technical equivalents that operate in a broadly similar manner to achieve a similar purpose. Reference herein to “one embodiment”, or “another embodiment”, or an “exemplary embodiment”, or an “alternate embodiment”, or “one implementation”, or “another implementation”, or an “exemplary implementation”, or an “alternate implementation” means that a particular feature, a particular structure, or particular characteristics described in connection with the embodiment or implementation can be included in at least one embodiment of the streaming technique. The appearances of the phrases “in one embodiment”, “in another embodiment”, “in an exemplary embodiment”, “in an alternate embodiment”, “in one implementation”, “in another implementation”, “in an exemplary implementation”, “in an alternate implementation” in various places in the specification are not necessarily all referring to the same embodiment or implementation, nor are separate or alternative embodiments/implementations mutually exclusive of other embodiments/implementations. Yet furthermore, the order of process flow representing one or more embodiments or implementations of the streaming technique does not inherently indicate any particular order nor imply any limitations of the streaming technique.
  • The term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate a stream of sensor data that represents a given scene. Generally speaking and as will be described in more detail hereafter, the streaming technique embodiments described herein employ a plurality of sensors which can be configured in various arrangements to capture a scene, thus allowing a plurality of streams of sensor data to be generated each of which represents the scene from a different geometric perspective. Each of the sensors can be any type of video capture device (e.g., any type of video camera), or any type of audio capture device, or any combination thereof. Each of the sensors can also be either static (i.e., the sensor has a fixed spatial location and a fixed rotational orientation which do not change over time), or moving (i.e., the spatial location and/or rotational orientation of the sensor change over time). The streaming technique embodiments described herein can employ a combination of different types of sensors to capture a given scene.
  • The term “baseline” is used herein to refer to a ratio of the actual physical distance between a given pair of sensors to the average of the actual physical distance from each sensor in the pair to the viewpoint of the scene. When this ratio is larger than a prescribed value the pair of sensors is referred to herein as a “wide baseline stereo pair of sensors”. When this ratio is smaller than the prescribed value the pair of sensors is referred to herein as a “narrow baseline stereo pair of sensors”.
  • The term “server” is used herein to refer to one or more server computing devices operating in a cloud infrastructure so as to provide FVV services to a client computer over a data communication network.
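  • The “baseline” ratio defined above can be illustrated with a short Python sketch. The function names, the use of 3D Cartesian coordinates, and the threshold of 0.5 are assumptions made for illustration; the description only refers to a “prescribed value” and does not say how a ratio exactly equal to that value is classified (it is treated as narrow here).

```python
import math


def baseline_ratio(sensor_a, sensor_b, viewpoint):
    """Sensor separation divided by the mean sensor-to-viewpoint distance."""
    separation = math.dist(sensor_a, sensor_b)
    mean_range = (math.dist(sensor_a, viewpoint) + math.dist(sensor_b, viewpoint)) / 2.0
    if mean_range == 0:
        raise ValueError("both sensors coincide with the viewpoint")
    return separation / mean_range


def classify_pair(sensor_a, sensor_b, viewpoint, prescribed_value=0.5):
    """Label a stereo pair; prescribed_value is a hypothetical threshold."""
    ratio = baseline_ratio(sensor_a, sensor_b, viewpoint)
    return "wide" if ratio > prescribed_value else "narrow"


# Two sensors 2 units apart, each 1 unit from the viewpoint: ratio 2.0.
close_pair = classify_pair((-1, 0, 0), (1, 0, 0), (0, 0, 0))
# Two sensors 1 unit apart, roughly 10 units from the viewpoint: ratio near 0.1.
far_pair = classify_pair((-0.5, 0, 0), (0.5, 0, 0), (0, 0, 10))
```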
  • 1.0 Cloud Based Free Viewpoint Video (FVV) Streaming
  • The creation and playback of FVV involves working with a substantial amount of data. First, a scene is simultaneously recorded from many different perspectives using sensors such as RGB cameras. Second, this data is processed to extract three dimensional (3D) geometric information in the form of scene proxies using, for example, 3D Reconstruction (3DR) algorithms. Finally, the original data and geometric proxies are recombined during rendering, using Image Based Rendering (IBR) algorithms, to generate synthetic viewpoints.
  • Moreover, the amount of data may vary considerably from one FVV to another FVV due to differences in the number of sensors used to record the scene, the length of the FVV, the type of 3DR algorithms used to process the data, and the type of IBR algorithm used to generate synthetic views of the scene. In addition, there exists a wide variety of different combinations of both bandwidth and local processing power that can be used for viewing at a client.
  • One way in which a FVV can be transferred from a server to a client over a data communication network (such as the Internet or a proprietary intranet) is to combine the 3D geometry and other data for a specific viewpoint to produce a single image or video frame on the server, and then to transmit this frame from the server to the client. The frame is then displayed by the client in a normal manner. This pre-computed frame transmission approach has the advantage of providing a consistent and manageable amount of data to a client despite the large amounts of data demanded to create and render a FVV, and the fact that the amount of data can be constantly changing. In other words, the FVV data stays on a server (or servers) in the cloud, and even clients with limited processing power and/or limited available bandwidth can receive and display a FVV. More particularly, an advantage of creating a cloud based streaming FVV that represents what would be seen from a specific viewpoint is that FVVs can be commercially deployed to end users and use a similar level of bandwidth as a conventional streaming movie would consume. This approach will be referred to herein as cloud based FVV streaming.
  • To change viewpoints, a new (typically user specified) viewpoint is sent from the client to the server, and a new stream of video data is initiated from the new viewpoint. Frames associated with that viewpoint are created, rendered and transmitted to the client until a new viewpoint request is received.
  • Cloud based FVV streaming technique embodiments described herein generally employ a cloud based FVV pipeline to create, render and transmit FVV frames depicting a captured scene as would be viewed from a current synthetic viewpoint received from a client. An exemplary FVV pipeline will now be described. It is noted, however, that cloud based FVV streaming technique embodiments described herein are not limited to only the exemplary FVV pipeline to be described. Rather, other FVV pipelines can also be employed to create and render video frames in response to a viewpoint request, as desired.
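  • The viewpoint-change behavior described in this section (frames keep flowing from the current viewpoint until a new viewpoint request arrives) can be sketched as a simple server-side loop. This is an illustrative sketch only: the `stream_frames` name, the representation of requests as `(frame_index, viewpoint)` pairs, and the `render` callback are assumptions and are not taken from the description.

```python
from collections import deque


def stream_frames(render, requests, initial_viewpoint, num_frames):
    """Yield (frame_index, frame), switching viewpoint as requests arrive.

    requests: iterable of (arrival_frame_index, viewpoint) pairs (hypothetical format).
    render:   callable (frame_index, viewpoint) -> frame.
    """
    pending = deque(sorted(requests, key=lambda r: r[0]))
    viewpoint = initial_viewpoint
    for t in range(num_frames):
        # Apply every request that has arrived by frame t; the latest one wins.
        while pending and pending[0][0] <= t:
            viewpoint = pending.popleft()[1]
        yield t, render(t, viewpoint)


# Usage: the client asks for viewpoint "B" just before frame 2 is rendered.
frames = list(stream_frames(lambda t, vp: f"{vp}@{t}", [(2, "B")], "A", 4))
```

  • Only one rendered frame per time step leaves the server regardless of how much scene data sits behind it, which is the property that keeps the client's bandwidth comparable to that of a conventional video stream.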
  • 1.1 Exemplary FVV Pipeline for Cloud Based FVV Streaming
  • Generally speaking, the exemplary FVV pipeline described here involves generating an FVV of a given scene and presenting the FVV to one or more end users. As will be appreciated from the more detailed description that follows, the exemplary FVV pipeline enables optimal viewpoint navigation for up to six degrees of viewpoint navigation freedom. Furthermore, this exemplary FVV pipeline does not rely upon having to constrain the pipeline in order to produce a desired visual result. In other words, the pipeline eliminates the need to place constraints in order to generate various synthetic viewpoints of the scene which are photo-realistic and thus are free of discernible artifacts. More particularly and by way of example but not limitation, the pipeline eliminates having to constrain the arrangement of the sensors that are used to capture the scene. Accordingly, the pipeline is operational with any arrangement of sensors. Likewise, the pipeline eliminates having to constrain the number or types of sensors that are used to capture the scene. Accordingly, the pipeline is operational with any number of sensors and all types of sensors. The pipeline also eliminates having to constrain the number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene. Accordingly, the pipeline can produce visual results having as many as six degrees of viewpoint navigation freedom. Further, the pipeline eliminates having to constrain the complexity or composition of the scene that is being captured (e.g., neither the environment(s) in the scene, nor the types of objects in the scene, nor the number of people in the scene, among other things has to be constrained). Accordingly, the pipeline is operational with any type of scene, including both relatively static and dynamic scenes.
  • Yet further, the pipeline does not rely upon having to use a specific 3D reconstruction method to generate a 3D reconstruction of the captured scene. Accordingly, the pipeline supports the use of any one or more 3D reconstruction methods and therefore provides the freedom to use whatever 3D reconstruction method(s) produces the desired visual result (e.g., the highest degree of photo-realism for the particular scene being captured and the desired number of degrees of viewpoint navigation freedom) based on the particular characteristics of the streams of sensor data that are generated by the sensors (e.g., based on factors such as the particular number and types of sensors that are used to capture the scene, and the particular arrangement of these sensors that is used), along with other current pipeline conditions.
  • The exemplary pipeline also does not rely upon having to use a specific image-based rendering method during the rendering of a frame of the captured scene. Accordingly, the pipeline supports the use of any image-based rendering method and therefore provides the freedom to use whatever image-based rendering method(s) produces the desired visual result based on the particular characteristics of the streams of sensor data that are generated by the sensors, along with other current pipeline conditions. By way of example but not limitation, in an exemplary situation where just two video capture devices are used to capture a scene, an image-based rendering method that renders a lower fidelity 3D geometric proxy of the captured scene may produce an optimally photo-realistic visual result when the end user's viewpoint is close to the axis of one of the video capture devices (such as with billboards). In another exemplary situation where 36 video capture devices configured in a circular arrangement are used to capture a scene, a conventional image warping/morphing image-based rendering method may produce an optimally photo-realistic visual result. In yet another exemplary situation where 96 video capture devices configured in either a 2D (two-dimensional) or 3D array arrangement are used to capture a scene, a conventional view interpolation image-based rendering method may produce an optimally photo-realistic visual result. In yet another exemplary situation where an even larger number of video capture devices is used, a conventional lumigraph or light field image-based rendering method may produce an optimally photo-realistic visual result.
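  • The exemplary situations above can be condensed into a toy selection rule. In this sketch the cut-offs at 2, 36 and 96 devices are simply the example counts quoted above reused as hypothetical thresholds; the description does not define such thresholds, and the sketch ignores the sensor arrangement, the end user's viewpoint, and the other current pipeline conditions that the pipeline would also weigh.

```python
def choose_ibr_method(num_devices: int) -> str:
    """Pick an image-based rendering family from the capture device count.

    Thresholds are illustrative assumptions, not values from the description.
    """
    if num_devices < 1:
        raise ValueError("at least one video capture device is required")
    if num_devices <= 2:
        return "billboard proxy"          # lower fidelity geometric proxy
    if num_devices <= 36:
        return "image warping/morphing"   # e.g., a circular arrangement
    if num_devices <= 96:
        return "view interpolation"       # e.g., a 2D or 3D array arrangement
    return "lumigraph/light field"        # very dense capture


choices = {n: choose_ibr_method(n) for n in (2, 36, 96, 200)}
```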
  • It will thus be appreciated that the exemplary pipeline results in a flexible, robust and commercially viable next generation FVV processing pipeline that meets the needs of today's various creative video producers and editors. By way of example but not limitation and as will be appreciated from the more detailed description that follows, the pipeline is applicable to various types of video-based media applications such as consumer entertainment (e.g., movies, television shows, and the like) and videoconference/telepresence, among others. The pipeline can support a broad range of features that provide for the capture, processing, storage, rendering and distribution of any type of FVV that can be generated. Various implementations of the pipeline are possible, where each different implementation supports a different type of FVV. Exemplary types of supported FVV are described in more detail hereafter.
  • Additionally, the pipeline allows any one or more parameters to be freely modified without introducing artifacts into the FVV. This allows the photo-realism of the FVV that is presented to each end user to be maximized (i.e., the artifacts are minimized) regardless of the characteristics of the various sensors that are used to capture the scene, and the characteristics of the various streams of sensor data that are generated by the sensors. Exemplary pipeline parameters which can be modified include, but are not limited to, the following. The number and types of sensors that are used to capture the scene can be modified. The arrangement of the sensors can also be modified. Which if any of the sensors is static and which is moving can also be modified. The complexity and composition of the scene can also be modified. Whether the scene is relatively static or dynamic can also be modified. The 3D reconstruction methods and image-based rendering methods that are used can also be modified. The number of degrees of viewpoint navigation freedom that are provided during the rendering and end user viewing of the captured scene can also be modified.
  • 1.1.1 Pipeline Stages Overview and Exemplary Process
  • FIG. 1 illustrates an exemplary embodiment, in simplified form, of the various stages in a FVV processing pipeline. As exemplified in FIG. 1, the FVV processing pipeline 100 starts with a capture stage 102 during which, and generally speaking, the following actions take place. An arrangement of sensors is used to capture a given scene, where the arrangement includes a plurality of sensors and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. These streams of sensor data are input from the sensors and calibrated in a manner which will be described in more detail hereafter. The calibrated streams of sensor data are then output to a processing stage 104. As described heretofore, the exemplary pipeline supports the use of various types, various numbers and various combinations of sensors which can be configured in various arrangements, including both 2D and 3D arrangements, where each of the sensors can be either static or moving.
  • Referring again to FIG. 1, the processing stage 104 of the FVV processing pipeline 100 inputs the calibrated streams of sensor data and processes them to generate scene proxies which geometrically describe the captured scene as a function of time. The scene proxies are then output to a storage stage 106 where they are stored in a memory via conventional means. As will be described in more detail hereafter, the scene proxies are generated using one or more different 3D reconstruction methods which extract 3D geometric information from the calibrated streams of sensor data. The particular 3D reconstruction methods that are used and the particular manner in which each scene proxy is generated are determined based on a periodic analysis of a set of current conditions.
  • Referring again to FIG. 1, a rendering stage 108 of the FVV processing pipeline 100 inputs scene proxies from the storage stage 106 and processes them to generate frames of the captured scene as would be viewed from a current synthetic viewpoint. The rendering is designed to maximize the photo-realism of the frame produced. Generally speaking, the aforementioned frames are generated using one or more different image-based rendering methods. The particular image-based rendering methods that are used and the particular manner in which each frame is generated are determined based on a periodic analysis of the set of current conditions. The frames of the captured scene are then output to a user viewing experience stage 110.
  • Referring again to FIG. 1, the user viewing experience stage 110 of the FVV processing pipeline 100 generally provides one or more end users with the ability to view the frames of the captured scene, as the scene would be viewed from a current synthetic viewpoint, on a display device and to navigate/control this viewpoint on-the-fly at will. In other words, the user viewing experience stage 110 provides each end user with the ability to continuously and interactively navigate/control their viewpoint of the scene that is being displayed on their display device. The user viewing experience stage 110 may also provide each end user with the ability to interact temporally with the FVV at will, as will be explained in more detail later.
  • More particularly, each end user can interactively navigate their viewpoint of the scene via a client computing device 112 associated with that user. Each time an end user chooses a different viewpoint, this new viewpoint is provided to the user viewing experience stage 110 by the user's client computing device 112 via a data communication network 114 that the end user's client computing device is connected to (such as the Internet or a proprietary intranet). This transfer of the new viewpoint is done in a conventional manner consistent with the network being employed. The user viewing experience stage 110 receives and forwards the new viewpoint to the rendering stage 108, which will modify the current synthetic viewpoint of the scene accordingly and produce frames of the captured scene as would be viewed from the new synthetic viewpoint. These frames are then provided in turn to the user viewing experience stage 110. The user viewing experience stage 110 receives each frame and provides it to the user's client computing device 112 via the aforementioned network 114. This transfer of the frames is also done in a conventional manner consistent with the network being employed.
  • In situations where FVVs have been stored for future playback, each end user can also interact temporally to control the playback of the FVV, and based on this temporal control the rendering stage 108 will provide FVV frames starting with the frame that corresponds to the last user-specified temporal location in the FVV.
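  • One way to picture the temporal control described above is as a lookup from a user-specified time to a stored frame index. The sketch below assumes that stored frames carry ascending timestamps in seconds and that the requested location maps to the first frame at or after that time, clamped to the last frame; both choices, and the `start_index` name, are assumptions rather than details from the description.

```python
import bisect


def start_index(frame_times, requested_time):
    """Index of the first stored frame at or after requested_time (clamped)."""
    if not frame_times:
        raise ValueError("no stored frames")
    i = bisect.bisect_left(frame_times, requested_time)
    return min(i, len(frame_times) - 1)


stored_times = [0.0, 0.5, 1.0, 1.5]    # hypothetical timestamps of stored frames
resume_at = start_index(stored_times, 0.6)
frames_to_send = stored_times[resume_at:]
```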
  • The foregoing FVV processing pipeline can be employed by the cloud based FVV streaming technique embodiments described herein to generate a free viewpoint video (FVV) of a scene. More particularly, in one general implementation outlined in FIG. 2, a computer (such as any of the computing devices described in the Exemplary Operating Environments to follow) is used to perform the following actions. First, the aforementioned scene is captured using an arrangement of sensors (block 200). This sensor arrangement includes a plurality of sensors that generate a plurality of streams of sensor data, where each stream represents the scene from a different geometric perspective. These streams of sensor data are input and calibrated (block 202), and then scene proxies are generated from the calibrated streams of sensor data (block 204). The scene proxies geometrically describe the scene as a function of time. Next, a current synthetic viewpoint of the scene is received from a client computing device via a data communication network (block 206). It is noted that this current synthetic viewpoint was selected by an end user of the client computing device. Once a current synthetic viewpoint is received, a sequence of frames is generated using the scene proxies (block 208). Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, and in block 210 is transmitted to the client computing device via the data communication network for display to the end user of the client computing device.
  • It is noted, however, that in situations where a current synthetic viewpoint is not selected by the end user prior to playing a FVV, the performance of blocks 208 and 210 is deferred until the viewpoint is selected, and in the meantime, a sequence of frames is generated using the scene proxies, where each frame depicts at least a portion of the scene as viewed from a prescribed default viewpoint of the scene. These frames are transmitted in turn to the client computing device via the data communication network for display to the end user of the client computing device.
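  • The default-viewpoint fallback just described can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiments: the names `Viewpoint`, `DEFAULT_VIEWPOINT`, `render_frame` and `stream_frames` are hypothetical, the viewpoint is assumed to be a simple (x, y, z) position, and the renderer is a placeholder that returns a label rather than pixels.

```python
from typing import Iterator, Optional, Sequence, Tuple

# Assumption: a viewpoint is reduced to an (x, y, z) position for illustration.
Viewpoint = Tuple[float, float, float]

# Assumption: stands in for the "prescribed default viewpoint" of the scene.
DEFAULT_VIEWPOINT: Viewpoint = (0.0, 0.0, 5.0)


def render_frame(proxy: str, viewpoint: Viewpoint) -> str:
    """Placeholder renderer: returns a label instead of an image."""
    return f"{proxy}@{viewpoint}"


def stream_frames(
    proxies: Sequence[str],
    viewpoint_updates: Sequence[Optional[Viewpoint]],
) -> Iterator[str]:
    """Yield one frame per scene proxy.

    viewpoint_updates[i] is the viewpoint received from the client just
    before frame i is rendered, or None if nothing new arrived.
    """
    current: Optional[Viewpoint] = None
    for proxy, update in zip(proxies, viewpoint_updates):
        if update is not None:
            current = update  # the end user selected a new viewpoint
        # Until a viewpoint has been selected, fall back to the default.
        active = current if current is not None else DEFAULT_VIEWPOINT
        yield render_frame(proxy, active)
```

  • In this sketch the first frame is rendered from the default viewpoint, and every frame after the first client update is rendered from the most recently received viewpoint.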
  • It is further noted that, as indicated previously, in one embodiment the scene proxies are stored as they are generated. This feature can allow complete FVVs to be recorded for playback at a future time.
  • 1.1.2 Supported FVV Types
  • As noted heretofore, various implementations of the pipeline are possible, where each different implementation supports a different type of FVV and a different user viewing experience. As will now be described in more detail, each of these different implementations differs in terms of the user viewing experience it provides, its latency characteristics (i.e., how rapidly the streams of sensor data have to be processed through the FVV processing pipeline), its storage characteristics, and the types of computing device hardware it necessitates.
  • Referring again to FIG. 1, one implementation of the pipeline supports non-live FVVs, which corresponds to a situation where the streams of sensor data that are generated by the sensors are captured and processed for future playback. As such, there is effectively an unlimited amount of time available for the processing stage 104. This allows an FVV producer to optionally manually “touch-up” the streams of sensor data that are input during the capture stage 102, and also optionally manually remove any 3D reconstruction artifacts that are introduced in the processing stage 104. This particular implementation is referred to hereafter as the “recorded FVV implementation”. Exemplary types of video-based media that work well in the recorded FVV implementation include movies, documentaries, sitcoms and other types of television shows, music videos, digital memories, and the like. Another exemplary type of video-based media that works well in the recorded FVV implementation is the use of special effects technology where synthetic objects are realistically modeled, lit, shaded and added to a pre-captured scene.
  • Referring again to FIG. 1, another implementation of the pipeline supports unidirectional (i.e., one-way) live FVV, which corresponds to a situation where the streams of sensor data that are being generated by the sensors are concurrently captured 102 and processed 104; and the resulting 3D scene proxy of the captured scene is stored and rendered into frames, and then transmitted in a one-to-many manner on-the-fly (i.e., live) to one or more end users. As such, each end user can view the scene live (i.e., each user can view the scene at substantially the same time it is being captured 102). This particular implementation is referred to hereafter as the “unidirectional live FVV implementation”. Exemplary types of video-based media that work well in the unidirectional live FVV implementation include sporting events, news programs, live concerts, and the like.
  • Referring again to FIG. 1, yet another implementation of the pipeline supports bidirectional (i.e., two-way) live FVV such as that which is associated with various videoconferencing/telepresence applications. This particular implementation is referred to hereafter as the “bidirectional live FVV implementation”. This bidirectional live FVV implementation is generally the same as the unidirectional live FVV implementation with the following exception. In the bidirectional live FVV implementation, a computing device at each physical location that is participating in a given videoconferencing/telepresence session is able to concurrently capture streams of sensor data that are being generated by sensors which are capturing a local scene, process these locally captured streams of sensor data into scene proxies, store the proxies, render the proxies into frames, and transmit the resulting FVV frames of the local scene in a one-to-many manner on the fly to the other physical locations that are participating in the session.
  • Referring again to FIG. 1, it will be appreciated that in the unidirectional and bidirectional live FVV implementations of the pipeline, in order for an end user to be able to view the scene live the capture, processing, storing, rendering and frame distribution have to be completed within a prescribed very short period of time. In an exemplary embodiment of the pipeline this period of time is one thirtieth of a second per frame.
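  • The one-thirtieth-of-a-second figure implies a budget of roughly 33.3 milliseconds that all of the live stages must share for each frame. The following sketch shows the arithmetic. The per-stage timings in the usage note are invented for illustration and are not taken from the embodiments; `within_budget` is a hypothetical helper name.

```python
FRAME_RATE_HZ = 30                    # exemplary rate: one frame per 1/30 second
FRAME_BUDGET_S = 1.0 / FRAME_RATE_HZ  # about 0.0333 seconds per frame


def within_budget(stage_times_s):
    """Return (fits, slack_s) for a mapping of stage name -> seconds per frame.

    fits is True when the summed stage times do not exceed the frame budget;
    slack_s is the remaining time (negative when the budget is exceeded).
    """
    total = sum(stage_times_s.values())
    return total <= FRAME_BUDGET_S, FRAME_BUDGET_S - total
```

  • For example, hypothetical timings of 5 ms capture, 12 ms processing, 2 ms storage, 8 ms rendering and 3 ms distribution total 30 ms and fit the budget, whereas raising processing to 22 ms would exceed it.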
  • 1.1.3 FVV Capture and Processing Stages
  • This section provides a more detailed description of the capture and processing stages of the FVV processing pipeline. The exemplary pipeline generally employs a plurality of sensors which are configured in a prescribed arrangement to capture a given scene. The pipeline is operable with any type of sensor, any number (two or greater) of sensors, any arrangement of sensors (where this arrangement can include a plurality of different geometries and different geometric relationships between the sensors), and any combination of different types of sensors. The pipeline is also operable with both static and moving sensors. A given sensor can be any type of video capture device (examples of which are described in more detail hereafter), or any type of audio capture device (such as a microphone, or the like), or any combination thereof. Each video capture device generates a stream of video data which includes a stream of images (also known as and referred to herein as “frames”) of the scene from the specific geometric perspective of the video capture device. Similarly, each audio capture device generates a stream of audio data representing the audio emanating from the scene from the specific geometric perspective of the audio capture device.
  • Exemplary types of video capture devices that can be employed include, but are not limited to, the following. A given video capture device can be a conventional visible light video camera which generates a stream of video data that includes a stream of color images of the scene. A given video capture device can also be a conventional light-field camera (also known as a “plenoptic camera”) which generates a stream of video data that includes a stream of color light field images of the scene. A given video capture device can also be a conventional infrared structured-light projector combined with a conventional infrared video camera that is matched to the projector, where this projector/camera combination generates a stream of video data that includes a stream of infrared images of the scene. This projector/camera combination is also known as a “structured-light 3D scanner”. A given video capture device can also be a conventional monochromatic camera which generates a stream of video data that includes a stream of monochrome images of the scene. A given video capture device can also be a conventional time-of-flight camera which generates a stream of video data that includes both a stream of depth map images of the scene and a stream of color images of the scene. For simplicity sake, the term “color camera” is sometimes used herein to refer to any type of video capture device that generates color images of the scene.
  • It will be appreciated that variability in factors such as the composition and complexity of a given scene, and each end user's viewpoint navigation, among other factors, can impact the determination of how many sensors to use to capture the scene, the particular type(s) of sensors to use, and the particular arrangement of the sensors. The exemplary pipeline generally employs a minimum of one sensor which generates color image data for the scene, along with one or more other sensors that can be used in combination to generate 3D geometry data for the scene. In situations where an outdoor scene is being captured or the sensors are located far from the scene, it is advantageous to capture the scene using both a wide baseline stereo pair of color cameras and a narrow baseline stereo pair of color cameras. In situations where an indoor scene is being captured, it is advantageous to capture the scene using a narrow baseline stereo pair of sensors both of which generate video data that includes a stream of infrared images of the scene in order to eliminate the dependency on scene lighting variables.
  • Generally speaking, it is advantageous to increase the number of sensors being used as the complexity of the scene increases. In other words, as the scene becomes more complex (e.g., as additional people are added to the scene), the use of additional sensors serves to reduce the number of occluded areas within the scene. It may also be advantageous to capture the entire scene using a given arrangement of static sensors, and at the same time also capture a specific higher complexity region of the scene using one or more additional moving sensors. In a situation where a large number of sensors is used to capture a complex scene, different combinations of the sensors can be used during the processing stage of the FVV processing pipeline (e.g., a situation where a specific sensor is part of both a narrow baseline stereo pair and a different wide baseline stereo pair involving a third sensor).
  • FIG. 3 illustrates an exemplary embodiment, in simplified form, of a process for capturing and inputting scene data. As exemplified in FIG. 3, the process starts in block 300 with using an arrangement of sensors to capture the scene, where the arrangement includes a plurality of sensors and generates a plurality of streams of sensor data each of which represents the scene from a different geometric perspective. The streams of sensor data are then input (block 302). It will be appreciated that a given stream of sensor data will include video data whenever the sensor that generated the stream is a video capture device. A given stream of sensor data will include audio data whenever the sensor that generated the stream is an audio capture device. A given stream of sensor data will include both video and audio data whenever the sensor that generated the stream is a combined video and audio capture device.
  • FIG. 4 illustrates an exemplary embodiment, in simplified form, of a process for calibrating the streams of sensor data which are generated by the arrangement of sensors. As exemplified in FIG. 4, the process starts in block 400 with determining the number of sensors in the arrangement of sensors that is being used to capture the scene. Intrinsic characteristics of each of the sensors are then determined (block 402). Exemplary intrinsic characteristics which can be determined for a given sensor include the sensor type, the sensor's frame rate, the sensor's shutter speed, the sensor's mosaic pattern, the sensor's white balance, the bit depth and pixel resolution of the images that are generated by the sensor, the focal length of the VCD's lens, the principal point of the VCD's lens, the VCD's skew coefficient, the distortions of the VCD's lens, the VCD's field of view, among others. It will be appreciated that knowing such intrinsic characteristics for each of the sensors allows the FVV processing pipeline to understand the governing physics and optics of each of the sensors. Extrinsic characteristics of each of the sensors at each point in time during the capture of the scene are also determined (block 404). Exemplary extrinsic characteristics which can be determined for a given sensor include the sensor's current rotational orientation (i.e., the direction that the sensor is currently pointing), the sensor's current spatial location (i.e., the sensor's current location within the arrangement), whether the sensor is static or moving, the current geometric relationship between the sensor and each of the other sensors in the arrangement (i.e., the sensor's current position relative to each of the other sensors), the position of the sensor relative to the scene, and whether or not the sensor is genlocked (i.e., temporally synchronized) with the other sensors in the arrangement, among others. 
The determination of the intrinsic and extrinsic characteristics of each of the sensors can be made using various conventional methods, examples of which will be described in more detail hereafter. The knowledge of the number of sensors in the arrangement, and the intrinsic and extrinsic characteristics of each of the sensors, is then used to temporally and spatially calibrate the streams of sensor data (block 406).
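  • To illustrate how intrinsic and extrinsic characteristics are typically used together, the following sketch projects a 3D scene point into a sensor's image using a standard pinhole camera model. It is a minimal example under stated assumptions: lens distortion is ignored, the extrinsics are given as a world-to-camera rotation matrix and translation vector, and `project_point` is a hypothetical name that does not appear in the embodiments.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy, skew=0.0):
    """Project a 3D world point to pixel coordinates (u, v).

    rotation (3x3 nested lists) and translation (length 3) are the extrinsic
    characteristics, mapping world coordinates into the camera frame.
    fx, fy (focal lengths in pixels), cx, cy (principal point) and skew are
    the intrinsic characteristics. Lens distortion is ignored (assumption).
    """
    # Extrinsics: X_cam = R * X_world + t
    x_c, y_c, z_c = (
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z_c <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective divide followed by the calibration matrix.
    u = (fx * x_c + skew * y_c) / z_c + cx
    v = (fy * y_c) / z_c + cy
    return u, v
```

  • With an identity rotation and zero translation, a point on the optical axis projects onto the principal point, which gives a quick sanity check of a calibration.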
  • As is appreciated in the art of video recording, the intrinsic and extrinsic characteristics of each of the sensors in the arrangement are commonly determined by performing one or more calibration procedures which calibrate the sensors, where these procedures are specific to the particular types of sensors that are being used to capture the scene, and the particular number and arrangement of the sensors. In the unidirectional and bidirectional live FVV implementations of the pipeline, the calibration procedures are performed and the streams of sensor data which are generated thereby are input before the scene capture. In the recorded FVV implementation of the pipeline, the calibration procedures can be performed and the streams of sensor data which are generated thereby can be input either before or after the scene capture. Exemplary calibration procedures will now be described.
  • In a situation where the sensors that are being used to capture the scene are genlocked and include a combination of color cameras, sensors which generate a stream of infrared images of the scene, and one or more time-of-flight cameras, and this combination of cameras is arranged in a static array, the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner. A stream of calibration data can be input from each of the cameras in the array while a common physical feature (such as a ball, or the like) is internally illuminated with an incandescent light (which is visible to all of the cameras) and moved throughout the scene. These streams of calibration data can then be analyzed using conventional methods to determine both an intrinsic and extrinsic calibration matrix for each of the cameras.
  • In another situation where the sensors that are being used to capture the scene include a plurality of color cameras which are arranged in a static array, the cameras in the array can be calibrated and the intrinsic and extrinsic characteristics of each of the cameras can be determined in the following manner. A stream of calibration data can be input from each camera in the array while it is moved around the scene but in close proximity to its static location (thus allowing each camera in the array to view overlapping parts of the static background of the scene). After the scene is captured by the static array of color cameras and the streams of sensor data generated thereby are input, the streams of sensor data can be analyzed using conventional methods to identify features in the scene, and these features can then be used to calibrate the cameras in the array and determine the intrinsic and extrinsic characteristics of each of the cameras by employing a conventional method (e.g., extrinsic characteristics can be determined using a structure-from-motion method).
  • In yet another situation where one or more of the sensors that are being used to capture the scene are moving sensors (such as when the spatial location of a given sensor changes over time, or when controls on a given sensor are used to optically zoom in on the scene while it is being captured (which is commonly done during the recording of sporting events, among other things)), each of these moving sensors can be calibrated and its intrinsic and extrinsic characteristics can be determined at each point in time during the scene capture by using a conventional background model to register and calibrate relevant individual images that were generated by the sensor. In yet another situation where the sensors that are being used to capture the scene include a combination of static and moving sensors, the sensors can be calibrated and the intrinsic and extrinsic characteristics of each of the sensors can be determined by employing conventional multistep calibration procedures.
  • In yet another situation where there is no temporal synchronization between the sensors that are being used to capture the scene and the arrangement of the sensors can randomly change over time (such as when a plurality of mobile devices are held up by different users and the sensors on these devices are used to capture the scene), the exemplary pipeline will both spatially and temporally calibrate the streams of sensor data generated by the sensors at all points in time during the scene capture before the streams are processed in the processing stage. In an exemplary embodiment of the pipeline technique this spatial and temporal calibration can be performed as follows. After the scene is captured and the streams of sensor data representing the scene are input, the streams of sensor data can be analyzed using conventional methods to separate the static and moving elements of the scene. The static elements of the scene can then be used to generate a background model. Additionally, the moving elements of the scene can be used to generate a global timeline that encompasses all of the sensors, and each image in each stream of sensor data is assigned a relative time. The intrinsic characteristics of each of the sensors can be determined by using conventional methods to analyze each of the streams of sensor data.
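  • The description above does not specify how the moving elements are turned into a global timeline. One plausible approach, shown here purely as an assumption, is to compute a per-frame motion signal for each stream (for example, the amount of foreground motion) and find the frame offset that best aligns two such signals. The function name `best_frame_offset` and the mean-product score are illustrative choices, not the method of the embodiments.

```python
def best_frame_offset(reference, other, max_shift):
    """Return the shift s (in frames) that best aligns other to reference.

    A shift of s means reference[i] corresponds to other[i + s]. The score
    is the mean product of the overlapping samples, a crude correlation
    measure that is adequate for a sketch.
    """
    def score(shift):
        pairs = [
            (reference[i], other[i + shift])
            for i in range(len(reference))
            if 0 <= i + shift < len(other)
        ]
        if not pairs:
            return float("-inf")  # no overlap at this shift
        return sum(a * b for a, b in pairs) / len(pairs)

    return max(range(-max_shift, max_shift + 1), key=score)
```

  • Once an offset has been estimated for each stream against a common reference, each image can be assigned a relative time by subtracting its stream's offset from its frame index.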
  • In an implementation of the exemplary pipeline where the capture stage of the FVV processing pipeline is directly connected to the sensors that are being used to capture the scene, the intrinsic characteristics of each of the sensors can also be determined by reading appropriate hardware parameters directly from each of the sensors. In another embodiment of the pipeline technique where the capture stage is not directly connected to the sensors but rather the streams of sensor data are pre-recorded and then imported into the capture stage, the number of sensors and various intrinsic properties of each of the sensors can be determined by analyzing the streams of sensor data using conventional methods.
  • FIG. 5 illustrates an exemplary embodiment, in simplified form, of a process for processing the calibrated streams of sensor data to generate the scene proxies. As exemplified in FIG. 5, the process starts in block 500 with monitoring and periodically analyzing a set of current pipeline conditions. The set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the particular number of sensors that is being (or was) used to capture the scene, or the particular arrangement of these sensors that is being (or was) used, or one or more particular intrinsic characteristics of each of the sensors (e.g., the sensor type, among others), or one or more particular extrinsic characteristics of each of the sensors (e.g., the positioning of the sensor relative to the scene, and whether the sensor is static or moving, among others), or the like. The set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as whether the scene proxies are being generated on-the-fly, or being generated and stored for future playback (i.e., the particular type of FVV that is being processed in the pipeline and the speed at which the streams of sensor data have to be processed through the pipeline), or the like.
  • The set of current pipeline conditions can also include one or more conditions in the storage stage of the FVV processing pipeline such as the amount of storage space that is currently available to store the scene proxy. The set of current pipeline conditions can also include one or more conditions in the rendering stage of the pipeline such as the current viewpoint navigation information and temporal navigation information. In addition to including this viewpoint and temporal navigation information, the set of current pipeline conditions is also generally associated with the specific implementation of the pipeline technique embodiments that is being used. The set of current pipeline conditions can also further include one or more conditions in the user viewing experience stage of the pipeline such as the particular type of display device the rendered frames are being displayed on, and the particular characteristics of the display device (e.g., its aspect ratio, its pixel resolution, and its form factor, among others).
  • Referring again to FIG. 5, after the analysis of the set of current pipeline conditions has been completed (block 500), the results of this analysis are then used to select one or more different 3D reconstruction methods which are matched to the current pipeline conditions (block 502). The selected 3D reconstruction methods are then used to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data (block 504). The 3D reconstructions of the scene and the results of the periodic analysis are then used to generate the scene proxy (block 506). The actions of blocks 500, 502, 504 and 506 are repeated for the duration of the scene (block 508, No). As such, it will be appreciated that the 3D reconstruction methods which are used, the types of 3D reconstructions of the scene which are generated, and thus the types of geometric proxy data in a scene proxy can change over time based upon changes in the current pipeline conditions. It will also be appreciated that the current pipeline conditions can be analyzed using different periodicities. By way of example but not limitation, in one embodiment of the exemplary pipeline the current pipeline conditions can be analyzed on a frame-by-frame basis (i.e., for each image in the streams of sensor data). In another embodiment of the pipeline, the current pipeline conditions can be analyzed using a periodicity of a prescribed number of sequential frames, where this number is greater than one. In yet another embodiment of the pipeline, the current pipeline conditions can be analyzed using a periodicity of a prescribed period of time.
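  • The selection logic of blocks 500 and 502, together with a frame-count periodicity, can be sketched as follows. The condition fields, method names and selection rules below are hypothetical simplifications chosen for illustration; the embodiments consider a much broader set of pipeline conditions.

```python
from dataclasses import dataclass


@dataclass
class PipelineConditions:
    """A small, assumed subset of the current pipeline conditions."""
    live: bool                # live FVV (True) or recorded FVV (False)
    has_depth_streams: bool   # calibrated depth map image streams present
    has_infrared_pairs: bool  # stereo pairs of infrared image streams present
    has_color_pairs: bool     # stereo pairs of color image streams present


def select_reconstruction_methods(conditions):
    """Pick 3D reconstruction methods matched to the conditions (block 502)."""
    methods = []
    if conditions.has_depth_streams:
        methods.append("merge_depth_maps")
    if conditions.has_infrared_pairs:
        methods.append("infrared_stereo")
    if conditions.has_color_pairs:
        methods.append("color_stereo")
    # Slower, higher fidelity steps are only affordable when not live.
    if methods and not conditions.live:
        methods.append("mesh_from_point_cloud")
    return methods


def should_analyze(frame_index, period_frames=1):
    """True when the conditions should be re-analyzed at this frame.

    period_frames=1 gives frame-by-frame analysis; larger values give a
    periodicity of a prescribed number of sequential frames.
    """
    return frame_index % period_frames == 0
```

  • Because the selection is re-run whenever `should_analyze` returns True, the set of methods in use, and therefore the data placed in the scene proxy, can change as the conditions change.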
  • It will thus be appreciated that the exemplary pipeline can use a wide variety of 3D reconstruction methods in various combinations, where the particular types of 3D reconstruction methods that are being used depend upon various current conditions in the FVV processing pipeline. Accordingly and as will be described in more detail hereafter, the scene proxies represent one or more types of geometric proxy data, examples of which include, but are not limited to, the following. A scene proxy can include a stream of depth map images of the scene. A scene proxy can also include a stream of calibrated point cloud reconstructions of the scene. As is appreciated in the art of 3D reconstruction, these point cloud reconstructions are a low order geometric representation of the scene. A scene proxy can also include one or more types of high order geometric models such as planes, billboards, and existing (i.e., previously created) generic object models (e.g., human body models) which can be either modified, or animated, or both. A scene proxy can also include other high fidelity proxies such as a stream of mesh models of the scene, and the like. It will further be appreciated that since the particular 3D reconstruction methods that are used and the related manner in which a scene proxy is generated are based upon a periodic analysis (i.e., monitoring) of the various current conditions in the FVV processing pipeline, the 3D reconstruction methods that are used and the resulting types of data in the scene proxy can change over time based on changes in the pipeline conditions.
  • Generally speaking, for the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments described herein, due to the fact that the capture, processing, storage, rendering, and user viewing experience stages of the FVV processing pipeline have to be completed within a prescribed very short period of time, the types of 3D reconstruction methods that can be used in these implementations are limited to high speed 3D reconstruction methods. By way of example but not limitation, in the unidirectional and bidirectional live FVV implementations of the pipeline, a scene proxy that is generated will include a stream of calibrated point cloud reconstructions of the scene, and may also include one or more types of higher order geometric models which can be either modified, or animated, or both. It will be appreciated that 3D reconstruction methods which can be implemented in hardware are also favored in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments. The use of sensors which generate infrared images of the scene is also favored in the unidirectional and bidirectional live FVV implementations of the pipeline technique embodiments.
  • For the recorded FVV implementation of the pipeline, lower speed 3D reconstruction methods can be used. By way of example but not limitation, in the recorded FVV implementation of the pipeline, a scene proxy that is generated can include both a stream of calibrated point cloud reconstructions of the scene, as well as one or more higher fidelity geometric proxies of the scene (such as when the point cloud reconstructions are used to generate a stream of mesh models of the scene, among other possibilities). The recorded FVV implementation of the pipeline also allows a plurality of 3D reconstruction steps to be used in sequence when generating the scene proxy. By way of example but not limitation, consider a situation where a stream of calibrated point cloud reconstructions of the scene has been generated, but there are some noisy or error prone stereo matches present in these reconstructions that extend beyond a human silhouette boundary in the scene. It will be appreciated that these noisy or error prone stereo matches can lead to the wrong texture data appearing in the mesh models of the scene, thus resulting in artifacts in the rendered scene. These artifacts can be eliminated by running a segmentation process to separate the foreground from the background, and then points outside of the human silhouette can be rejected as outliers.
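  • The outlier rejection step in the preceding example can be sketched as follows, assuming that a segmentation process has already produced a binary foreground mask and that each reconstructed point has already been projected to pixel coordinates in the masked image. The function name `reject_outside_silhouette` and the data layout are hypothetical.

```python
def reject_outside_silhouette(points, pixels, mask):
    """Keep only the points whose projections fall inside the silhouette.

    points: list of 3D points, e.g. (x, y, z) tuples
    pixels: list of (u, v) image coordinates, one per point (u = column,
            v = row), assumed to be precomputed by projecting each point
    mask:   2D list of booleans, True for foreground (silhouette) pixels
    """
    height, width = len(mask), len(mask[0])
    kept = []
    for point, (u, v) in zip(points, pixels):
        col, row = int(round(u)), int(round(v))
        inside_image = 0 <= row < height and 0 <= col < width
        if inside_image and mask[row][col]:
            kept.append(point)  # inlier: lies on the human silhouette
        # Otherwise the point is rejected as an outlier.
    return kept
```

  • Points that project outside the image bounds are also rejected here, since they cannot be confirmed as foreground by this mask.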
  • FIG. 6 illustrates an exemplary embodiment, in simplified form, of a process for using a point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of depth map images of the scene (hereafter simply referred to as different depth map image streams). As exemplified in FIG. 6, whenever the calibrated streams of sensor data include a plurality of different depth map image streams (block 600, Yes), these different depth map image streams are merged into a stream of calibrated point cloud reconstructions of the scene (block 602). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein. Depending on the current pipeline conditions, the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more types of high fidelity geometric proxies of the scene (block 604). By way of example but not limitation, the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene, where this mesh model generation can be performed using conventional methods such as Poisson, among others.
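  • The merge of block 602 can be sketched as back-projecting each depth map into 3D using its sensor's intrinsics and then transforming the resulting points into a common world frame using the sensor's extrinsics. This is a minimal sketch under a pinhole model without lens distortion; the function names and the camera-to-world convention for the extrinsics are assumptions made for illustration.

```python
def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (2D list, depth along the optical axis)
    into camera-frame 3D points. Non-positive depths are treated as invalid.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth measured at this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def merge_depth_maps(views):
    """Merge several depth maps into one unordered point cloud.

    views: iterable of (depth, (fx, fy, cx, cy), (rotation, translation)),
           where rotation/translation map camera coordinates to world
           coordinates (assumed convention).
    """
    cloud = []
    for depth, intrinsics, (rotation, translation) in views:
        for p in depth_map_to_points(depth, *intrinsics):
            cloud.append(tuple(
                sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
                for r in range(3)
            ))
    return cloud
```

  • The result is simply the concatenation of the transformed points, which is why the merged reconstruction is unordered and may contain overlapping points where sensors see the same surface.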
  • FIG. 7 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of infrared images of the scene (hereafter simply referred to as different infrared image streams). As exemplified in FIG. 7, whenever the calibrated streams of sensor data include a plurality of different infrared image streams (block 700, Yes), the following actions occur. Any narrow baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of infrared image streams are identified (block 702). A first set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified narrow baseline stereo pairs of sensors (block 704). Any wide baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of infrared image streams are then identified (block 706). A second set of different depth map image streams is then created from the pairs of infrared image streams generated by the identified wide baseline stereo pairs of sensors (block 708). The different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 710). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein. Depending on the current pipeline conditions, the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more types of high fidelity geometric proxies of the scene (block 712). 
By way of example but not limitation and as just described, the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene, although many applications will not need this level of fidelity.
  • FIG. 8 illustrates an exemplary embodiment, in simplified form, of a process for using the point cloud 3D reconstruction method to generate one or more different 3D reconstructions of the scene from the calibrated streams of sensor data when these calibrated streams include a plurality of different streams of color images of the scene (hereafter simply referred to as different color image streams). As exemplified in FIG. 8, whenever the calibrated streams of sensor data include a plurality of different color image streams (block 800, Yes), the following actions occur. Any narrow baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of color image streams are identified (block 802). A first set of different depth map image streams is then created from the pairs of color image streams generated by the identified narrow baseline stereo pairs of sensors (block 804). Any wide baseline stereo pairs of sensors that exist in the arrangement of sensors and generate pairs of color image streams are then identified (block 806). A second set of different depth map image streams is then created from the pairs of color image streams generated by the identified wide baseline stereo pairs of sensors (block 808). The different depth map image streams in the first set and the second set are then merged into a stream of calibrated point cloud reconstructions of the scene (block 810). It is noted that these point cloud reconstructions are unordered and as such, overlaps may exist therein. Depending on the current pipeline conditions, the stream of calibrated point cloud reconstructions of the scene can then optionally be used to generate one or more types of high fidelity geometric proxies of the scene (block 812). By way of example but not limitation and as just described, the stream of calibrated point cloud reconstructions of the scene can be used to generate a stream of mesh models of the scene.
  • It will be appreciated that depending on the particular arrangement of sensors that is used to capture the scene, a given sensor can be in a plurality of narrow baseline stereo pairs of sensors, and can also be in a plurality of wide baseline stereo pairs of sensors. This serves to maximize the number of different depth map image streams that are created, which in turn serves to maximize the precision of the scene proxy.
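  • The pair identification of blocks 802 and 806 can be illustrated with a minimal sketch. It assumes that a pair is classified only by the distance between its two sensors; the function name, the sensor positions and the two distance thresholds are hypothetical, since the description gives no numeric criteria.

```python
from itertools import combinations
from math import dist

# Hypothetical baseline thresholds in meters; the description gives no values.
NARROW_MAX = 0.3
WIDE_MAX = 2.0


def classify_stereo_pairs(positions, narrow_max=NARROW_MAX, wide_max=WIDE_MAX):
    """Split all sensor pairs into narrow- and wide-baseline stereo pairs."""
    narrow, wide = [], []
    for (a, pos_a), (b, pos_b) in combinations(sorted(positions.items()), 2):
        baseline = dist(pos_a, pos_b)
        if baseline <= narrow_max:
            narrow.append((a, b))      # candidate for block 802
        elif baseline <= wide_max:
            wide.append((a, b))        # candidate for block 806
        # Pairs farther apart than wide_max are not used as stereo pairs.
    return narrow, wide


# Made-up sensor positions (x, y, z) used only for illustration.
sensors = {
    "s0": (0.0, 0.0, 0.0),
    "s1": (0.2, 0.0, 0.0),
    "s2": (1.5, 0.0, 0.0),
    "s3": (5.0, 0.0, 0.0),
}
narrow_pairs, wide_pairs = classify_stereo_pairs(sensors)
```

  • In this example sensor s0 belongs to one narrow baseline pair and one wide baseline pair, which mirrors the observation that a given sensor can participate in several pairs.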
  • 1.1.3.1 FVV Generation Examples
  • Referring again to FIG. 1, this section provides an overview description, in simplified form, of several additional implementations of the capture and processing stages 102 and 104 of the FVV processing pipeline 100. It will be appreciated that the implementations described in this section are merely exemplary. Many other implementations of the capture and processing stages 102 and 104 are also possible which use other types of sensor arrangements and generate other types of scene proxies.
  • In one implementation of the capture and processing stages of the FVV processing pipeline a circular arrangement of eight genlocked sensors is used to capture a scene which includes one or more human beings, where each of the sensors includes a combination of one infrared structured-light projector, two infrared video cameras, and one color camera. Accordingly, the sensors each generate a different stream of video data which includes both a stereo pair of infrared image streams and a color image stream. As described heretofore, the pair of infrared image streams and the color image stream generated by each sensor are first used to generate different depth map image streams. The different depth map image streams are then merged into a stream of calibrated point cloud reconstructions of the scene. These point cloud reconstructions are then used to generate a stream of mesh models of the scene. A conventional view-dependent texture mapping method which accurately represents specular textures such as skin is then used to extract texture data from the color image stream generated by each sensor and map this texture data to the stream of mesh models of the scene.
  • In another implementation of the capture and processing stages of the FVV processing pipeline four genlocked visible light video cameras are used to capture a scene which includes one or more human beings, where the cameras are evenly placed around the scene. Accordingly, the cameras each generate a different stream of video data which includes a color image stream. An existing 3D geometric model of a human body can be used in the scene proxy as follows. Conventional methods can be used to kinematically articulate the model over time in order to fit (i.e., match) the model to the streams of video data generated by the cameras. The kinematically articulated model can then be colored as follows. A conventional view-dependent texture mapping method can be used to extract texture data from the color image stream generated by each camera and map this texture data to the kinematically articulated model.
  • In another implementation of the capture and processing stages of the FVV processing pipeline three unsynchronized visible light video cameras are used to capture a soccer game, where each of the cameras is moving and is located far from the game (e.g., rather than the spatial location of each of the cameras being fixed to a specified arrangement, each of the cameras is hand held by a different user who is capturing the game while they freely move about). Accordingly, the cameras each generate a different stream of video data which includes a stream of color images of the game. Articulated billboards can be used to represent the moving players in the scene proxy of the game as follows. For each stream of video data, conventional methods can be used to generate a segmentation mask for each body part of each player in the stream. Conventional methods can then be used to generate an articulated billboard model of each of the moving players in the game from the appropriate segmentation masks. The articulated billboard model can then be colored as just described.
  • 1.1.4 FVV Rendering Stage
  • This section provides a more detailed description of the rendering stage of the FVV processing pipeline. FIG. 9 illustrates an exemplary embodiment, in simplified form, of a process for rendering a FVV. As exemplified in FIG. 9, the process starts in block 900 with inputting scene proxies which geometrically describe the scene as a function of time. The scene proxies are then processed to generate a frame reflecting the current synthetic viewpoint of the scene, which maximizes the photo-realism thereof based upon a set of current pipeline conditions (block 902). These conditions can be in any one or more of the aforementioned stages of the FVV processing pipeline.
  • More particularly, FIG. 10 illustrates an exemplary embodiment, in simplified form, of a process for implementing the above-described rendering. As exemplified in FIG. 10, the process starts in block 1000 with monitoring and periodically analyzing the set of current pipeline conditions. The set of current pipeline conditions can include one or more conditions in the capture stage of the FVV processing pipeline such as the particular number of sensors that is being (or was) used to capture the scene, or the particular arrangement of these sensors that is being (or was) used, or one or more particular intrinsic characteristics of each of the sensors (e.g., the sensor type, among others), or one or more particular extrinsic characteristics of each of the sensors (e.g., the positioning of the sensor relative to the scene, and whether the sensor is static or moving, among others), or the complexity and composition of the scene, or whether the scene is relatively static or dynamic, or the like. The set of current pipeline conditions can also include one or more conditions in the processing stage of the pipeline such as the particular 3D reconstruction methods that are being (or were) used to generate the scene proxy, or the particular types of geometric proxy data that are in the scene proxy, or the like.
  • The set of current pipeline conditions can also include one or more conditions in the user viewing experience stage of the FVV processing pipeline such as the particular graphics processing capabilities [features] that are available in the computing device hardware which is being used, or the particular type of display device the rendered FVV frames are being displayed on, or the particular characteristics of the display device (described heretofore), or the particular number of degrees of viewpoint navigation freedom that are being provided to the end user, or whether or not the end user's client computing device includes a natural user interface (and if so, the particular natural user interface modalities that are anticipated to be used by the end user), or the like. The set of current pipeline conditions can also include information which is generated by the end user and provided to the user viewing experience stage that specifies desired changes to (i.e., controls) the current synthetic viewpoint of the scene. Such information can include viewpoint navigation information which is being input by this stage based upon the FVV navigation that is being performed by the end user, or temporal navigation information which may also be input to this stage based upon this FVV navigation. The set of current pipeline conditions can also include the particular type of FVV that is being processed in the pipeline.
  • Referring again to FIG. 10, after the analysis of the set of current pipeline conditions has been completed (block 1000), the results of this analysis are then used to select one or more different image-based rendering methods which are matched to the current pipeline conditions (block 1002). The selected image-based rendering methods and the results of the periodic analysis are then used to generate a frame reflecting the current synthetic viewpoint of the scene (block 1004). The actions of blocks 1000, 1002 and 1004 are repeated for the duration of the scene (block 1006, No). As such, it will be appreciated that the one or more image-based rendering methods which are used and frames that are generated change over time based upon changes in the current pipeline conditions. As described heretofore, the current pipeline conditions can be analyzed using different periodicities.
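  • The loop of blocks 1000 through 1006 can be sketched as follows. This is a simplified illustration, not the pipeline's implementation: the condition dictionary, the selection rule and the render_frame callback are all hypothetical stand-ins.

```python
def select_methods(conditions):
    """Block 1002: choose rendering methods matched to the analyzed conditions.

    The rule below is a hypothetical example, not a rule given in the text.
    """
    geometry = conditions.get("geometry", "none")
    if geometry == "explicit":
        return ["view_dependent_texture_mapping"]
    if geometry == "implicit":
        return ["view_interpolation"]
    return ["light_field"]


def render_scene(condition_samples, render_frame, select=select_methods):
    """Run blocks 1000-1004 once per periodic sample of the conditions."""
    frames = []
    for conditions in condition_samples:        # block 1000: analyze conditions
        methods = select(conditions)            # block 1002: select methods
        frames.append(render_frame(methods, conditions))  # block 1004
    return frames                               # block 1006: the scene has ended


# Stub renderer that only records which method was chosen for each frame.
samples = [{"geometry": "explicit"}, {"geometry": "none"}]
frames = render_scene(samples, lambda methods, conditions: methods[0])
```

  • Because the selection is re-evaluated on every pass, the method used for one frame can differ from the method used for the next, as the two sample conditions show.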
  • It will thus be appreciated that the exemplary pipeline described herein can use a wide variety of image-based rendering methods in various combinations, where the particular types of image-based rendering methods that are being used depend upon various current conditions in the FVV processing pipeline. The image-based rendering methods that are employed by the pipeline techniques described herein can render novel views (i.e., synthetic viewpoints) of the scene directly from a collection of images in the scene proxy without having to know the scene geometry. An overview of exemplary image-based rendering methods which can be employed by the pipeline is provided hereafter.
  • The pipeline supports using any type of display device to view the FVV including, but not limited to, the very small form factor display devices used on conventional smart phones and other types of mobile devices, the small form factor display devices used on conventional tablet computers and netbook computers, the display devices used on conventional laptop computers and personal computers, conventional televisions and 3D televisions, conventional autostereoscopic 3D display devices, conventional head-mounted transparent display devices, and conventional wearable heads-up display devices such as those that are used in virtual reality applications. In a situation where the end user is using an autostereoscopic 3D display device to view the FVV, then the rendering stage of the FVV processing pipeline will simultaneously generate both left and right current synthetic viewpoints of the scene at an appropriate aspect ratio and resolution in order to create a stereoscopic effect for the end user. In another situation where the end user is using a conventional television to view the FVV, then the rendering stage will generate just a single current synthetic viewpoint. In yet another situation where the end user is viewing the FVV in an augmented reality context, (e.g., in a situation where the end user is wearing a head-mounted transparent display), then the rendering stage may generate a current synthetic viewpoint having just the foreground elements of the captured scene, thus enabling objects to be embedded in a natural environment.
  • The pipeline also supports using any type of user interface modality to control the current viewpoint while viewing the FVV including, but not limited to, conventional keyboards, conventional pointing devices (such as a mouse, or a graphics tablet, or the like), and conventional natural user interface modalities (such as a touch-sensitive display screen, or the head tracking functionality that is integrated into wearable heads-up display devices, or a motion and location sensing device (such as the Microsoft Kinect™, among others)). It will be appreciated that if the end user is (or will be) using one or more natural user interface modalities while they are viewing the FVV, this can influence the spatiotemporal navigation capabilities that are provided to the end user. In other words, the FVV processing pipeline can process the streams of sensor data differently in order to enable different end user viewing experiences based on the particular type(s) of user interface modality that is anticipated to be used by the end user. By way of example but not limitation, in a situation where a given end user is using the wearable heads-up display device to view and navigate the FVV, then all six degrees of viewpoint navigation freedom could be provided to the end user. In the bidirectional live FVV implementation of the pipeline technique embodiments, if the end user at each physical location that is participating in a given videoconference/telepresence session is using the wearable heads-up display device to view and navigate the FVV, then parallax functionality can be implemented in order to provide each end user with an optimally realistic viewing experience when they control/change their viewpoint of the FVV using head movements; the pipeline can also provide for corrected conversational geometry between two end users, thus providing the appearance that both end users are looking directly at each other.
In another situation where a given end user is using the motion and location sensing device to navigate the FVV, then the rendering stage can optimize the current synthetic viewpoint that is being displayed based on the end user's current spatial location in front of their display device. In this way, the end user's current spatial location can be mapped to the 3D geometry within the FVV.
  • FIG. 11 illustrates the various degrees of viewpoint navigation freedom that can be supported by the pipeline techniques described herein. As described heretofore, the pipeline generally supports spatiotemporal (i.e., space-time) navigation of the FVV. More particularly, the recorded FVV, unidirectional live FVV, and bidirectional live FVV implementations described herein can each support spatial viewpoint navigation of the FVV having as many as six degrees of freedom, which can be appropriate when the end user is viewing and navigating an FVV that includes high fidelity geometric information. As exemplified in FIG. 11, these six degrees of freedom include viewpoint navigation along the x axis, viewpoint navigation rotationally about the x axis (θx), viewpoint navigation along the y axis, viewpoint navigation rotationally about the y axis (θy), viewpoint navigation along the z axis, and viewpoint navigation rotationally about the z axis (θz). The recorded FVV, unidirectional live FVV, and bidirectional live FVV implementations can also each support spatial viewpoint navigation of the FVV having just one degree of viewpoint navigation freedom. The recorded FVV implementation can also support temporal navigation of the FVV.
  • In some implementations of the pipeline, such as the recorded FVV implementation described herein, a producer or editor of the FVV may want to specify the particular types of viewpoint navigation that are possible at different times during the FVV. By way of example but not limitation, in one scene a movie director may want to confine the end user's viewpoint navigation to a limited area of the scene or a specific axis, but in another scene the director may want to allow the end user to freely navigate their viewpoint throughout the entire area of the scene.
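  • The six degrees of freedom of FIG. 11, together with a producer-specified restriction on them, can be represented as in the sketch below. The Viewpoint type, its field names and the navigate function are hypothetical; the description does not prescribe a data structure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Viewpoint:
    """Six degrees of viewpoint navigation freedom (field names are assumed)."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    theta_x: float = 0.0
    theta_y: float = 0.0
    theta_z: float = 0.0


def navigate(current, requested, allowed):
    """Apply only those requested components whose freedom is allowed."""
    changes = {name: getattr(requested, name) for name in allowed}
    return replace(current, **changes)


# A director confines navigation to the x axis for this part of the FVV.
start = Viewpoint()
request = Viewpoint(x=1.0, y=2.0, theta_y=0.5)
confined = navigate(start, request, allowed={"x"})
```

  • Passing all six field names in the allowed set yields unrestricted navigation, while a single name corresponds to the one-degree-of-freedom case mentioned above.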
  • 1.1.4.1 Image-Based Rendering Methods
  • As described heretofore, the current synthetic viewpoint of the scene is generated using one or more image-based rendering methods which are selected based upon a periodic analysis of the aforementioned set of current pipeline conditions. Accordingly, the particular image-based rendering methods that are used can change over time based upon changes in the current pipeline conditions. It will thus be appreciated that in one situation where the scene has a low degree of complexity and the arrangement of sensors which is being (or was) used to capture the scene are located close to the scene, just a single image-based rendering method may be used to generate the current synthetic viewpoint of the scene. In another situation where the scene has a high degree of complexity and the arrangement of sensors which is being (or was) used to capture the scene are located far from the scene, a plurality of image-based rendering methods may be used to generate the current synthetic viewpoint of the scene depending on the location of the current viewpoint relative to the scene and the particular types of geometric proxy data that are in the scene proxy.
  • FIG. 12 illustrates an exemplary embodiment, in simplified form, of a continuum of the various image-based rendering methods which can be employed by the pipeline technique embodiments described herein. As exemplified in FIG. 12, for didactic purposes these various image-based rendering methods can be classified into three categories according to the amount and type of scene geometry information that is included in the scene proxy and thus is available to be used in the rendering stage, namely rendering without scene geometry 1206 (i.e., the scene geometry is unknown), rendering with implicit scene geometry 1204 (i.e., correspondence), and rendering with explicit scene geometry 1202 (which can be either approximate or accurate). These categories 1202, 1204 and 1206 are to be viewed as a continuum 1200 rather than strict and discrete categories since it will be appreciated that certain of the image-based rendering methods defy strict categorization.
  • As also exemplified in FIG. 12 a trade-off exists between the amount and type of scene geometry information that is available to be used in the rendering stage, and the number of images that are needed to be available in the scene proxy in order to generate synthetic viewpoints of the scene which are photo-realistic. Generally speaking, the higher the sensor density in the arrangement of sensors that is being used to capture the scene (i.e., the larger the number of sensors that is used in the arrangement), the larger the number of images that is available in the scene proxy, and thus the less scene geometry information that is needed to be available in the scene proxy in order to generate synthetic viewpoints of the scene which are photo-realistic. However, it is noted that having less scene geometry information in the scene proxy will generally decrease the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints will generally be limited to positions between sensors or near sensors). Correspondingly, the lower the sensor density (i.e., the smaller the number of sensors that is used in the arrangement), the smaller the number of images that is available in the scene proxy, and thus the more scene geometry information that is needed to be available in the scene proxy in order to generate synthetic viewpoints of the scene which are photo-realistic. However, it is noted that having more scene geometry information in the scene proxy will generally increase the end user's options for navigating the current synthetic viewpoint of the scene (i.e., the synthetic viewpoints can be navigated to positions which are far away from the real sensor viewpoints).
  • On the left side 1206 of the continuum 1200 exemplified in FIG. 12 the scene proxy includes a large number of images but does not include any scene geometry or correspondence information. In this situation a conventional light field method, or a conventional lumigraph method, or a conventional concentric mosaics method, among others, can be used to process the scene proxy in order to generate the current synthetic viewpoint of the scene. As is appreciated in the art of image-based rendering, each of these methods relies on the characterization of the conventional plenoptic function, and constructs a continuous representation of the plenoptic function from the images in the scene proxy. The light field method is generally applicable when the images of the scene are uniformly captured. The light field method generates new images of the scene by appropriately filtering and interpolating the images in the scene proxy. The lumigraph method is similar to the light field method except that the lumigraph method is generally applicable when the images of the scene are not uniformly captured. The lumigraph method enhances the rendering performance by applying approximated geometry to compensate for this non-uniform capture. Unlike the light field and lumigraph methods which are applicable when the arrangement of sensors is a 2D grid, the concentric mosaics method is applicable when the arrangement of sensors is circular. Conventional image mosaicing methods can also be used to construct a complete plenoptic function at a fixed viewpoint from an incomplete set of images of the scene.
  • In the middle 1204 of the continuum 1200 exemplified in FIG. 12 the scene proxy does not include explicit scene geometry information, but rather it includes implicit scene geometry information in the form of feature (e.g., point) correspondences between images, where these correspondences can be computed using conventional computer vision methods. In this situation various conventional transfer methods (such as a conventional view interpolation method, or a conventional view morphing method, among others) can be used to process the scene proxy in order to generate the current synthetic viewpoint of the scene. As is appreciated in the art of image-based rendering, such transfer methods are characterized by the use of a relatively small number of images with the application of geometric constraints (which are either recovered or known a priori) to project image pixels appropriately at a given synthetic viewpoint. These geometric constraints can be in the form of known depth values at each pixel, or epipolar constraints between stereo pairs of images, or trifocal/tri-linear tensors that link correspondences between triplets of images. The view interpolation method generates synthetic viewpoints of the scene by interpolating optical flow between corresponding points. The view morphing method generates synthetic viewpoints that reside on a line which links the optical centers of two different sensors based on point correspondences.
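  • The core idea of interpolating between corresponding points can be shown with a deliberately reduced sketch. It assumes that correspondences are already available as matched 2D point lists and that positions vary linearly between the two views; a practical view interpolation method would also warp pixel colors and handle occlusions and holes, which are omitted here. The function name and parameter t are hypothetical.

```python
def interpolate_correspondences(points_a, points_b, t):
    """Linearly interpolate matched 2D points between two views.

    t = 0.0 reproduces the positions in view A and t = 1.0 those in view B.
    """
    return [
        ((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
        for (xa, ya), (xb, yb) in zip(points_a, points_b)
    ]


# Two matched feature points seen by two sensors (made-up coordinates).
view_a = [(0.0, 0.0), (10.0, 4.0)]
view_b = [(4.0, 0.0), (12.0, 8.0)]
halfway = interpolate_correspondences(view_a, view_b, 0.5)
```

  • With t = 0.5 each point lands midway along its displacement, which corresponds to a synthetic viewpoint halfway between the two sensors under the stated linear assumption.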
  • On the right side 1202 of the continuum 1200 exemplified in FIG. 12 the scene proxy includes explicit and accurate scene geometry information and a small number of images, where this geometry information can be in the form of either depth along known lines-of-sight, or 3D coordinates, among other things. In this situation conventional 3D warping methods, or a conventional layered depth images method, or a conventional layered depth images tree method, or a conventional view-dependent texture mapping method, or a conventional view-dependent geometry method, among others, can be used to process the scene proxy in order to generate the current synthetic viewpoint of the scene. As is appreciated in the art of image-based rendering, the 3D warping methods, or the layered depth images method, or the layered depth images tree method can be used when the scene proxy includes both depth map images and color (or monochrome) images of the scene. When the scene proxy includes depth information for all the points in an image, the 3D warping methods can be used to render the image from any nearby point of view by projecting the pixels of the image to their proper 3D locations and then re-projecting them onto a new picture. The rendering speed of such 3D warping methods can be increased by using conventional relief texture methods which factor the warping process into a relatively simple pre-warping operation and a conventional texture mapping operation (which may be performed by conventional graphics processing hardware). It is noted that the 3D warping methods can be applied to both traditional perspective images as well as multi-perspective images. The view-dependent geometry method was first used in the context of 3D cartoons and trades off geometry and images, and may be used to represent the current synthetic viewpoint of the scene more compactly. A conventional texture-mapped models method can also be used to generate the current synthetic viewpoint of the scene.
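  • The project-then-re-project step of 3D warping can be illustrated for a single pixel. The sketch assumes an ideal pinhole camera with focal length f and principal point (cx, cy), and a new viewpoint that differs from the original only by a translation; these parameters and the function name are assumptions made for the example, and rotation, resampling and hole filling are omitted.

```python
def warp_pixel(u, v, depth, f, cx, cy, translation):
    """Re-project one pixel with known depth into a translated camera.

    Returns the pixel position in the new image, or None if the point
    falls behind the new camera.
    """
    # Project the pixel to its 3D location in the source camera frame.
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    z = depth
    # Express the 3D point relative to the new camera (translation only).
    tx, ty, tz = translation
    xn, yn, zn = x - tx, y - ty, z - tz
    if zn <= 0:
        return None
    # Re-project the point onto the new image plane.
    return (f * xn / zn + cx, f * yn / zn + cy)


# Center pixel at depth 2.0, camera moved 0.5 units along the x axis.
warped = warp_pixel(50, 50, 2.0, f=100.0, cx=50.0, cy=50.0,
                    translation=(0.5, 0.0, 0.0))
```

  • Moving the camera to the right shifts the re-projected point to the left by an amount that shrinks as depth grows, which is the parallax effect that the depth information makes possible.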
  • 1.1.5 FVV User Viewing Experience Stage and Interactive FVV Presentation
  • This section provides a more detailed description of the user viewing experience stage of the FVV processing pipeline, and the presentation of a FVV to one or more end users. As described previously, each end user interactively navigates their viewpoint of the scene via their client computing device, and each time an end user chooses a different viewpoint, this new viewpoint is provided to the user viewing experience stage by the user's client computing device. To this end, each end user has a FVV player operating on their client computing device. The FVV player facilitates the display of FVV related items (e.g., FVV frames or user interface screens), accepts end user inputs, and causes the client computing device to communicate with the FVV user experience stage. For example, as outlined in the exemplary embodiment of FIGS. 13A-B, once an end user installs the FVV player on his or her client computing device (via conventional methods), the player can be initiated by the end user and requested to display a user interface screen that allows the user to select a FVV available for playing (block 1300). As noted previously, this can be a live FVV or a previously recorded FVV. The end user FVV selection is then input (block 1302), and transmitted via the client computing device to the server upon which the user experience stage is operating (block 1304). The user experience stage receives the user selection (block 1306), and in response, initiates the FVV pipeline (more particularly, the rendering stage) to produce the selected FVV (block 1308), and in block 1310 instructs the client device to instantiate the end user controls in the FVV player appropriate to the FVV type. The client computing device receives the instruction (block 1312), and causes a user interface to be displayed that allows the user to interactively control the viewpoint of the FVV scene (block 1314).
As indicated previously, this viewpoint control can be spatial, temporal, or both, depending on the FVV selected. Meanwhile, the rendering stage of the FVV pipeline begins rendering frames of the selected FVV (block 1316) and providing them for display via the user experience stage and the end user's client device (block 1318). It is noted that the initial viewpoint of the FVV scene depicted in the initial frames rendered can be a default viewpoint assigned to the FVV if the end user has not yet specified a viewpoint. As the FVV frames are provided, the end user's client computing device receives them (block 1320) and in block 1322 displays them on a resident display device (such as one described previously). This procedure of rendering, providing and displaying of FVV frames at the current viewpoint is repeated at an appropriate frame rate for the FVV, while at the same time the client computing device monitors end user inputs to the FVV player to determine if an end user viewpoint navigation input has been received (block 1324). If such an input is detected, the client computing device transmits it to the user experience stage of the FVV pipeline (block 1326), which receives it and forwards the new viewpoint to the rendering stage (block 1328). The rendering stage then renders FVV frames depicting the scene from the new (now current) viewpoint (block 1330) and provides them for display as described previously (block 1332). As the FVV frames are provided, the end user's client computing device receives them (block 1334) and in block 1336 displays them on a resident display device. This monitoring and rendering procedure is then repeated for the duration of the FVV.
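  • The monitoring and rendering procedure of blocks 1316 through 1336 can be sketched as an in-process simulation. The class and function names are hypothetical, frames are represented as plain dictionaries rather than images, and the network transport between client and server is replaced by direct method calls.

```python
class FVVSession:
    """Server-side state for one end user (hypothetical structure)."""

    def __init__(self, default_viewpoint):
        # The default viewpoint is used until the end user navigates.
        self.viewpoint = default_viewpoint
        self.frame_index = 0

    def set_viewpoint(self, viewpoint):
        """Blocks 1326-1328: receive and forward a new viewpoint."""
        self.viewpoint = viewpoint

    def next_frame(self):
        """Blocks 1316-1318 and 1330-1332: render and provide a frame."""
        frame = {"index": self.frame_index, "viewpoint": self.viewpoint}
        self.frame_index += 1
        return frame


def play(session, user_inputs):
    """Client loop: forward any navigation input, then show the next frame."""
    displayed = []
    for nav in user_inputs:          # one entry per frame period, None = no input
        if nav is not None:          # block 1324: navigation input detected
            session.set_viewpoint(nav)
        displayed.append(session.next_frame())
    return displayed


shown = play(FVVSession("default"), [None, "left", None])
```

  • The first frame uses the default viewpoint, and every frame after the navigation input is rendered from the new, now current, viewpoint.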
  • It is noted that in situations where a previously recorded FVV is being viewed, as indicated above an end user can temporally control the playback of the FVV, and based on this temporal control the rendering stage will provide FVV frames starting with the frame that corresponds to the last user-specified temporal location in the FVV. More particularly, referring to FIG. 14, in one embodiment, the aforementioned scene is first captured by a server using an arrangement of sensors (block 1400). These streams of sensor data are input and calibrated (block 1402), and then scene proxies are generated from the calibrated streams of sensor data (block 1404). The scene proxies are stored as they are generated (block 1406). Meanwhile, a client computing device monitors navigational inputs from an end user (block 1408). A synthetic viewpoint of the scene is input by the end user and the client computing device transmits the current viewpoint input to the server via the data communication network (block 1410). Additionally, a temporal navigation instruction is input by the end user and the client computing device transmits it to the server as well (block 1412). This temporal navigation input represents an instruction to provide FVV frames from a user-specified temporal location in the FVV. The current synthetic viewpoint of the scene and the temporal navigation instruction are received from the client computing device (block 1414). A sequence of frames is then generated using the scene proxies (block 1416). Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, the first of which corresponds to the last user-specified temporal location in the FVV. The generated frames are then transmitted to the client computing device via the data communication network (block 1418). The client computing device receives the frames (block 1420), and displays them in a conventional manner to the end user (block 1422).
  • In another embodiment, the FVV can be played in reverse, thus rewinding the FVV while still allowing the end user to watch. More particularly, referring to FIG. 15, in one embodiment, the aforementioned scene is captured by a server using an arrangement of sensors (block 1500). These streams of sensor data are input and calibrated (block 1502), and then scene proxies are generated from the calibrated streams of sensor data (block 1504). The scene proxies are stored as they are generated (block 1506). Meanwhile, a client computing device monitors navigational inputs from an end user (block 1508). A synthetic viewpoint of the scene is input by the end user and the client computing device transmits the current viewpoint input to the server via the data communication network (block 1510). Additionally, a reverse-action temporal navigation instruction is input by the end user and the client computing device transmits it to the server as well (block 1512). This reverse-action temporal navigation input represents an instruction to provide FVV frames in reverse order from a specified temporal location in the FVV, thereby rewinding the FVV. The current synthetic viewpoint of the scene and the reverse-action temporal navigation instruction are received from the client computing device (block 1514). A sequence of frames is then generated using the scene proxies (block 1516). Each frame of the sequence depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene, and the frames are generated in reverse order, the first of which corresponds to the last user-specified temporal location in the FVV. The generated frames are then transmitted to the client computing device via the data communication network (block 1518). The client computing device receives the frames (block 1520), and displays them in a conventional manner to the end user (block 1522).
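  • The frame ordering used for forward and reverse temporal navigation (blocks 1416 and 1516) can be sketched as a selection of stored scene proxy indices. The function below is a hypothetical illustration which assumes that the stored proxies are addressable by an integer frame index and that the start index lies within the recorded FVV.

```python
def frame_indices(start, total_frames, count, reverse=False):
    """Return up to `count` frame indices beginning at `start`.

    Forward order stops at the end of the recording; reverse order stops
    at the first frame.
    """
    if reverse:
        stop = max(start - count, -1)
        return list(range(start, stop, -1))
    stop = min(start + count, total_frames)
    return list(range(start, stop))


forward = frame_indices(5, total_frames=10, count=3)
rewind = frame_indices(2, total_frames=10, count=5, reverse=True)
```

  • In both directions the first index returned is the user-specified temporal location, and the sequence is truncated rather than wrapped when it reaches either end of the recording.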
  • In yet another embodiment, the FVV can be paused and restarted by the end user. More particularly, referring to FIG. 16, in one embodiment, as the aforementioned FVV frames are received by the client computing device and displayed to the end user, the client computing device monitors inputs from the end user (block 1600). A pause instruction is input by the end user and the client computing device transmits it to the server via the data communication network (block 1602). The pause instruction is received from the client computing device (block 1604). The server then suspends the generation and transmission of FVV frames to the client computing device (block 1606). While the FVV is paused, the client computing device continues to monitor inputs from the end user (block 1608). A restart instruction is input by the end user and the client computing device transmits it to the server via the data communication network (block 1610). The restart instruction is received from the client computing device (block 1612). The server then restarts the generation and transmission of FVV frames to the client computing device (block 1614). The client computing device receives the frames (block 1616), and displays them in a conventional manner to the end user (block 1618).
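  • The pause and restart behavior of FIG. 16 amounts to a two-state machine on the server. The sketch below is a hypothetical illustration in which a frame is represented only by its index and the instruction names are assumed strings.

```python
class FrameStreamer:
    """Server-side frame generation that can be suspended and resumed."""

    def __init__(self):
        self.paused = False
        self.next_index = 0

    def handle(self, instruction):
        """Blocks 1604 and 1612: receive a pause or restart instruction."""
        if instruction == "pause":
            self.paused = True       # block 1606: suspend generation
        elif instruction == "restart":
            self.paused = False      # block 1614: resume generation

    def tick(self):
        """Return the next frame index, or None while the FVV is paused."""
        if self.paused:
            return None
        index = self.next_index
        self.next_index += 1
        return index


streamer = FrameStreamer()
trace = [streamer.tick()]
streamer.handle("pause")
trace += [streamer.tick(), streamer.tick()]
streamer.handle("restart")
trace.append(streamer.tick())
```

  • No frame indices are skipped while paused, so playback resumes with the frame that follows the last one transmitted.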
  • 2.0 Additional Embodiments
  • While the foregoing cloud based FVV streaming technique embodiments have been described by specific reference to embodiments thereof, it is understood that variations and modifications thereof can be made without departing from the true spirit and scope of the pipeline technique. For example, additional embodiments can be designed to reduce latency times and employed when latency issues are a concern.
  • By way of example but not limitation, in one such additional embodiment, each frame transmitted to a client computing device is also accompanied by at least some of the scene proxies used by the renderer to generate the frame. This allows the client device to locally generate a new frame of the depicted scene from a different viewpoint in the same manner that the renderer produces frames when a new viewpoint is requested by the end user (as described previously). More particularly, whenever a same-frame end user viewpoint navigation input is received via the aforementioned FVV control user interface which represents an instruction to view a scene depicted in the last-displayed FVV frame from a different viewpoint, the client computing device generates a new FVV frame using the scene proxy or proxies received with the last-displayed frame, and displays the new FVV frame on the aforementioned display device. This new FVV frame depicts the scene depicted in the last-displayed FVV frame from a viewpoint specified in the same-frame end user viewpoint navigation input.
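  • By way of illustration only, the client-side behavior just described might be sketched as follows. This is a minimal sketch under stated assumptions: the class name `ClientPlayer`, its method names, and the `render_locally` callback are hypothetical, and the local renderer is left abstract.

```python
from typing import Callable, Optional, Sequence


class ClientPlayer:
    """Hypothetical client that keeps the proxies sent with each frame."""

    def __init__(self, render_locally: Callable[[Sequence[object], object], object]) -> None:
        self.render_locally = render_locally
        self.last_proxies: Optional[Sequence[object]] = None

    def on_frame_received(self, frame: object, proxies: Sequence[object]) -> object:
        """Remember the proxies that accompanied the frame, then return
        the frame so it can be displayed."""
        self.last_proxies = proxies
        return frame

    def on_same_frame_viewpoint(self, viewpoint: object) -> object:
        """Generate a new frame for the last-displayed scene from a
        different viewpoint, without a round trip to the server."""
        if self.last_proxies is None:
            raise RuntimeError("no frame has been received yet")
        return self.render_locally(self.last_proxies, viewpoint)
```

  • The latency saving in this sketch comes from the fact that `on_same_frame_viewpoint` never contacts the server; it uses only data already received.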
  • In another additional embodiment, the frame transmitted to a client computing device depicts all of the captured scene, or a larger portion of it than the display device associated with the client computing device is capable of displaying. Thus, only a portion of the received frame can be displayed at one time. This allows an end user to translate through the depicted scene without having to request a new frame from the FVV pipeline. More particularly, whenever a “same-frame” end user viewpoint navigation input is received via the FVV control user interface which represents an instruction to view a portion of the scene depicted in the last-received FVV frame that was not shown in the last-displayed portion of the frame, at least the portion of the scene depicted in the last-received FVV frame specified in the same-frame end user viewpoint navigation input is displayed on the display device.
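  • By way of illustration only, selecting the displayed portion of an oversized frame might be sketched as follows. This is a minimal sketch under stated assumptions: the function name `visible_window` is hypothetical, a frame is modeled as a list of pixel rows, and a requested window that runs past the frame edge is clamped back inside the frame (a design choice of this sketch, not something stated above).

```python
from typing import List, Sequence


def visible_window(
    frame: Sequence[Sequence[int]],
    left: int,
    top: int,
    width: int,
    height: int,
) -> List[List[int]]:
    """Return the width x height portion of the frame whose upper-left
    corner is at (left, top), clamped so it stays inside the frame."""
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    if width > cols or height > rows:
        raise ValueError("display window is larger than the received frame")
    # Clamp the requested origin so the window never leaves the frame.
    left = max(0, min(left, cols - width))
    top = max(0, min(top, rows - height))
    return [list(row[left:left + width]) for row in frame[top:top + height]]
```

  • A same-frame navigation input then only changes `left` and `top`; no new frame is requested from the server.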
  • Still another additional embodiment involves the FVV pipeline, and more particularly, the rendering stage predicting the next new viewpoint to be requested. For example, this can be accomplished based on past viewpoint change requests received from an end user. The rendering stage then renders and stores a new frame (or a sequence of frames) from the predicted viewpoint, and provides it to the client computing device of the end user if that end user requests the predicted viewpoint. It is further noted that the rendering stage could render multiple frames based on multiple predictions of what viewpoint the end user might request next. Then, if the end user's next viewpoint request matches one of the rendered frames, that frame is sent to the end user's client computing device. More particularly, referring to FIGS. 17A-B, in one embodiment, the foregoing procedure is accomplished as follows. First, the server predicts one or more synthetic viewpoints of the scene that may be received from a client computing device in the future (block 1700). A previously unselected one of the predicted synthetic viewpoints of the scene is then selected (block 1702), and one or more frames are generated using the aforementioned stored scene proxies which depict at least a portion of the scene as viewed from the selected predicted synthetic viewpoint of the scene (block 1704). The generated frame or frames are then stored (block 1706). It is then determined if all the predicted synthetic viewpoints have been selected (block 1708). If not, blocks 1702 through 1708 are repeated. However, if all the predicted synthetic viewpoints have been selected and frames therefor generated and stored, the incoming messages from the client computing device are monitored (block 1710), and when a message is received it is determined if the message includes a current synthetic viewpoint of the scene that matches one of the predicted synthetic viewpoints (block 1712).
If the message includes a current synthetic viewpoint of the scene that matches one of the predicted synthetic viewpoints, then each frame generated based on the matched predicted synthetic viewpoint of the scene is transmitted to the client computing device via the data communication network for display to the end user of the client computing device (block 1714). Then, the previously described process (as described in connection with FIG. 2) resumes. This is also the case if the message does not include a current synthetic viewpoint that matches one of the predicted synthetic viewpoints. More particularly, a sequence of frames is generated using the scene proxies where each frame depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene (block 1716), and each frame is transmitted to the client computing device via the data communication network for display to the end user (block 1718).
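  • By way of illustration only, the pre-rendering and matching procedure of FIGS. 17A-B might be sketched as follows. This is a minimal sketch under stated assumptions: the class name `PredictiveRenderer`, its method names, and the `render` callback are hypothetical; viewpoints are modeled as hashable tuples; and a "match" is taken to mean exact equality, which is an assumption of this sketch since the manner of matching is not specified above. How the predictions themselves are produced is outside the sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


class PredictiveRenderer:
    """Hypothetical rendering stage that pre-renders predicted viewpoints."""

    def __init__(self, render: Callable[[Hashable], List[object]]) -> None:
        self.render = render
        self.cache: Dict[Hashable, List[object]] = {}

    def prerender(self, predicted_viewpoints: Iterable[Hashable]) -> None:
        """Generate and store frames for each predicted viewpoint
        (blocks 1702 through 1708)."""
        for viewpoint in predicted_viewpoints:
            if viewpoint not in self.cache:
                self.cache[viewpoint] = self.render(viewpoint)

    def frames_for(self, requested_viewpoint: Hashable) -> Tuple[List[object], bool]:
        """Return (frames, was_predicted) for a viewpoint received from
        the client (blocks 1712 through 1718)."""
        if requested_viewpoint in self.cache:
            # Match: the stored frames can be transmitted immediately.
            return self.cache[requested_viewpoint], True
        # No match: fall back to rendering on demand.
        return self.render(requested_viewpoint), False
```

  • In this sketch a correct prediction avoids the rendering cost at request time, while an incorrect one costs only the wasted pre-rendering work.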
  • It is also noted that any or all of the aforementioned embodiments can be used in any combination desired to form additional hybrid embodiments. Although the pipeline technique embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described heretofore. Rather, the specific features and acts described heretofore are disclosed as example forms of implementing the claims.
  • 3.0 Computing Environment
  • The cloud based FVV streaming technique embodiments described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations. FIG. 18 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the pipeline technique, as described herein, may be implemented. It is noted that any boxes that are represented by broken or dashed lines in FIG. 18 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.
  • For example, FIG. 18 shows a general system diagram of a simplified computing device 10. Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers (PCs), server computers, handheld computing devices, laptop or mobile computers, communications devices such as cell phones and personal digital assistants (PDAs), multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and audio or video media players.
  • To allow a device to implement the cloud based FVV streaming technique embodiments described herein, the device should have sufficient computational capability and system memory to enable basic computational operations. In particular, as illustrated by FIG. 18, the computational capability is generally illustrated by one or more processing unit(s) 12, and may also include one or more graphics processing units (GPUs) 14, either or both in communication with system memory 16. Note that the processing unit(s) 12 may be specialized microprocessors (such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, an FPGA or other micro-controller) or can be conventional central processing units (CPUs) having one or more processing cores including, but not limited to, specialized GPU-based cores in a multi-core CPU.
  • In addition, the simplified computing device 10 of FIG. 18 may also include other components, such as, for example, a communications interface 18. The simplified computing device 10 of FIG. 18 may also include one or more conventional computer input devices 20 (e.g., pointing devices, keyboards, audio input/capture devices, video input/capture devices, haptic input devices, devices for receiving wired or wireless data transmissions, and the like). The simplified computing device 10 of FIG. 18 may also include other components, such as, for example, display device(s) 24, and one or more conventional computer output devices 22 (e.g., audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, and the like). Exemplary types of input devices (herein also referred to as user interface modalities) and display devices that are operable with the pipeline technique embodiments described herein have been described heretofore. Note that typical communications interfaces 18, additional types of input and output devices 20 and 22, and storage devices 26 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.
  • The simplified computing device 10 of FIG. 18 may also include a variety of computer readable media. Computer readable media can be any available media that can be accessed by the computer 10 via storage devices 26, and includes both volatile and nonvolatile media that is either removable 28 and/or non-removable 30, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example but not limitation, computer readable media may include computer storage media and communication media. Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as digital versatile disks (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.
  • Storage of information such as computer-readable or computer-executable instructions, data structures, program modules, and the like, can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
  • Furthermore, software, programs, and/or computer program products embodying some or all of the various embodiments of the pipeline technique described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
  • Finally, the cloud based FVV streaming technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types. The cloud based FVV streaming technique embodiments may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Additionally, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.

Claims (20)

Wherefore, what is claimed is:
1. A computer-implemented process for generating a free viewpoint video (FVV) of a scene, comprising:
using one or more computing devices to perform the following process actions:
capturing the scene using an arrangement of sensors, said arrangement comprises a plurality of sensors that generate a plurality of streams of sensor data each of which represents the scene from a different geometric perspective;
inputting and calibrating the streams of sensor data;
generating scene proxies from the calibrated streams of sensor data, which geometrically describe the scene as a function of time;
receiving a current synthetic viewpoint of the scene from a client computing device via a data communication network, said current synthetic viewpoint having been selected by an end user of the client computing device;
generating a sequence of frames using the scene proxies, each frame of which depicts at least a portion of the scene as viewed from the current synthetic viewpoint of the scene; and
transmitting each frame generated to the client computing device via the data communication network for display to the end user of the client computing device.
2. The process of claim 1, further comprising, prior to performing the process action of receiving a current synthetic viewpoint of the scene from the client computing device, performing the actions of:
generating a sequence of frames using the scene proxies, each frame of which depicts at least a portion of the scene as viewed from a prescribed default viewpoint of the scene; and
transmitting each frame generated to the client computing device via the data communication network for display to the end user of the client computing device.
3. The process of claim 1, wherein the process action of generating a scene proxy comprises the actions of:
monitoring and periodically analyzing current conditions related to at least one of capturing streams of sensor data, or calibrating the streams of sensor data, or generating scene proxies, or generating a sequence of frames, or transmitting each frame generated; and
each time the current conditions are analyzed,
using results of the periodic analysis to select one or more different 3D reconstruction methods which are matched to the current conditions,
using each selected 3D reconstruction method to generate a 3D reconstruction of at least part of the scene from the calibrated streams of sensor data, and
using the 3D reconstructions and the results of the periodic analysis to generate scene proxies.
4. The process of claim 1, wherein each scene proxy comprises one or more types of geometric proxy data, and wherein the process action of generating scene proxies, comprises an action of, for each scene proxy generated, matching the scene proxy to a set of current conditions related to at least one of capturing streams of sensor data or calibrating the streams of sensor data, so as to maximize the photo-realism of frames generated from the scene proxy.
5. The process of claim 1, wherein the process action of generating a sequence of frames, comprises the actions of,
monitoring and periodically analyzing current conditions related to at least one of capturing streams of sensor data, or calibrating the streams of sensor data, or generating scene proxies, or generating a sequence of frames, or transmitting each frame generated; and
each time the current conditions are analyzed,
using results of the periodic analysis to select one or more different image-based rendering methods which are matched to the current conditions, and
using each selected image-based rendering method and the results of the periodic analysis to generate the sequence of frames.
6. The process of claim 1, further comprising an action of, for each frame transmitted to the client computing device, also transmitting at least some of the scene proxies used to generate the frame.
7. The process of claim 1, wherein the process action of generating a sequence of frames, comprises generating each frame so as to depict at least a portion of the scene as viewed from the current synthetic viewpoint of the scene that is larger than a display device associated with the client computing device is capable of displaying.
8. The process of claim 1, further comprising the actions of:
predicting one or more synthetic viewpoints of the scene that may be received from the client computing device in the future;
for each predicted synthetic viewpoint of the scene,
generating one or more frames using the scene proxies, each frame of which depicts at least a portion of the scene as viewed from the predicted synthetic viewpoint of the scene, and
storing each frame generated based on the predicted synthetic viewpoint of the scene; and
whenever a current synthetic viewpoint of the scene is received from the client computing device that matches one of the predicted synthetic viewpoints, prior to performing the process action of generating a sequence of frames using the scene proxies, transmitting each frame generated based on the matched predicted synthetic viewpoint of the scene to the client computing device via the data communication network for display to the end user of the client computing device.
9. The process of claim 1, further comprising:
prior to performing the process actions of generating and transmitting a sequence of frames, performing the process actions of,
storing each scene proxy generated, and
receiving a temporal navigation instruction from the client computing device via the data communication network, said temporal navigation instruction having been specified by the end user of the client computing device and representing an instruction to provide FVV frames from a specified temporal location in the FVV; and wherein
the process action of generating a sequence of frames further comprises using the stored scene proxies to generate frames, the first of which corresponds to the last user-specified temporal location in the FVV.
10. The process of claim 1, further comprising:
prior to performing the process actions of generating and transmitting a sequence of frames, performing the process actions of,
storing each scene proxy generated, and
receiving a temporal navigation instruction from the client computing device via the data communication network, said temporal navigation instruction having been specified by the end user of the client computing device and representing an instruction to provide FVV frames in reverse order from a specified temporal location in the FVV thereby rewinding the FVV; and wherein
the process action of generating a sequence of frames comprises using the stored scene proxies to generate frames in reverse order, the first of which corresponds to the last user-specified temporal location in the FVV.
11. The process of claim 1, further comprising the process actions of:
receiving a temporal navigation instruction from the client computing device via the data communication network, said temporal navigation instruction having been specified by the end user of the client computing device and representing an instruction to pause the FVV; and
suspending the generation and transmission of FVV frames to the client computing device.
12. The process of claim 11, further comprising the process actions of:
receiving a temporal navigation instruction from the client computing device via the data communication network, said temporal navigation instruction having been specified by the end user of the client computing device and representing an instruction to restart the paused FVV; and
restarting the generation and transmission of FVV frames to the client computing device.
13. A system for generating a free viewpoint video (FVV) of a scene, comprising:
an arrangement of sensors used to capture video of the scene, or audio, or both, each of said sensors producing a stream of sensor data representing the scene from a different geometric perspective;
at least one general purpose computing device; and
a computer program comprising program modules executed by the computing device or devices, wherein the computing device or devices are directed by the program modules of the computer program to,
input and calibrate the streams of sensor data produced by the arrangement of sensors,
generate scene proxies from the calibrated streams of sensor data, which geometrically describe the scene as a function of time,
store the scene proxies as they are generated,
generate a sequence of frames using the stored scene proxies, each frame of which depicts at least a portion of the scene as viewed from a current synthetic viewpoint of the scene, and
transmit each frame generated to a client computing device via a data communication network for display to the end user of the client computing device.
14. The system of claim 13, further comprising a program module for receiving current synthetic viewpoints of the scene from a client computing device via a data communication network, each of said current synthetic viewpoints having been selected by an end user of the client computing device, and wherein said program module for generating a sequence of frames using the stored scene proxies, employs each current synthetic viewpoint received to generate a sequence of frames, each frame of which depicts at least a portion of the scene as viewed from the last-received current synthetic viewpoint.
15. The system of claim 14, wherein prior to receiving a first current synthetic viewpoint of the scene from a client computing device, said program module for generating a sequence of frames using the stored scene proxies, employs a prescribed default synthetic viewpoint as the current synthetic viewpoint to generate a sequence of frames of the scene.
16. The system of claim 13, wherein prior to executing the program module for transmitting each frame generated, executing additional program modules for:
storing each frame generated; and
receiving a temporal navigation instruction from the client computing device via the data communication network, said temporal navigation instruction having been specified by the end user of the client computing device and representing an instruction to provide FVV frames from a specified temporal location in the FVV; and wherein
the program module for transmitting each frame generated to the client computing device, comprises transmitting frames, the first of which corresponds to a frame assigned to the last user-specified temporal location in the FVV.
17. A computer-implemented process for playing a free viewpoint video (FVV) of a scene, comprising:
using a client computing device to perform the following process actions:
inputting a request from an end user to display a FVV selection user interface screen that allows the end user to select a FVV available for playing;
displaying the FVV selection user interface screen on a display device;
inputting an end user FVV selection;
transmitting the end user FVV selection to a server via a data communication network;
receiving an instruction from the server via the data communication network that instructs the client computing device to instantiate end user controls appropriate for the type of FVV selected;
displaying a FVV control user interface on said display device;
monitoring for end user inputs via the FVV control user interface;
whenever an end user viewpoint navigation input is received via the FVV control user interface, transmitting the input to the server via the data communication network;
receiving FVV frames from the server via the data communication network, each FVV frame depicting at least a portion of the scene as it would be viewed from either an initial viewpoint if the end user has not yet input a viewpoint navigation input via the FVV control user interface or the last viewpoint the end user input using a viewpoint navigation input via the FVV control user interface; and
displaying each FVV frame on said display device as it is received.
18. The process of claim 17, further comprising process actions of:
whenever an end user temporal navigation input is received via the FVV control user interface, transmitting the input to the server via the data communication network, said temporal navigation input representing an instruction to provide FVV frames from a user-specified temporal location in the FVV; and
receiving a sequence of FVV frames from the server via the data communication network, the first of which corresponds to the last user-specified temporal location in the FVV.
19. The process of claim 17, further comprising the process actions of:
receiving with each FVV frame received from the server, one or more scene proxies which were used by the server to generate the received FVV frame; and
whenever a same-frame end user viewpoint navigation input is received via the FVV control user interface which represents an instruction to view a scene depicted in the last-displayed FVV frame from a different viewpoint,
generating a new FVV frame using the scene proxy or proxies received with the last-displayed frame, said new FVV frame depicting the scene depicted in the last-displayed FVV frame from a viewpoint specified in the same-frame end user viewpoint navigation input, and
displaying the new FVV frame on said display device.
20. The process of claim 17, wherein the FVV frames received from the server depict at least a portion of the scene as viewed from the current synthetic viewpoint of the scene that is larger than said display device is capable of displaying, and only a portion of each FVV frame is displayed on said display device, the process further comprising a process action of, whenever a same-frame end user viewpoint navigation input is received via the FVV control user interface which represents an instruction to view a portion of the scene depicted in the last-received FVV frame that was not shown in the last-displayed portion of the frame, displaying at least the portion of the scene depicted in the last-received FVV frame specified in the same-frame end user viewpoint navigation input on said display device.
US13/588,917 2012-05-31 2012-08-17 Cloud based free viewpoint video streaming Abandoned US20130321586A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/588,917 US20130321586A1 (en) 2012-05-31 2012-08-17 Cloud based free viewpoint video streaming

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261653983P 2012-05-31 2012-05-31
US13/588,917 US20130321586A1 (en) 2012-05-31 2012-08-17 Cloud based free viewpoint video streaming

Publications (1)

Publication Number Publication Date
US20130321586A1 (en) 2013-12-05

Family

ID=49669652

Family Applications (10)

Application Number Title Priority Date Filing Date
US13/566,877 Active 2034-02-16 US9846960B2 (en) 2012-05-31 2012-08-03 Automated camera array calibration
US13/588,917 Abandoned US20130321586A1 (en) 2012-05-31 2012-08-17 Cloud based free viewpoint video streaming
US13/598,536 Abandoned US20130321593A1 (en) 2012-05-31 2012-08-29 View frustum culling for free viewpoint video (fvv)
US13/599,263 Active 2033-02-25 US8917270B2 (en) 2012-05-31 2012-08-30 Video generation using three-dimensional hulls
US13/599,170 Abandoned US20130321396A1 (en) 2012-05-31 2012-08-30 Multi-input free viewpoint video processing pipeline
US13/599,436 Active 2034-05-03 US9251623B2 (en) 2012-05-31 2012-08-30 Glancing angle exclusion
US13/599,678 Abandoned US20130321566A1 (en) 2012-05-31 2012-08-30 Audio source positioning using a camera
US13/598,747 Abandoned US20130321575A1 (en) 2012-05-31 2012-08-30 High definition bubbles for rendering free viewpoint video
US13/614,852 Active 2033-10-29 US9256980B2 (en) 2012-05-31 2012-09-13 Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds
US13/790,158 Abandoned US20130321413A1 (en) 2012-05-31 2013-03-08 Video generation using convex hulls

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/566,877 Active 2034-02-16 US9846960B2 (en) 2012-05-31 2012-08-03 Automated camera array calibration

Family Applications After (8)

Application Number Title Priority Date Filing Date
US13/598,536 Abandoned US20130321593A1 (en) 2012-05-31 2012-08-29 View frustum culling for free viewpoint video (fvv)
US13/599,263 Active 2033-02-25 US8917270B2 (en) 2012-05-31 2012-08-30 Video generation using three-dimensional hulls
US13/599,170 Abandoned US20130321396A1 (en) 2012-05-31 2012-08-30 Multi-input free viewpoint video processing pipeline
US13/599,436 Active 2034-05-03 US9251623B2 (en) 2012-05-31 2012-08-30 Glancing angle exclusion
US13/599,678 Abandoned US20130321566A1 (en) 2012-05-31 2012-08-30 Audio source positioning using a camera
US13/598,747 Abandoned US20130321575A1 (en) 2012-05-31 2012-08-30 High definition bubbles for rendering free viewpoint video
US13/614,852 Active 2033-10-29 US9256980B2 (en) 2012-05-31 2012-09-13 Interpolating oriented disks in 3D space for constructing high fidelity geometric proxies from point clouds
US13/790,158 Abandoned US20130321413A1 (en) 2012-05-31 2013-03-08 Video generation using convex hulls

Country Status (1)

Country Link
US (10) US9846960B2 (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150224648A1 (en) * 2014-02-13 2015-08-13 GM Global Technology Operations LLC Robotic system with 3d box location functionality
US9144905B1 (en) * 2013-03-13 2015-09-29 Hrl Laboratories, Llc Device and method to identify functional parts of tools for robotic manipulation
US9191643B2 (en) 2013-04-15 2015-11-17 Microsoft Technology Licensing, Llc Mixing infrared and color component data point clouds
US20160217760A1 (en) * 2015-01-22 2016-07-28 Microsoft Technology Licensing, Llc. Reconstructing viewport upon user viewpoint misprediction
US20160275987A1 (en) * 2015-03-17 2016-09-22 Thomson Licensing Method and apparatus for displaying light field video data
US20160381348A1 (en) * 2013-09-11 2016-12-29 Sony Corporation Image processing device and method
US9661312B2 (en) * 2015-01-22 2017-05-23 Microsoft Technology Licensing, Llc Synthesizing second eye viewport using interleaving
US20180034882A1 (en) * 2016-07-27 2018-02-01 R-Stor Inc. Method and apparatus for bonding communication technologies
US9980078B2 (en) 2016-10-14 2018-05-22 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US10079968B2 (en) 2012-12-01 2018-09-18 Qualcomm Incorporated Camera having additional functionality based on connectivity with a host device
US10165386B2 (en) 2017-05-16 2018-12-25 Nokia Technologies Oy VR audio superzoom
WO2019045473A1 (en) * 2017-08-30 2019-03-07 Samsung Electronics Co., Ltd. Method and apparatus for point-cloud streaming
US20190306651A1 (en) 2018-03-27 2019-10-03 Nokia Technologies Oy Audio Content Modification for Playback Audio
US20190311526A1 (en) * 2016-12-28 2019-10-10 Panasonic Intellectual Property Corporation Of America Three-dimensional model distribution method, three-dimensional model receiving method, three-dimensional model distribution device, and three-dimensional model receiving device
US10510111B2 (en) 2013-10-25 2019-12-17 Appliance Computing III, Inc. Image-based rendering of real spaces
US10531219B2 (en) 2017-03-20 2020-01-07 Nokia Technologies Oy Smooth rendering of overlapping audio-object interactions
US10554713B2 (en) 2015-06-19 2020-02-04 Microsoft Technology Licensing, Llc Low latency application streaming using temporal frame transformation
US10726574B2 (en) * 2017-04-11 2020-07-28 Dolby Laboratories Licensing Corporation Passive multi-wearable-devices tracking
US11074036B2 (en) 2017-05-05 2021-07-27 Nokia Technologies Oy Metadata-free audio-object interactions
US11096004B2 (en) 2017-01-23 2021-08-17 Nokia Technologies Oy Spatial audio rendering point extension
US11315326B2 (en) * 2019-10-15 2022-04-26 At&T Intellectual Property I, L.P. Extended reality anchor caching based on viewport prediction
US11395087B2 (en) 2017-09-29 2022-07-19 Nokia Technologies Oy Level-based audio-object interactions
US11430412B2 (en) * 2017-12-19 2022-08-30 Sony Interactive Entertainment Inc. Freely selected point of view image generating apparatus, reference image data generating apparatus, freely selected point of view image generating method, and reference image data generating method
US20220368858A1 (en) * 2015-08-14 2022-11-17 Pcms Holdings, Inc. System and method for augmented reality multi-view telepresence
US11632489B2 (en) 2017-01-31 2023-04-18 Tetavi, Ltd. System and method for rendering free viewpoint video for studio applications
US20230393800A1 (en) * 2022-03-01 2023-12-07 Tencent Technology (Shenzhen) Company Limited Online meeting interface display method and apparatus, medium, and computer program product
EP4503602A3 (en) * 2014-09-03 2025-04-16 Nevermind Capital LLC Methods and apparatus for capturing, streaming and/or playing back content

Families Citing this family (238)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5001286B2 (en) * 2005-10-11 2012-08-15 プライム センス リミティド Object reconstruction method and system
US8866920B2 (en) 2008-05-20 2014-10-21 Pelican Imaging Corporation Capturing and processing of images using monolithic camera array with heterogeneous imagers
US11792538B2 (en) 2008-05-20 2023-10-17 Adeia Imaging Llc Capturing and processing of images including occlusions focused on an image sensor by a lens stack array
US20150373153A1 (en) * 2010-06-30 2015-12-24 Primal Space Systems, Inc. System and method to reduce bandwidth requirement for visibility event packet streaming using a predicted maximal view frustum and predicted maximal viewpoint extent, each computed at runtime
US9892546B2 (en) * 2010-06-30 2018-02-13 Primal Space Systems, Inc. Pursuit path camera model method and system
US8878950B2 (en) 2010-12-14 2014-11-04 Pelican Imaging Corporation Systems and methods for synthesizing high resolution images using super-resolution processes
US9129183B2 (en) 2011-09-28 2015-09-08 Pelican Imaging Corporation Systems and methods for encoding light field image files
US9001960B2 (en) * 2012-01-04 2015-04-07 General Electric Company Method and apparatus for reducing noise-related imaging artifacts
US9300841B2 (en) * 2012-06-25 2016-03-29 Yoldas Askan Method of generating a smooth image from point cloud data
EP3869797B1 (en) 2012-08-21 2023-07-19 Adeia Imaging LLC Method for depth detection in images captured using array cameras
US9519968B2 (en) * 2012-12-13 2016-12-13 Hewlett-Packard Development Company, L.P. Calibrating visual sensors using homography operators
US9224227B2 (en) * 2012-12-21 2015-12-29 Nvidia Corporation Tile shader for screen space, a method of rendering and a graphics processing unit employing the tile shader
US8866912B2 (en) 2013-03-10 2014-10-21 Pelican Imaging Corporation System and methods for calibration of an array camera using a single captured image
US9578259B2 (en) 2013-03-14 2017-02-21 Fotonation Cayman Limited Systems and methods for reducing motion blur in images or video in ultra low light with array cameras
US9445003B1 (en) * 2013-03-15 2016-09-13 Pelican Imaging Corporation Systems and methods for synthesizing high resolution images using image deconvolution based on motion and depth information
JP6304240B2 (en) * 2013-04-04 2018-04-04 ソニー株式会社 Display control apparatus, display control method, and program
US10262462B2 (en) 2014-04-18 2019-04-16 Magic Leap, Inc. Systems and methods for augmented and virtual reality
US9208609B2 (en) * 2013-07-01 2015-12-08 Mitsubishi Electric Research Laboratories, Inc. Method for fitting primitive shapes to 3D point clouds using distance fields
CN105308953A (en) * 2013-07-19 2016-02-03 谷歌技术控股有限责任公司 Asymmetric sensor array for capturing images
US10140751B2 (en) * 2013-08-08 2018-11-27 Imagination Technologies Limited Normal offset smoothing
CN104424655A (en) * 2013-09-10 2015-03-18 鸿富锦精密工业(深圳)有限公司 System and method for reconstructing point cloud curved surface
US9286718B2 (en) * 2013-09-27 2016-03-15 Ortery Technologies, Inc. Method using 3D geometry data for virtual reality image presentation and control in 3D space
US10591969B2 (en) 2013-10-25 2020-03-17 Google Technology Holdings LLC Sensor-based near-field communication authentication
US9888333B2 (en) * 2013-11-11 2018-02-06 Google Technology Holdings LLC Three-dimensional audio rendering techniques
US10119808B2 (en) 2013-11-18 2018-11-06 Fotonation Limited Systems and methods for estimating depth from projected texture using camera arrays
EP3075140B1 (en) 2013-11-26 2018-06-13 FotoNation Cayman Limited Array camera configurations incorporating multiple constituent array cameras
EP2881918B1 (en) * 2013-12-06 2018-02-07 My Virtual Reality Software AS Method for visualizing three-dimensional data
US9530226B2 (en) * 2014-02-18 2016-12-27 Par Technology Corporation Systems and methods for optimizing N dimensional volume data for transmission
US10241616B2 (en) 2014-02-28 2019-03-26 Hewlett-Packard Development Company, L.P. Calibration of sensors and projector
US9396586B2 (en) 2014-03-14 2016-07-19 Matterport, Inc. Processing and/or transmitting 3D data
US9332285B1 (en) * 2014-05-28 2016-05-03 Lucasfilm Entertainment Company Ltd. Switching modes of a media content item
CN104089628B (en) * 2014-06-30 2017-02-08 中国科学院光电研究院 Self-adaption geometric calibration method of light field camera
US11051000B2 (en) 2014-07-14 2021-06-29 Mitsubishi Electric Research Laboratories, Inc. Method for calibrating cameras with non-overlapping views
US10169909B2 (en) * 2014-08-07 2019-01-01 Pixar Generating a volumetric projection for an object
US11205305B2 (en) 2014-09-22 2021-12-21 Samsung Electronics Company, Ltd. Presentation of three-dimensional video
US10750153B2 (en) 2014-09-22 2020-08-18 Samsung Electronics Company, Ltd. Camera system for three-dimensional video
KR20170063827A (en) 2014-09-29 2017-06-08 포토네이션 케이맨 리미티드 Systems and methods for dynamic calibration of array cameras
US9600892B2 (en) * 2014-11-06 2017-03-21 Symbol Technologies, Llc Non-parametric method of and system for estimating dimensions of objects of arbitrary shape
US10154246B2 (en) * 2014-11-20 2018-12-11 Cappasity Inc. Systems and methods for 3D capturing of objects and motion sequences using multiple range and RGB cameras
US9396554B2 (en) 2014-12-05 2016-07-19 Symbol Technologies, Llc Apparatus for and method of estimating dimensions of an object associated with a code in automatic response to reading the code
DE102014118989A1 (en) * 2014-12-18 2016-06-23 Connaught Electronics Ltd. Method for calibrating a camera system, camera system and motor vehicle
US11019330B2 (en) * 2015-01-19 2021-05-25 Aquifi, Inc. Multiple camera system with auto recalibration
EP3780589A1 (en) * 2015-02-03 2021-02-17 Dolby Laboratories Licensing Corporation Post-conference playback system having higher perceived quality than originally heard in the conference
US10397538B2 (en) * 2015-03-01 2019-08-27 Nextvr Inc. Methods and apparatus for supporting content generation, transmission and/or playback
US10878278B1 (en) * 2015-05-16 2020-12-29 Sturfee, Inc. Geo-localization based on remotely sensed visual features
EP3308361B1 (en) * 2015-06-11 2024-04-10 Continental Autonomous Mobility Germany GmbH Method for generating a virtual image of vehicle surroundings
US9460513B1 (en) 2015-06-17 2016-10-04 Mitsubishi Electric Research Laboratories, Inc. Method for reconstructing a 3D scene as a 3D model using images acquired by 3D sensors and omnidirectional cameras
KR101835434B1 (en) * 2015-07-08 2018-03-09 고려대학교 산학협력단 Method and Apparatus for generating a protection image, Method for mapping between image pixel and depth value
US9848212B2 (en) * 2015-07-10 2017-12-19 Futurewei Technologies, Inc. Multi-view video streaming with fast and smooth view switch
GB2543776B (en) * 2015-10-27 2019-02-06 Imagination Tech Ltd Systems and methods for processing images of objects
US10812778B1 (en) 2015-11-09 2020-10-20 Cognex Corporation System and method for calibrating one or more 3D sensors mounted on a moving manipulator
US11562502B2 (en) * 2015-11-09 2023-01-24 Cognex Corporation System and method for calibrating a plurality of 3D sensors with respect to a motion conveyance
US20180374239A1 (en) * 2015-11-09 2018-12-27 Cognex Corporation System and method for field calibration of a vision system imaging two opposite sides of a calibration object
US10757394B1 (en) * 2015-11-09 2020-08-25 Cognex Corporation System and method for calibrating a plurality of 3D sensors with respect to a motion conveyance
WO2017100487A1 (en) * 2015-12-11 2017-06-15 Jingyi Yu Method and system for image-based image rendering using a multi-camera and depth camera array
US10352689B2 (en) 2016-01-28 2019-07-16 Symbol Technologies, Llc Methods and systems for high precision locationing with depth values
US10145955B2 (en) 2016-02-04 2018-12-04 Symbol Technologies, Llc Methods and systems for processing point-cloud data with a line scanner
KR20170095030A (en) * 2016-02-12 2017-08-22 삼성전자주식회사 Scheme for supporting virtual reality content display in communication system
CN107097698B (en) * 2016-02-22 2021-10-01 福特环球技术公司 Inflatable airbag system for vehicle seat, seat assembly and adjustment method thereof
JP6987797B2 (en) 2016-03-11 2022-01-05 カールタ インコーポレイテッド Laser scanner with real-time online egomotion estimation
US11567201B2 (en) 2016-03-11 2023-01-31 Kaarta, Inc. Laser scanner with real-time, online ego-motion estimation
US11573325B2 (en) 2016-03-11 2023-02-07 Kaarta, Inc. Systems and methods for improvements in scanning and mapping
US10989542B2 (en) 2016-03-11 2021-04-27 Kaarta, Inc. Aligning measured signal data with slam localization data and uses thereof
US10721451B2 (en) 2016-03-23 2020-07-21 Symbol Technologies, Llc Arrangement for, and method of, loading freight into a shipping container
CA2961921C (en) 2016-03-29 2020-05-12 Institut National D'optique Camera calibration method using a calibration target
US10762712B2 (en) 2016-04-01 2020-09-01 Pcms Holdings, Inc. Apparatus and method for supporting interactive augmented reality functionalities
US9805240B1 (en) 2016-04-18 2017-10-31 Symbol Technologies, Llc Barcode scanning and dimensioning
CN107341768B (en) * 2016-04-29 2022-03-11 微软技术许可有限责任公司 Grid noise reduction
WO2017197114A1 (en) 2016-05-11 2017-11-16 Affera, Inc. Anatomical model generation
EP3455756B1 (en) 2016-05-12 2025-04-23 Affera, Inc. Anatomical model controlling
EP3264759A1 (en) 2016-06-30 2018-01-03 Thomson Licensing An apparatus and a method for generating data representative of a pixel beam
US10192345B2 (en) * 2016-07-19 2019-01-29 Qualcomm Incorporated Systems and methods for improved surface normal estimation
US10574909B2 (en) 2016-08-08 2020-02-25 Microsoft Technology Licensing, Llc Hybrid imaging sensor for structured light object capture
US10776661B2 (en) 2016-08-19 2020-09-15 Symbol Technologies, Llc Methods, systems and apparatus for segmenting and dimensioning objects
US10229533B2 (en) * 2016-11-03 2019-03-12 Mitsubishi Electric Research Laboratories, Inc. Methods and systems for fast resampling method and apparatus for point cloud data
US11042161B2 (en) 2016-11-16 2021-06-22 Symbol Technologies, Llc Navigation control method and apparatus in a mobile automation system
US10451405B2 (en) 2016-11-22 2019-10-22 Symbol Technologies, Llc Dimensioning system for, and method of, dimensioning freight in motion along an unconstrained path in a venue
JP6948171B2 (en) * 2016-11-30 2021-10-13 キヤノン株式会社 Image processing equipment and image processing methods, programs
WO2018100928A1 (en) 2016-11-30 2018-06-07 キヤノン株式会社 Image processing device and method
EP3336801A1 (en) * 2016-12-19 2018-06-20 Thomson Licensing Method and apparatus for constructing lighting environment representations of 3d scenes
US10354411B2 (en) 2016-12-20 2019-07-16 Symbol Technologies, Llc Methods, systems and apparatus for segmenting objects
JP7159057B2 (en) * 2017-02-10 2022-10-24 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Free-viewpoint video generation method and free-viewpoint video generation system
JP7086522B2 (en) * 2017-02-28 2022-06-20 キヤノン株式会社 Image processing equipment, information processing methods and programs
WO2018172614A1 (en) 2017-03-22 2018-09-27 Nokia Technologies Oy A method and an apparatus and a computer program product for adaptive streaming
JP6922369B2 (en) * 2017-04-14 2021-08-18 富士通株式会社 Viewpoint selection support program, viewpoint selection support method and viewpoint selection support device
US10939038B2 (en) * 2017-04-24 2021-03-02 Intel Corporation Object pre-encoding for 360-degree view for optimal quality and latency
DE112018002314T5 (en) 2017-05-01 2020-01-23 Symbol Technologies, Llc METHOD AND DEVICE FOR DETECTING AN OBJECT STATUS
US11093896B2 (en) 2017-05-01 2021-08-17 Symbol Technologies, Llc Product status detection system
US10949798B2 (en) 2017-05-01 2021-03-16 Symbol Technologies, Llc Multimodal localization and mapping for a mobile automation apparatus
US10726273B2 (en) 2017-05-01 2020-07-28 Symbol Technologies, Llc Method and apparatus for shelf feature and object placement detection from shelf images
US10663590B2 (en) 2017-05-01 2020-05-26 Symbol Technologies, Llc Device and method for merging lidar data
US11367092B2 (en) 2017-05-01 2022-06-21 Symbol Technologies, Llc Method and apparatus for extracting and processing price text from an image set
US10591918B2 (en) 2017-05-01 2020-03-17 Symbol Technologies, Llc Fixed segmented lattice planning for a mobile automation apparatus
US11449059B2 (en) 2017-05-01 2022-09-20 Symbol Technologies, Llc Obstacle detection for a mobile automation apparatus
US11600084B2 (en) 2017-05-05 2023-03-07 Symbol Technologies, Llc Method and apparatus for detecting and interpreting price label text
CN108881784B (en) * 2017-05-12 2020-07-03 腾讯科技(深圳)有限公司 Virtual scene implementation method and device, terminal and server
US10154176B1 (en) * 2017-05-30 2018-12-11 Intel Corporation Calibrating depth cameras using natural objects with expected shapes
KR102376948B1 (en) * 2017-06-07 2022-03-21 구글 엘엘씨 High-speed, high-performance face tracking
EP3635949B1 (en) 2017-06-09 2025-08-27 InterDigital VC Holdings, Inc. Spatially faithful telepresence supporting varying geometries and moving users
BR102017012517A2 (en) * 2017-06-12 2018-12-26 Samsung Eletrônica da Amazônia Ltda. method for 360 ° media display or bubble interface
CN110832553B (en) 2017-06-29 2024-05-14 索尼公司 Image processing apparatus and image processing method
JP6948175B2 (en) * 2017-07-06 2021-10-13 キヤノン株式会社 Image processing device and its control method
US11049218B2 (en) 2017-08-11 2021-06-29 Samsung Electronics Company, Ltd. Seamless image stitching
EP3669330B1 (en) 2017-08-15 2025-07-02 Nokia Technologies Oy Encoding and decoding of volumetric video
WO2019034807A1 (en) 2017-08-15 2019-02-21 Nokia Technologies Oy Sequential encoding and decoding of volymetric video
JP6409107B1 (en) 2017-09-06 2018-10-17 キヤノン株式会社 Information processing apparatus, information processing method, and program
US10572763B2 (en) 2017-09-07 2020-02-25 Symbol Technologies, Llc Method and apparatus for support surface edge detection
US10521914B2 (en) 2017-09-07 2019-12-31 Symbol Technologies, Llc Multi-sensor object recognition system and method
US10861196B2 (en) 2017-09-14 2020-12-08 Apple Inc. Point cloud compression
US11818401B2 (en) 2017-09-14 2023-11-14 Apple Inc. Point cloud geometry compression using octrees and binary arithmetic encoding with adaptive look-up tables
US10897269B2 (en) 2017-09-14 2021-01-19 Apple Inc. Hierarchical point cloud compression
US11113845B2 (en) 2017-09-18 2021-09-07 Apple Inc. Point cloud compression using non-cubic projections and masks
US10909725B2 (en) 2017-09-18 2021-02-02 Apple Inc. Point cloud compression
JP6433559B1 (en) * 2017-09-19 2018-12-05 キヤノン株式会社 Providing device, providing method, and program
CN107610182B (en) * 2017-09-22 2018-09-11 哈尔滨工业大学 A kind of scaling method at light-field camera microlens array center
JP6425780B1 (en) * 2017-09-22 2018-11-21 キヤノン株式会社 Image processing system, image processing apparatus, image processing method and program
EP3467777A1 (en) * 2017-10-06 2019-04-10 Thomson Licensing A method and apparatus for encoding/decoding the colors of a point cloud representing a 3d object
WO2019099605A1 (en) 2017-11-17 2019-05-23 Kaarta, Inc. Methods and systems for geo-referencing mapping systems
US10607373B2 (en) 2017-11-22 2020-03-31 Apple Inc. Point cloud compression with closed-loop color conversion
US10951879B2 (en) 2017-12-04 2021-03-16 Canon Kabushiki Kaisha Method, system and apparatus for capture of image data for free viewpoint video
KR102334070B1 (en) 2018-01-18 2021-12-03 삼성전자주식회사 Electric apparatus and method for control thereof
US11158124B2 (en) 2018-01-30 2021-10-26 Gaia3D, Inc. Method of providing 3D GIS web service
US10417806B2 (en) * 2018-02-15 2019-09-17 JJK Holdings, LLC Dynamic local temporal-consistent textured mesh compression
JP2019144958A (en) * 2018-02-22 2019-08-29 キヤノン株式会社 Image processing device, image processing method, and program
WO2019165194A1 (en) 2018-02-23 2019-08-29 Kaarta, Inc. Methods and systems for processing and colorizing point clouds and meshes
WO2019195270A1 (en) 2018-04-03 2019-10-10 Kaarta, Inc. Methods and systems for real or near real-time point cloud map data confidence evaluation
US11308577B2 (en) * 2018-04-04 2022-04-19 Sony Interactive Entertainment Inc. Reference image generation apparatus, display image generation apparatus, reference image generation method, and display image generation method
US10740911B2 (en) 2018-04-05 2020-08-11 Symbol Technologies, Llc Method, system and apparatus for correcting translucency artifacts in data representing a support structure
US11327504B2 (en) 2018-04-05 2022-05-10 Symbol Technologies, Llc Method, system and apparatus for mobile automation apparatus localization
US10809078B2 (en) 2018-04-05 2020-10-20 Symbol Technologies, Llc Method, system and apparatus for dynamic path generation
US10823572B2 (en) 2018-04-05 2020-11-03 Symbol Technologies, Llc Method, system and apparatus for generating navigational data
US10832436B2 (en) 2018-04-05 2020-11-10 Symbol Technologies, Llc Method, system and apparatus for recovering label positions
US10939129B2 (en) 2018-04-10 2021-03-02 Apple Inc. Point cloud compression
US10909727B2 (en) 2018-04-10 2021-02-02 Apple Inc. Hierarchical point cloud compression with smoothing
US10909726B2 (en) 2018-04-10 2021-02-02 Apple Inc. Point cloud compression
US11010928B2 (en) 2018-04-10 2021-05-18 Apple Inc. Adaptive distance based point cloud compression
US10867414B2 (en) 2018-04-10 2020-12-15 Apple Inc. Point cloud attribute transfer algorithm
US11017566B1 (en) 2018-07-02 2021-05-25 Apple Inc. Point cloud compression with adaptive filtering
US11202098B2 (en) 2018-07-05 2021-12-14 Apple Inc. Point cloud compression with multi-resolution video encoding
WO2020009826A1 (en) 2018-07-05 2020-01-09 Kaarta, Inc. Methods and systems for auto-leveling of point clouds and 3d models
US11012713B2 (en) 2018-07-12 2021-05-18 Apple Inc. Bit stream structure for compressed point cloud data
US11367224B2 (en) 2018-10-02 2022-06-21 Apple Inc. Occupancy map block-to-patch information compression
US11430155B2 (en) 2018-10-05 2022-08-30 Apple Inc. Quantized depths for projection point cloud compression
US11010920B2 (en) 2018-10-05 2021-05-18 Zebra Technologies Corporation Method, system and apparatus for object detection in point clouds
US11506483B2 (en) 2018-10-05 2022-11-22 Zebra Technologies Corporation Method, system and apparatus for support structure depth determination
US10972835B2 (en) * 2018-11-01 2021-04-06 Sennheiser Electronic Gmbh & Co. Kg Conference system with a microphone array system and a method of speech acquisition in a conference system
US11090811B2 (en) 2018-11-13 2021-08-17 Zebra Technologies Corporation Method and apparatus for labeling of support structures
US11003188B2 (en) 2018-11-13 2021-05-11 Zebra Technologies Corporation Method, system and apparatus for obstacle handling in navigational path generation
WO2020103040A1 (en) * 2018-11-21 2020-05-28 Boe Technology Group Co., Ltd. A method for generating and displaying panorama images based on rendering engine and a display apparatus
US11079240B2 (en) 2018-12-07 2021-08-03 Zebra Technologies Corporation Method, system and apparatus for adaptive particle filter localization
CN109618122A (en) * 2018-12-07 2019-04-12 合肥万户网络技术有限公司 A kind of virtual office conference system
US11416000B2 (en) 2018-12-07 2022-08-16 Zebra Technologies Corporation Method and apparatus for navigational ray tracing
US11100303B2 (en) 2018-12-10 2021-08-24 Zebra Technologies Corporation Method, system and apparatus for auxiliary label detection and association
US11015938B2 (en) 2018-12-12 2021-05-25 Zebra Technologies Corporation Method, system and apparatus for navigational assistance
US11423572B2 (en) 2018-12-12 2022-08-23 Analog Devices, Inc. Built-in calibration of time-of-flight depth imaging systems
KR20210096285A (en) * 2018-12-13 2021-08-04 삼성전자주식회사 Method, apparatus and computer readable recording medium for compressing 3D mesh content
US10731970B2 (en) 2018-12-13 2020-08-04 Zebra Technologies Corporation Method, system and apparatus for support structure detection
US10818077B2 (en) 2018-12-14 2020-10-27 Canon Kabushiki Kaisha Method, system and apparatus for controlling a virtual camera
CA3028708A1 (en) 2018-12-28 2020-06-28 Zih Corp. Method, system and apparatus for dynamic loop closure in mapping trajectories
US12268456B2 (en) 2019-01-23 2025-04-08 Affera, Inc. Systems and methods for therapy annotation
JP7211835B2 (en) * 2019-02-04 2023-01-24 i-PRO株式会社 IMAGING SYSTEM AND SYNCHRONIZATION CONTROL METHOD
US11368661B2 (en) * 2019-02-14 2022-06-21 Peking University Shenzhen Graduate School Image synthesis method, apparatus and device for free-viewpoint
JP6647433B1 (en) * 2019-02-19 2020-02-14 株式会社メディア工房 Point cloud data communication system, point cloud data transmission device, and point cloud data transmission method
US10797090B2 (en) 2019-02-27 2020-10-06 Semiconductor Components Industries, Llc Image sensor with near-infrared and visible light phase detection pixels
US11037365B2 (en) 2019-03-07 2021-06-15 Alibaba Group Holding Limited Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
US11057564B2 (en) 2019-03-28 2021-07-06 Apple Inc. Multiple layer flexure for supporting a moving image sensor
JP7479793B2 (en) * 2019-04-11 2024-05-09 キヤノン株式会社 Image processing device, system for generating virtual viewpoint video, and method and program for controlling the image processing device
US11402846B2 (en) 2019-06-03 2022-08-02 Zebra Technologies Corporation Method, system and apparatus for mitigating data capture light leakage
US11960286B2 (en) 2019-06-03 2024-04-16 Zebra Technologies Corporation Method, system and apparatus for dynamic task sequencing
US11662739B2 (en) 2019-06-03 2023-05-30 Zebra Technologies Corporation Method, system and apparatus for adaptive ceiling-based localization
US11151743B2 (en) 2019-06-03 2021-10-19 Zebra Technologies Corporation Method, system and apparatus for end of aisle detection
US11200677B2 (en) 2019-06-03 2021-12-14 Zebra Technologies Corporation Method, system and apparatus for shelf edge detection
US11341663B2 (en) 2019-06-03 2022-05-24 Zebra Technologies Corporation Method, system and apparatus for detecting support structure obstructions
US11080566B2 (en) 2019-06-03 2021-08-03 Zebra Technologies Corporation Method, system and apparatus for gap detection in support structures with peg regions
US11711544B2 (en) 2019-07-02 2023-07-25 Apple Inc. Point cloud compression with supplemental information messages
CN110624220B (en) * 2019-09-04 2021-05-04 福建师范大学 How to Obtain the Optimal Standing Long Jump Technical Template
BR112022004811A2 (en) 2019-09-17 2022-06-21 Boston Polarimetrics Inc Systems and methods for surface modeling using polarization indications
US11627314B2 (en) 2019-09-27 2023-04-11 Apple Inc. Video-based point cloud compression with non-normative smoothing
US11562507B2 (en) 2019-09-27 2023-01-24 Apple Inc. Point cloud compression using video encoding with time consistent patches
CN114450719B (en) 2019-09-30 2025-05-09 Oppo广东移动通信有限公司 Human body model reconstruction method, reconstruction system and storage medium
US11538196B2 (en) 2019-10-02 2022-12-27 Apple Inc. Predictive coding for point cloud compression
US11895307B2 (en) 2019-10-04 2024-02-06 Apple Inc. Block-based predictive coding for point cloud compression
US12099148B2 (en) 2019-10-07 2024-09-24 Intrinsic Innovation Llc Systems and methods for surface normals sensing with polarization
US12058510B2 (en) * 2019-10-18 2024-08-06 Sphere Entertainment Group, Llc Mapping audio to visual images on a display device having a curved screen
US11202162B2 (en) 2019-10-18 2021-12-14 Msg Entertainment Group, Llc Synthesizing audio of a venue
CN110769241B (en) * 2019-11-05 2022-02-01 广州虎牙科技有限公司 Video frame processing method and device, user side and storage medium
US11302012B2 (en) 2019-11-30 2022-04-12 Boston Polarimetrics, Inc. Systems and methods for transparent object segmentation using polarization cues
US11507103B2 (en) 2019-12-04 2022-11-22 Zebra Technologies Corporation Method, system and apparatus for localization-based historical obstacle handling
US11107238B2 (en) 2019-12-13 2021-08-31 Zebra Technologies Corporation Method, system and apparatus for detecting item facings
US11734873B2 (en) 2019-12-13 2023-08-22 Sony Group Corporation Real-time volumetric visualization of 2-D images
US12309500B2 (en) 2019-12-13 2025-05-20 Sony Group Corporation Trans-spectral feature detection for volumetric image alignment and colorization
US11798196B2 (en) 2020-01-08 2023-10-24 Apple Inc. Video-based point cloud compression with predicted patches
US11475605B2 (en) 2020-01-09 2022-10-18 Apple Inc. Geometry encoding of duplicate points
KR20220132620A (en) 2020-01-29 2022-09-30 인트린식 이노베이션 엘엘씨 Systems and methods for characterizing object pose detection and measurement systems
JP7542070B2 (en) 2020-01-30 2024-08-29 イントリンジック イノベーション エルエルシー Systems and methods for synthesizing data for training statistical models across different imaging modalities, including polarization images
US11240465B2 (en) 2020-02-21 2022-02-01 Alibaba Group Holding Limited System and method to use decoder information in video super resolution
US11430179B2 (en) * 2020-02-24 2022-08-30 Microsoft Technology Licensing, Llc Depth buffer dilation for remote rendering
US11822333B2 (en) 2020-03-30 2023-11-21 Zebra Technologies Corporation Method, system and apparatus for data capture illumination control
US11700353B2 (en) * 2020-04-06 2023-07-11 Eingot Llc Integration of remote audio into a performance venue
WO2021243088A1 (en) 2020-05-27 2021-12-02 Boston Polarimetrics, Inc. Multi-aperture polarization optical systems using beam splitters
US11776205B2 (en) * 2020-06-09 2023-10-03 Ptc Inc. Determination of interactions with predefined volumes of space based on automated analysis of volumetric video
US11615557B2 (en) 2020-06-24 2023-03-28 Apple Inc. Point cloud compression using octrees with slicing
US11620768B2 (en) 2020-06-24 2023-04-04 Apple Inc. Point cloud geometry compression using octrees with multiple scan orders
US11450024B2 (en) 2020-07-17 2022-09-20 Zebra Technologies Corporation Mixed depth object detection
US11875452B2 (en) * 2020-08-18 2024-01-16 Qualcomm Incorporated Billboard layers in object-space rendering
US11748918B1 (en) * 2020-09-25 2023-09-05 Apple Inc. Synthesized camera arrays for rendering novel viewpoints
US12026833B2 (en) * 2020-10-08 2024-07-02 Google Llc Few-shot synthesis of talking heads
US11593915B2 (en) 2020-10-21 2023-02-28 Zebra Technologies Corporation Parallax-tolerant panoramic image generation
US11392891B2 (en) 2020-11-03 2022-07-19 Zebra Technologies Corporation Item placement detection and optimization in material handling systems
US11847832B2 (en) 2020-11-11 2023-12-19 Zebra Technologies Corporation Object classification for autonomous navigation systems
US11527014B2 (en) * 2020-11-24 2022-12-13 Verizon Patent And Licensing Inc. Methods and systems for calibrating surface data capture devices
US11874415B2 (en) * 2020-12-22 2024-01-16 International Business Machines Corporation Earthquake detection and response via distributed visual input
US11703457B2 (en) * 2020-12-29 2023-07-18 Industrial Technology Research Institute Structure diagnosis system and structure diagnosis method
JP7632488B2 (en) * 2021-01-12 2025-02-19 ソニーグループ株式会社 Server device and network control method
US12020455B2 (en) 2021-03-10 2024-06-25 Intrinsic Innovation Llc Systems and methods for high dynamic range image reconstruction
US12069227B2 (en) 2021-03-10 2024-08-20 Intrinsic Innovation Llc Multi-modal and multi-spectral stereo camera arrays
US11651538B2 (en) * 2021-03-17 2023-05-16 International Business Machines Corporation Generating 3D videos from 2D models
US11948338B1 (en) 2021-03-29 2024-04-02 Apple Inc. 3D volumetric content encoding using 2D videos and simplified 3D meshes
US11290658B1 (en) 2021-04-15 2022-03-29 Boston Polarimetrics, Inc. Systems and methods for camera exposure control
US11954886B2 (en) 2021-04-15 2024-04-09 Intrinsic Innovation Llc Systems and methods for six-degree of freedom pose estimation of deformable objects
US12067746B2 (en) 2021-05-07 2024-08-20 Intrinsic Innovation Llc Systems and methods for using computer vision to pick up small objects
US11954882B2 (en) 2021-06-17 2024-04-09 Zebra Technologies Corporation Feature-based georegistration for mobile computing devices
US12175741B2 (en) 2021-06-22 2024-12-24 Intrinsic Innovation Llc Systems and methods for a vision guided end effector
US12340538B2 (en) 2021-06-25 2025-06-24 Intrinsic Innovation Llc Systems and methods for generating and using visual datasets for training computer vision models
US12172310B2 (en) 2021-06-29 2024-12-24 Intrinsic Innovation Llc Systems and methods for picking objects using 3-D geometry and segmentation
US11689813B2 (en) 2021-07-01 2023-06-27 Intrinsic Innovation Llc Systems and methods for high dynamic range imaging using crossed polarizers
US12293535B2 (en) 2021-08-03 2025-05-06 Intrinsic Innovation Llc Systems and methods for training pose estimators in computer vision
CN113761238B (en) * 2021-08-27 2022-08-23 广州文远知行科技有限公司 Point cloud storage method, device, equipment and storage medium
US11823319B2 (en) 2021-09-02 2023-11-21 Nvidia Corporation Techniques for rendering signed distance functions
US12254556B2 (en) 2021-09-02 2025-03-18 Nvidia Corporation Techniques for rendering signed distance functions
CN113905221B (en) * 2021-09-30 2024-01-16 福州大学 Stereoscopic panoramic video asymmetric transport stream self-adaption method and system
CN114355287B (en) * 2022-01-04 2023-08-15 湖南大学 Ultra-short baseline underwater sound distance measurement method and system
US20250164412A1 (en) * 2022-02-17 2025-05-22 Nutech Ventures Single-Pass 3D Reconstruction of Internal Surface of Pipelines Using Depth Camera Array
CN116800947A (en) * 2022-03-16 2023-09-22 安霸国际有限合伙企业 Rapid RGB-IR calibration verification for mass production process
CN119866511A (en) * 2022-07-01 2025-04-22 谷歌有限责任公司 Three-dimensional video highlight from a camera source
US12277733B2 (en) * 2022-12-05 2025-04-15 Verizon Patent And Licensing Inc. Calibration methods and systems for an under-calibrated camera capturing a scene
WO2024144805A1 (en) * 2022-12-29 2024-07-04 Innopeak Technology, Inc. Methods and systems for image processing with eye gaze redirection
GB2638245A (en) * 2024-02-16 2025-08-20 Murrell Richard Telepresence system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030038892A1 (en) * 2001-08-09 2003-02-27 Sidney Wang Enhancing broadcast of an event with synthetic scene using a depth map
US20050286759A1 (en) * 2004-06-28 2005-12-29 Microsoft Corporation Interactive viewpoint video system and process employing overlapping images of a scene captured from viewpoints forming a grid
US20090315978A1 (en) * 2006-06-02 2009-12-24 Eidgenossische Technische Hochschule Zurich Method and system for generating a 3d representation of a dynamically changing 3d scene
US20100026712A1 (en) * 2008-07-31 2010-02-04 Stmicroelectronics S.R.L. Method and system for video rendering, computer program product therefor
US20130286204A1 (en) * 2012-04-30 2013-10-31 Convoy Technologies Corp. Motor vehicle camera and monitoring system

Family Cites Families (104)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5602903A (en) 1994-09-28 1997-02-11 Us West Technologies, Inc. Positioning system and method
US6327381B1 (en) 1994-12-29 2001-12-04 Worldscape, Llc Image transformation and synthesis methods
US5850352A (en) 1995-03-31 1998-12-15 The Regents Of The University Of California Immersive video, including video hypermosaicing to generate from multiple video views of a scene a three-dimensional video mosaic from which diverse virtual video scene images are synthesized, including panoramic, scene interactive and stereoscopic images
JP3461980B2 (en) 1995-08-25 2003-10-27 株式会社東芝 High-speed drawing method and apparatus
US6163337A (en) 1996-04-05 2000-12-19 Matsushita Electric Industrial Co., Ltd. Multi-view point image transmission method and multi-view point image display method
US5926400A (en) 1996-11-21 1999-07-20 Intel Corporation Apparatus and method for determining the intensity of a sound in a virtual world
US6064771A (en) 1997-06-23 2000-05-16 Real-Time Geometry Corp. System and method for asynchronous, adaptive moving picture compression, and decompression
US6072496A (en) 1998-06-08 2000-06-06 Microsoft Corporation Method and system for capturing and representing 3D geometry, color and shading of facial expressions and other animated objects
US6226003B1 (en) 1998-08-11 2001-05-01 Silicon Graphics, Inc. Method for rendering silhouette and true edges of 3-D line drawings with occlusion
US6556199B1 (en) 1999-08-11 2003-04-29 Advanced Research And Technology Institute Method and apparatus for fast voxelization of volumetric models
US6509902B1 (en) 2000-02-28 2003-01-21 Mitsubishi Electric Research Laboratories, Inc. Texture filtering for surface elements
US7522186B2 (en) 2000-03-07 2009-04-21 L-3 Communications Corporation Method and apparatus for providing immersive surveillance
US6968299B1 (en) 2000-04-14 2005-11-22 International Business Machines Corporation Method and apparatus for reconstructing a surface using a ball-pivoting algorithm
US6750873B1 (en) 2000-06-27 2004-06-15 International Business Machines Corporation High quality texture reconstruction from multiple scans
US7538764B2 (en) 2001-01-05 2009-05-26 Interuniversitair Micro-Elektronica Centrum (Imec) System and method to obtain surface structures of multi-dimensional objects, and to represent those surface structures for animation, transmission and display
US6919906B2 (en) 2001-05-08 2005-07-19 Microsoft Corporation Discontinuity edge overdraw
GB2378337B (en) 2001-06-11 2005-04-13 Canon Kk 3D Computer modelling apparatus
US7909696B2 (en) 2001-08-09 2011-03-22 Igt Game interaction in 3-D gaming environments
US6781591B2 (en) 2001-08-15 2004-08-24 Mitsubishi Electric Research Laboratories, Inc. Blending multiple images using local and global information
US7023432B2 (en) 2001-09-24 2006-04-04 Geomagic, Inc. Methods, apparatus and computer program products that reconstruct surfaces from data point sets
US7096428B2 (en) 2001-09-28 2006-08-22 Fuji Xerox Co., Ltd. Systems and methods for providing a spatially indexed panoramic video
KR100861161B1 (en) 2002-02-06 2008-09-30 디지털 프로세스 가부시끼가이샤 Computer-readable recording media recording three-dimensional display programs, three-dimensional display devices, and three-dimensional display methods
US20040217956A1 (en) 2002-02-28 2004-11-04 Paul Besl Method and system for processing, compressing, streaming, and interactive rendering of 3D color image data
US7515173B2 (en) 2002-05-23 2009-04-07 Microsoft Corporation Head pose tracking system
US7030875B2 (en) 2002-09-04 2006-04-18 Honda Motor Company Ltd. Environmental reasoning using geometric data structure
US7106358B2 (en) 2002-12-30 2006-09-12 Motorola, Inc. Method, system and apparatus for telepresence communications
US20050017969A1 (en) 2003-05-27 2005-01-27 Pradeep Sen Computer graphics rendering using boundary information
US7480401B2 (en) 2003-06-23 2009-01-20 Siemens Medical Solutions Usa, Inc. Method for local surface smoothing with application to chest wall nodule segmentation in lung CT data
US7321669B2 (en) * 2003-07-10 2008-01-22 Sarnoff Corporation Method and apparatus for refining target position and size estimates using image and depth data
GB2405775B (en) 2003-09-05 2008-04-02 Canon Europa Nv 3D computer surface model generation
US7184052B2 (en) 2004-06-18 2007-02-27 Microsoft Corporation Real-time texture rendering using generalized displacement maps
US20060023782A1 (en) 2004-07-27 2006-02-02 Microsoft Corporation System and method for off-line multi-view video compression
US7671893B2 (en) 2004-07-27 2010-03-02 Microsoft Corp. System and method for interactive multi-view video
US7561620B2 (en) 2004-08-03 2009-07-14 Microsoft Corporation System and process for compressing and decompressing multiple, layered, video streams employing spatial and temporal encoding
US7142209B2 (en) 2004-08-03 2006-11-28 Microsoft Corporation Real-time rendering system and process for interactive viewpoint video that was generated using overlapping images of a scene captured from viewpoints forming a grid
US7221366B2 (en) 2004-08-03 2007-05-22 Microsoft Corporation Real-time rendering system and process for interactive viewpoint video
US8477173B2 (en) 2004-10-15 2013-07-02 Lifesize Communications, Inc. High definition videoconferencing system
WO2006062199A1 (en) 2004-12-10 2006-06-15 Kyoto University 3-dimensional image data compression device, method, program, and recording medium
US7860301B2 (en) 2005-02-11 2010-12-28 Macdonald Dettwiler And Associates Inc. 3D imaging system
DE102005023195A1 (en) 2005-05-19 2006-11-23 Siemens Ag Method for expanding the display area of a volume recording of an object area
US8228994B2 (en) 2005-05-20 2012-07-24 Microsoft Corporation Multi-view video coding based on temporal and view decomposition
WO2007005752A2 (en) 2005-07-01 2007-01-11 Dennis Christensen Visual and aural perspective management for enhanced interactive video telepresence
JP4595733B2 (en) 2005-08-02 2010-12-08 カシオ計算機株式会社 Image processing device
US7551232B2 (en) 2005-11-14 2009-06-23 Lsi Corporation Noise adaptive 3D composite noise reduction
US7623127B2 (en) 2005-11-29 2009-11-24 Siemens Medical Solutions Usa, Inc. Method and apparatus for discrete mesh filleting and rounding through ball pivoting
US7577491B2 (en) 2005-11-30 2009-08-18 General Electric Company System and method for extracting parameters of a cutting tool
KR100810268B1 (en) 2006-04-06 2008-03-06 삼성전자주식회사 Implementation Method for Color Weaknesses in Mobile Display Devices
US7778491B2 (en) 2006-04-10 2010-08-17 Microsoft Corporation Oblique image stitching
US7679639B2 (en) 2006-04-20 2010-03-16 Cisco Technology, Inc. System and method for enhancing eye gaze in a telepresence system
US20080043024A1 (en) 2006-06-26 2008-02-21 Siemens Corporate Research, Inc. Method for reconstructing an object subject to a cone beam using a graphic processor unit (gpu)
USD610105S1 (en) 2006-07-10 2010-02-16 Cisco Technology, Inc. Telepresence system
US20080095465A1 (en) 2006-10-18 2008-04-24 General Electric Company Image registration system and method
US8213711B2 (en) 2007-04-03 2012-07-03 Her Majesty The Queen In Right Of Canada As Represented By The Minister Of Industry, Through The Communications Research Centre Canada Method and graphical user interface for modifying depth maps
GB0708676D0 (en) 2007-05-04 2007-06-13 Imec Inter Uni Micro Electr A Method for real-time/on-line performing of multi view multimedia applications
US8253770B2 (en) 2007-05-31 2012-08-28 Eastman Kodak Company Residential video communication system
US8063901B2 (en) 2007-06-19 2011-11-22 Siemens Aktiengesellschaft Method and apparatus for efficient client-server visualization of multi-dimensional data
JP4947593B2 (en) 2007-07-31 2012-06-06 Kddi株式会社 Apparatus and program for generating free viewpoint image by local region segmentation
US8223192B2 (en) 2007-10-31 2012-07-17 Technion Research And Development Foundation Ltd. Free viewpoint video
US8441476B2 (en) 2007-11-16 2013-05-14 Sportvision, Inc. Image repair interface for providing virtual viewpoints
US8160345B2 (en) 2008-04-30 2012-04-17 Otismed Corporation System and method for image segmentation in generating computer models of a joint to undergo arthroplasty
US8840470B2 (en) * 2008-02-27 2014-09-23 Sony Computer Entertainment America Llc Methods for capturing depth data of a scene and applying computer actions
TWI357582B (en) 2008-04-18 2012-02-01 Univ Nat Taiwan Image tracking system and method thereof
US8442355B2 (en) 2008-05-23 2013-05-14 Samsung Electronics Co., Ltd. System and method for generating a multi-dimensional image
US7840638B2 (en) 2008-06-27 2010-11-23 Microsoft Corporation Participant positioning in multimedia conferencing
WO2010023580A1 (en) 2008-08-29 2010-03-04 Koninklijke Philips Electronics, N.V. Dynamic transfer of three-dimensional image data
US20110169824A1 (en) 2008-09-29 2011-07-14 Nobutoshi Fujinami 3d image processing device and method for reducing noise in 3d image processing device
JP5243612B2 (en) 2008-10-02 2013-07-24 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ Intermediate image synthesis and multi-view data signal extraction
US8200041B2 (en) 2008-12-18 2012-06-12 Intel Corporation Hardware accelerated silhouette detection
US8436852B2 (en) 2009-02-09 2013-05-07 Microsoft Corporation Image editing consistent with scene geometry
US8477175B2 (en) 2009-03-09 2013-07-02 Cisco Technology, Inc. System and method for providing three dimensional imaging in a network environment
JP5222205B2 (en) 2009-04-03 2013-06-26 Kddi株式会社 Image processing apparatus, method, and program
US20100259595A1 (en) 2009-04-10 2010-10-14 Nokia Corporation Methods and Apparatuses for Efficient Streaming of Free View Point Video
US8719309B2 (en) 2009-04-14 2014-05-06 Apple Inc. Method and apparatus for media data transmission
US8665259B2 (en) 2009-04-16 2014-03-04 Autodesk, Inc. Multiscale three-dimensional navigation
US8755569B2 (en) 2009-05-29 2014-06-17 University Of Central Florida Research Foundation, Inc. Methods for recognizing pose and action of articulated objects with collection of planes in motion
US8629866B2 (en) 2009-06-18 2014-01-14 International Business Machines Corporation Computer method and apparatus providing interactive control and remote identity through in-world proxy
KR101070591B1 (en) * 2009-06-25 2011-10-06 (주)실리콘화일 distance measuring apparatus having dual stereo camera
US9648346B2 (en) 2009-06-25 2017-05-09 Microsoft Technology Licensing, Llc Multi-view video compression and streaming based on viewpoints of remote viewer
US8194149B2 (en) 2009-06-30 2012-06-05 Cisco Technology, Inc. Infrared-aided depth estimation
US8633940B2 (en) 2009-08-04 2014-01-21 Broadcom Corporation Method and system for texture compression in a system having an AVC decoder and a 3D engine
US8908958B2 (en) 2009-09-03 2014-12-09 Ron Kimmel Devices and methods of generating three dimensional (3D) colored models
US8284237B2 (en) 2009-09-09 2012-10-09 Nokia Corporation Rendering multiview content in a 3D video system
US8441482B2 (en) 2009-09-21 2013-05-14 Caustic Graphics, Inc. Systems and methods for self-intersection avoidance in ray tracing
US20110084983A1 (en) 2009-09-29 2011-04-14 Wavelength & Resonance LLC Systems and Methods for Interaction With a Virtual Environment
US9154730B2 (en) 2009-10-16 2015-10-06 Hewlett-Packard Development Company, L.P. System and method for determining the active talkers in a video conference
US8537200B2 (en) 2009-10-23 2013-09-17 Qualcomm Incorporated Depth map generation techniques for conversion of 2D video data to 3D video data
KR101365329B1 (en) 2009-11-23 2014-03-14 제너럴 인스트루먼트 코포레이션 Depth coding as an additional channel to video sequence
US8487977B2 (en) 2010-01-26 2013-07-16 Polycom, Inc. Method and apparatus to virtualize people with 3D effect into a remote room on a telepresence call for true in person experience
US20110211749A1 (en) 2010-02-28 2011-09-01 Kar Han Tan System And Method For Processing Video Using Depth Sensor Information
US8898567B2 (en) 2010-04-09 2014-11-25 Nokia Corporation Method and apparatus for generating a virtual interactive workspace
EP2383696A1 (en) 2010-04-30 2011-11-02 LiberoVision AG Method for estimating a pose of an articulated object model
US20110304619A1 (en) 2010-06-10 2011-12-15 Autodesk, Inc. Primitive quadric surface extraction from unorganized point cloud data
US8411126B2 (en) 2010-06-24 2013-04-02 Hewlett-Packard Development Company, L.P. Methods and systems for close proximity spatial audio rendering
KR20120011653A (en) * 2010-07-29 2012-02-08 삼성전자주식회사 Image processing apparatus and method
US8659597B2 (en) 2010-09-27 2014-02-25 Intel Corporation Multi-view ray tracing using edge detection and shader reuse
US8787459B2 (en) 2010-11-09 2014-07-22 Sony Computer Entertainment Inc. Video coding methods and apparatus
US9123115B2 (en) * 2010-11-23 2015-09-01 Qualcomm Incorporated Depth estimation based on global motion and optical flow
JP5858381B2 (en) * 2010-12-03 2016-02-10 国立大学法人名古屋大学 Multi-viewpoint image composition method and multi-viewpoint image composition system
US8693713B2 (en) 2010-12-17 2014-04-08 Microsoft Corporation Virtual audio environment for multidimensional conferencing
US8156239B1 (en) 2011-03-09 2012-04-10 Metropcs Wireless, Inc. Adaptive multimedia renderer
WO2012155279A2 (en) 2011-05-13 2012-11-22 Liberovision Ag Silhouette-based pose estimation
US8867886B2 (en) 2011-08-08 2014-10-21 Roy Feinson Surround video playback
EP2761878B1 (en) 2011-09-29 2020-04-08 Dolby Laboratories Licensing Corporation Representation and coding of multi-view images using tapestry encoding
US9830743B2 (en) 2012-04-03 2017-11-28 Autodesk, Inc. Volume-preserving smoothing brush

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030038892A1 (en) * 2001-08-09 2003-02-27 Sidney Wang Enhancing broadcast of an event with synthetic scene using a depth map
US20050286759A1 (en) * 2004-06-28 2005-12-29 Microsoft Corporation Interactive viewpoint video system and process employing overlapping images of a scene captured from viewpoints forming a grid
US20090315978A1 (en) * 2006-06-02 2009-12-24 Eidgenossische Technische Hochschule Zurich Method and system for generating a 3d representation of a dynamically changing 3d scene
US20100026712A1 (en) * 2008-07-31 2010-02-04 Stmicroelectronics S.R.L. Method and system for video rendering, computer program product therefor
US8106924B2 (en) * 2008-07-31 2012-01-31 Stmicroelectronics S.R.L. Method and system for video rendering, computer program product therefor
US20130286204A1 (en) * 2012-04-30 2013-10-31 Convoy Technologies Corp. Motor vehicle camera and monitoring system

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10079968B2 (en) 2012-12-01 2018-09-18 Qualcomm Incorporated Camera having additional functionality based on connectivity with a host device
US9144905B1 (en) * 2013-03-13 2015-09-29 Hrl Laboratories, Llc Device and method to identify functional parts of tools for robotic manipulation
US9191643B2 (en) 2013-04-15 2015-11-17 Microsoft Technology Licensing, Llc Mixing infrared and color component data point clouds
US20160381348A1 (en) * 2013-09-11 2016-12-29 Sony Corporation Image processing device and method
US10587864B2 (en) * 2013-09-11 2020-03-10 Sony Corporation Image processing device and method
US11610256B1 (en) 2013-10-25 2023-03-21 Appliance Computing III, Inc. User interface for image-based rendering of virtual tours
US11062384B1 (en) 2013-10-25 2021-07-13 Appliance Computing III, Inc. Image-based rendering of real spaces
US11449926B1 (en) 2013-10-25 2022-09-20 Appliance Computing III, Inc. Image-based rendering of real spaces
US11948186B1 (en) 2013-10-25 2024-04-02 Appliance Computing III, Inc. User interface for image-based rendering of virtual tours
US10592973B1 (en) 2013-10-25 2020-03-17 Appliance Computing III, Inc. Image-based rendering of real spaces
US11783409B1 (en) 2013-10-25 2023-10-10 Appliance Computing III, Inc. Image-based rendering of real spaces
US10510111B2 (en) 2013-10-25 2019-12-17 Appliance Computing III, Inc. Image-based rendering of real spaces
US12266011B1 (en) 2013-10-25 2025-04-01 Appliance Computing III, Inc. User interface for image-based rendering of virtual tours
US20150224648A1 (en) * 2014-02-13 2015-08-13 GM Global Technology Operations LLC Robotic system with 3d box location functionality
US9233469B2 (en) * 2014-02-13 2016-01-12 GM Global Technology Operations LLC Robotic system with 3D box location functionality
EP4503602A3 (en) * 2014-09-03 2025-04-16 Nevermind Capital LLC Methods and apparatus for capturing, streaming and/or playing back content
CN107211117B (en) * 2015-01-22 2019-06-28 微软技术许可有限责任公司 Synthesizing second eye viewport using interleaving
US9661312B2 (en) * 2015-01-22 2017-05-23 Microsoft Technology Licensing, Llc Synthesizing second eye viewport using interleaving
US9686520B2 (en) * 2015-01-22 2017-06-20 Microsoft Technology Licensing, Llc Reconstructing viewport upon user viewpoint misprediction
US20160217760A1 (en) * 2015-01-22 2016-07-28 Microsoft Technology Licensing, Llc. Reconstructing viewport upon user viewpoint misprediction
CN107211117A (en) * 2015-01-22 2017-09-26 微软技术许可有限责任公司 Synthesizing second eye viewport using interleaving
US10750139B2 (en) 2015-01-22 2020-08-18 Microsoft Technology Licensing, Llc Reconstructing viewport upon user viewpoint misprediction
US20160275987A1 (en) * 2015-03-17 2016-09-22 Thomson Licensing Method and apparatus for displaying light field video data
US10388323B2 (en) * 2015-03-17 2019-08-20 Interdigital Ce Patent Holdings Method and apparatus for displaying light field video data
US10554713B2 (en) 2015-06-19 2020-02-04 Microsoft Technology Licensing, Llc Low latency application streaming using temporal frame transformation
US11962940B2 (en) * 2015-08-14 2024-04-16 Interdigital Vc Holdings, Inc. System and method for augmented reality multi-view telepresence
US12407792B2 (en) 2015-08-14 2025-09-02 Interdigital Vc Holdings, Inc. System and method for augmented reality multi-view telepresence
US20220368858A1 (en) * 2015-08-14 2022-11-17 Pcms Holdings, Inc. System and method for augmented reality multi-view telepresence
US11082471B2 (en) * 2016-07-27 2021-08-03 R-Stor Inc. Method and apparatus for bonding communication technologies
US20180034882A1 (en) * 2016-07-27 2018-02-01 R-Stor Inc. Method and apparatus for bonding communication technologies
US9980078B2 (en) 2016-10-14 2018-05-22 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US10433096B2 (en) 2016-10-14 2019-10-01 Nokia Technologies Oy Audio object modification in free-viewpoint rendering
US20190311526A1 (en) * 2016-12-28 2019-10-10 Panasonic Intellectual Property Corporation Of America Three-dimensional model distribution method, three-dimensional model receiving method, three-dimensional model distribution device, and three-dimensional model receiving device
US11551408B2 (en) * 2016-12-28 2023-01-10 Panasonic Intellectual Property Corporation Of America Three-dimensional model distribution method, three-dimensional model receiving method, three-dimensional model distribution device, and three-dimensional model receiving device
US11096004B2 (en) 2017-01-23 2021-08-17 Nokia Technologies Oy Spatial audio rendering point extension
US11665308B2 (en) 2017-01-31 2023-05-30 Tetavi, Ltd. System and method for rendering free viewpoint video for sport applications
US11632489B2 (en) 2017-01-31 2023-04-18 Tetavi, Ltd. System and method for rendering free viewpoint video for studio applications
US11044570B2 (en) 2017-03-20 2021-06-22 Nokia Technologies Oy Overlapping audio-object interactions
US10531219B2 (en) 2017-03-20 2020-01-07 Nokia Technologies Oy Smooth rendering of overlapping audio-object interactions
US10726574B2 (en) * 2017-04-11 2020-07-28 Dolby Laboratories Licensing Corporation Passive multi-wearable-devices tracking
US11669991B2 (en) 2017-04-11 2023-06-06 Dolby Laboratories Licensing Corporation Passive multi-wearable-devices tracking
US12277724B2 (en) 2017-04-11 2025-04-15 Dolby Laboratories Licensing Corporation Passive multi-wearable-devices tracking
US11604624B2 (en) 2017-05-05 2023-03-14 Nokia Technologies Oy Metadata-free audio-object interactions
US11442693B2 (en) 2017-05-05 2022-09-13 Nokia Technologies Oy Metadata-free audio-object interactions
US11074036B2 (en) 2017-05-05 2021-07-27 Nokia Technologies Oy Metadata-free audio-object interactions
US10165386B2 (en) 2017-05-16 2018-12-25 Nokia Technologies Oy VR audio superzoom
CN111052750A (en) * 2017-08-30 2020-04-21 三星电子株式会社 Method and apparatus for point cloud streaming
US11290758B2 (en) 2017-08-30 2022-03-29 Samsung Electronics Co., Ltd. Method and apparatus of point-cloud streaming
WO2019045473A1 (en) * 2017-08-30 2019-03-07 Samsung Electronics Co., Ltd. Method and apparatus for point-cloud streaming
US11395087B2 (en) 2017-09-29 2022-07-19 Nokia Technologies Oy Level-based audio-object interactions
US11430412B2 (en) * 2017-12-19 2022-08-30 Sony Interactive Entertainment Inc. Freely selected point of view image generating apparatus, reference image data generating apparatus, freely selected point of view image generating method, and reference image data generating method
US10542368B2 (en) 2018-03-27 2020-01-21 Nokia Technologies Oy Audio content modification for playback audio
US20190306651A1 (en) 2018-03-27 2019-10-03 Nokia Technologies Oy Audio Content Modification for Playback Audio
US11315326B2 (en) * 2019-10-15 2022-04-26 At&T Intellectual Property I, L.P. Extended reality anchor caching based on viewport prediction
US20230393800A1 (en) * 2022-03-01 2023-12-07 Tencent Technology (Shenzhen) Company Limited Online meeting interface display method and apparatus, medium, and computer program product

Also Published As

Publication number Publication date
US20130321590A1 (en) 2013-12-05
US20130321566A1 (en) 2013-12-05
US20130321418A1 (en) 2013-12-05
US9251623B2 (en) 2016-02-02
US8917270B2 (en) 2014-12-23
US20130321589A1 (en) 2013-12-05
US20130321410A1 (en) 2013-12-05
US20130321575A1 (en) 2013-12-05
US20130321396A1 (en) 2013-12-05
US9846960B2 (en) 2017-12-19
US20130321593A1 (en) 2013-12-05
US9256980B2 (en) 2016-02-09
US20130321413A1 (en) 2013-12-05

Similar Documents

Publication Publication Date Title
US20130321586A1 (en) Cloud based free viewpoint video streaming
US10958887B2 (en) Free-viewpoint photorealistic view synthesis from casually captured video
Serrano et al. Motion parallax for 360 RGBD video
US11257233B2 (en) Volumetric depth video recording and playback
US12293450B2 (en) 3D conversations in an artificial reality environment
Casas et al. 4d video textures for interactive character appearance
KR102502794B1 (en) Methods and systems for customizing virtual reality data
US20080246759A1 (en) Automatic Scene Modeling for the 3D Camera and 3D Video
Richardt et al. Capture, reconstruction, and representation of the visual real world for virtual reality
CN112740261A (en) Panoramic light field capture, processing and display
JP7447266B2 (en) View encoding and decoding for volumetric image data
JP2023551991A (en) Real-time multi-view video conversion method and system
Alain et al. Introduction to immersive video technologies
Szeliski Image-based rendering
US20240096035A1 (en) Latency reduction for immersive content production systems
Demiris Merging the real and the synthetic in augmented 3D worlds: A brief survey of applications and challenges
CN120499442A (en) Video special effect adding method and device and electronic equipment
Ruiz‐Hidalgo et al. Interactive Rendering
Wetzstein Capture, Reconstruction, and Representation of the Visual Real World for Virtual Reality

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIRK, ADAM;SWEENEY, PATRICK;GILLETT, DON;AND OTHERS;SIGNING DATES FROM 20120806 TO 20120816;REEL/FRAME:028808/0269

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0541

Effective date: 20141014

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE