
WO2010075726A1 - Method and device for generating a stereoscopic panoramic video stream, and video conferencing method and device - Google Patents

Method and device for generating a stereoscopic panoramic video stream, and video conferencing method and device

Info

Publication number
WO2010075726A1
WO2010075726A1 (PCT/CN2009/075383)
Authority
WO
WIPO (PCT)
Prior art keywords
video
image
video image
corrected
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2009/075383
Other languages
English (en)
French (fr)
Inventor
李凯
刘源
王静
苏红宏
赵嵩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Device Co Ltd
Original Assignee
Huawei Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN200810247531A external-priority patent/CN101771830B/zh
Priority claimed from CN2009101186295A external-priority patent/CN101820550B/zh
Application filed by Huawei Device Co Ltd filed Critical Huawei Device Co Ltd
Priority to EP09836013A priority Critical patent/EP2385705A4/en
Publication of WO2010075726A1 publication Critical patent/WO2010075726A1/zh
Priority to US13/172,193 priority patent/US8717405B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/243Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • The invention relates to video splicing technology, in particular to video splicing in a telepresence conference system, and more particularly to a stereoscopic panoramic video stream generation method and device and a video conference method and device.
  • the existing Telepresence technology is a technology that combines high-quality audio, high-definition video graphics and interactive components to deliver an immersive and unique experience over the web.
  • advanced video, audio, and collaboration technologies in a telepresence conferencing system provide users with a real-time, face-to-face interactive experience.
  • The telepresence conferencing system even presents the remote room as if it were part of the local room, creating a face-to-face meeting experience around a virtual conference table with life-size images, high-definition resolution, and stereo, multi-channel audio.
  • Although existing telepresence gives users a better and more realistic in-room conference experience than traditional conference systems, a gap remains compared with real face-to-face communication: it lacks a true stereoscopic experience.
  • The video information that people obtain is only two-dimensional; it does not convey the sense of depth experienced in face-to-face exchange.
  • The existing stereoscopic (3D) video technology can provide depth information conforming to the principles of stereoscopic vision, so as to truly reproduce the objective world scene and express the depth, layering and authenticity of the scene; this is an important direction of current video technology development.
  • However, because stereoscopic display equipment is expensive and lacks standards, 3D video has not yet been applied on a large scale.
  • the existing image stitching technology can break through the physical limitations of the imaging device itself, and obtain a digital panoramic image of a large field of view.
  • the prior art telepresence conference system cannot provide users with a panoramic, high resolution, seamless and three-dimensional conference experience.
  • The embodiments of the present invention provide a stereoscopic panoramic video stream generating method, device, and video conferencing method and device, which are capable of providing users with panoramic, high-resolution, seamless and 3D telepresence video images based on multiple display modes of different display devices.
  • One of the objectives of the embodiments of the present invention is to provide a method for generating a stereoscopic panoramic video stream, the method comprising: acquiring depth information of at least two video images to be spliced; obtaining image data of a plurality of depth levels from the corresponding video image to be spliced according to the depth information of each video image to be spliced; and performing splicing between the video image data according to the acquired image data of the plurality of depth levels to generate a stereoscopic panoramic video stream.
  • One of the objectives of the embodiments of the present invention is to provide a stereoscopic video conferencing method, the method comprising: acquiring video streams of the same site from at least two perspectives; obtaining image data of multiple depth levels from the corresponding video stream according to the depth information of each video stream; performing stitching based on depth information on the acquired video streams of different views to generate a stereoscopic panoramic video stream; and displaying the video image of the stereoscopic panoramic video stream on the terminal display according to the category of the terminal display.
  • One of the objectives of the embodiments of the present invention is to provide a stereoscopic panoramic video stream generating device, which includes: a depth information acquiring device, configured to acquire depth information of at least two video images to be spliced; a layered image acquiring device, configured to acquire image data of a plurality of depth levels from the corresponding video image to be spliced according to the depth information of each video image to be spliced; and a stereoscopic panoramic video stream generating device, configured to perform splicing between the video image data according to the acquired image data of the plurality of depth levels to generate a stereoscopic panoramic video stream.
  • One of the objectives of the embodiments of the present invention is to provide a stereoscopic video conferencing device, where the device includes: a depth information acquiring device, which acquires video streams of the same site from at least two perspectives; a layered image acquiring device, which acquires image data of a plurality of depth levels from the corresponding video stream according to the depth information of each video stream; a stereoscopic panoramic video stream generating device, which performs depth-information-based splicing on the acquired video streams of different viewing angles to generate a stereoscopic panoramic video stream; and a video image display device, configured to display a video image of the stereoscopic panoramic video stream on a terminal display according to the category of the terminal display.
  • The invention provides fast, real-time video image splicing, which reduces the complexity of video image splicing and improves its efficiency. It can provide users with a panoramic, brightly colored, seamless, three-dimensional conference experience and can achieve a higher level of realism than traditional telepresence.
  • FIG. 1 is a schematic diagram of a multi-view video conference system based on a depth camera according to an embodiment of the present invention
  • FIG. 2 is a flowchart of generating a stereoscopic panoramic video stream according to Embodiment 1 of the present invention
  • FIG. 3 is a schematic diagram of video splicing based on a character layer and a non-person layer according to Embodiment 1 of the present invention
  • FIG. 4 is a flowchart of a method for generating a stereoscopic panoramic video stream according to Embodiment 2 of the present invention
  • FIG. 5 is a flowchart of a color correction method according to Embodiment 3 of the present invention
  • FIG. 6 is a flowchart of a color correction method according to Embodiment 4 of the present invention.
  • FIG. 7A is a structural diagram of a stereoscopic panoramic video stream generating apparatus according to Embodiment 5 of the present invention
  • FIG. 7B is a structural block diagram of a stereoscopic panoramic video stream generating apparatus according to Embodiment 5 of the present invention
  • FIG. 8B is a structural diagram of a video image correcting apparatus according to Embodiment 6 of the present invention
  • FIG. 9A is a structural diagram of another video image correction apparatus according to Embodiment 6 of the present invention
  • FIG. 9B is a structural diagram of a selection unit according to Embodiment 6 of the present invention
  • FIG. 9C is a structural diagram of a generating unit according to Embodiment 6 of the present invention.
  • 9D is a structural diagram of a correction unit according to Embodiment 6 of the present invention.
  • FIG. 9E is a structural diagram of another correction unit according to Embodiment 6 of the present invention.
  • FIG. 10 is a flowchart of a stereoscopic panoramic video conference method according to Embodiment 7 of the present invention
  • FIG. 11 is a flowchart of video splicing provided by Embodiment 7 of the present invention.
  • FIG. 12 is a schematic diagram of two video image sequences according to Embodiment 7 of the present invention.
  • FIG. 13 is a flowchart of a stereoscopic panoramic video conference device according to Embodiment 8 of the present invention
  • FIG. 14 is a schematic structural diagram of a stereoscopic panoramic video conference device according to Embodiment 9 of the present invention
  • FIG. 15B is a structural block diagram of a video image display apparatus according to Embodiment 9 of the present invention
  • FIG. 17 is a schematic diagram of a multi-site multi-view video conference system based on a depth camera according to Embodiment 9 of the present invention.
  • FIG. 18 is a schematic diagram of a site A according to Embodiment 9 of the present invention.
  • FIG. 21 is a structural diagram of a stereoscopic panoramic video conference device according to Embodiment 10 of the present invention.

Detailed Description
  • This embodiment proposes a multi-site, two-dimensional (2D)/three-dimensional (3D)/multi-layer (Multi-Layer), multi-view video conferencing system based on a depth camera.
  • the site A includes: a depth camera (101A, 102A), a video conference server 103A, and a terminal display device (104A, 105A).
  • The depth camera (101A, 102A) is connected to the terminal display device (104A, 105A) through the video conference server 103A, and the terminal display device (104A, 105A) may be a 2D display, a 3D display or a multi-layer (Multi-Layer) display.
  • The site B includes a depth camera (111B, 112B) and a server 113B, with the depth camera (111B, 112B) connected to the server 113B.
  • The venue C includes a depth camera (121C, 122C) and a server 123C, with the depth camera (121C, 122C) connected to the server 123C.
  • The venue D includes a depth camera (131D, 132D) and a server 133D, with the depth camera (131D, 132D) connected to the server 133D.
  • the server 103A is connected to the servers (113B, 123C, 133D) via the network 142 and the transmission device 141, respectively.
  • Network 142 can be a cable, internet or satellite network.
  • the stereoscopic panoramic video stream generating method of the embodiment of the present invention includes the following steps: S201: acquiring depth information of at least two video images to be stitched;
  • S202 Obtain image data of multiple depth levels from corresponding video images to be stitched according to depth information of each video image to be stitched;
  • S203 Perform splicing between video image data according to the acquired image data of multiple depth levels to generate a stereoscopic panoramic video stream.
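The three steps above can be pictured with a minimal sketch, assuming each camera yields an aligned (color, depth) pair; the depth threshold and the simple side-by-side placement used for "stitching" are placeholders for the real registration and blending step.

```python
import numpy as np

def split_by_depth(color, depth, threshold_mm=1500):
    """S202: split one view into near/far layers using the depth map."""
    near = depth < threshold_mm                      # e.g. participants
    foreground = np.where(near[..., None], color, 0)
    background = np.where(near[..., None], 0, color)
    return foreground, background

def stitch_layers(views):
    """S203: stitch each depth layer across views, then paste them together."""
    fgs, bgs = zip(*(split_by_depth(c, d) for c, d in views))
    fg_pano = np.hstack(fgs)                         # placeholder alignment
    bg_pano = np.hstack(bgs)
    return np.where(fg_pano > 0, fg_pano, bg_pano)   # foreground over background
```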
  • The depth cameras (111B, 112B) obtain the video streams of site B and the depth information of each frame image from two perspectives, and depth images of different depth levels are obtained according to the depth information of the image.
  • For areas whose depth level changes little, image stitching is generally performed only once; for people and objects with motion changes, frame-by-frame image stitching is required in real time.
  • Areas with small changes in depth level generally refer to fixed furniture in the conference scene and fixed-position video communication equipment (such as video cameras, large-screen display devices, printers, etc.). These areas are basically unchanged and their depth level does not change or changes only slightly, so they can be extracted in advance by means such as a depth camera, and the seamless stitching of the two camera videos can be performed for them separately.
  • Areas with large changes in depth level generally refer to people or things that move (such as chairs). Participants usually make some movements, and chairs may move. If a person (without reaching out) moves forward and backward relative to the camera, the depth level of the person changes greatly along the time axis, but at the same moment the person in the images taken by different cameras is still at the same depth level, so seamless image stitching can easily be achieved using traditional image stitching techniques. If a person (reaching out) moves back and forth relative to the camera, then at the same moment the person in the images captured by different cameras is not at the same depth level, resulting in different levels of depth/disparity.
  • Person image data and non-person image data are acquired from the corresponding video image according to the depth information. The non-person image data are spliced to generate non-person stitched image data; the person image data are spliced to generate person stitched image data; and the person stitched image data and the non-person stitched image data are pasted together to generate a stereoscopic panoramic video stream.
  • For the current frame of each video stream, the image change region of the person image data relative to the corresponding person image data of the previous frame may be detected, and after it is determined that the change region is greater than a set threshold, only the person image data of the changed area is spliced.
  • The person image data (306, 307) and the non-person image data (303, 304) are acquired from the video images (301, 302) according to the depth information of the images; the non-person image data (303, 304) are spliced to generate non-person stitched image data 305; the person image data (306, 307) are spliced to generate person stitched image data 308; and the person stitched image data 308 and the non-person stitched image data 305 are pasted together to generate a composite video image 309, which is encoded and output.
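The change-region check described above might look like the following sketch: the person layer of the current frame is compared with that of the previous frame, and only the region that changed is pasted back onto the stitched result. The difference threshold and the per-view offset are assumptions, not values given in the text.

```python
import numpy as np

def changed_region(prev_person, cur_person, diff_thresh=20):
    """Return the bounding box of pixels whose person-layer value changed."""
    diff = np.abs(cur_person.astype(np.int16) - prev_person.astype(np.int16))
    ys, xs = np.nonzero(diff.max(axis=-1) > diff_thresh)
    if ys.size == 0:
        return None                                   # nothing moved this frame
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def update_person_layer(panorama, prev_person, cur_person, offset):
    """Paste only the changed person region back onto the stitched panorama."""
    box = changed_region(prev_person, cur_person)
    if box is None:
        return panorama
    y0, y1, x0, x1 = box
    oy, ox = offset                                   # view's position in the panorama
    panorama[oy + y0:oy + y1, ox + x0:ox + x1] = cur_person[y0:y1, x0:x1]
    return panorama
```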
  • The technical solution provided by the embodiment of the invention realizes fast, real-time video image splicing, reduces the complexity of video image splicing, and improves its efficiency. It provides users with a panoramic, high-resolution, seamless, three-dimensional conference experience and achieves a higher level of realism than traditional telepresence. It solves the problem of ghosting caused by parallax when multi-view video is stitched, especially for scenes with near-view parallax.
  • Embodiment 2
  • FIG. 4 is a flowchart of a method for generating a stereoscopic panoramic video stream according to Embodiment 2 of the present invention. The method includes the following steps:
  • S401 Obtain depth information of at least two video images to be stitched.
  • S402 Acquire image data of multiple depth levels from the corresponding video image to be spliced according to depth information of each video image to be spliced.
  • Step S401 and step S402 are similar to steps S201 and S202 in the first embodiment, and will not be further described herein.
  • S403 Acquire at least two corrected video images, where every two adjacent ones of the at least two corrected video images have an overlapping region.
  • S404 Select, from the overlapping area, a paired feature point of the two adjacent corrected video images.
  • A plurality of methods may be used to obtain the paired feature points of two adjacent video images, such as the Harris feature point detection method, the SUSAN (Smallest Univalue Segment Assimilating Nucleus) feature point detection method, the wavelet-based feature point detection method, the SIFT (Scale-Invariant Feature Transform) feature point detection method, and the like; the embodiment of the present invention does not impose any limitation in this respect.
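As an illustration of step S404, a sketch using OpenCV's SIFT implementation is shown below; any of the detectors listed above could be substituted, and the ratio-test threshold is an assumed tuning value.

```python
import cv2

def paired_feature_points(overlap_src, overlap_dst, ratio=0.75):
    """Detect SIFT keypoints in both overlap regions and keep mutual matches."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(overlap_src, None)
    kp2, des2 = sift.detectAndCompute(overlap_dst, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    pairs = []
    for candidates in matches:
        if len(candidates) == 2:
            m, n = candidates
            if m.distance < ratio * n.distance:       # Lowe's ratio test
                pairs.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return pairs
```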
  • S405 Generate a color correction matrix of the two adjacent corrected video images according to the paired feature points.
  • If color correction subsequently needs to be performed on different video images to be spliced, steps S403-S405 need not be repeated and step S406 can be performed directly.
  • Alternatively, steps S403-S405 may be performed before step S401 to obtain the color correction matrix.
  • S407 splicing the corrected video images to be spliced according to the obtained image data of the plurality of depth levels to generate a stereoscopic panoramic video stream.
  • The embodiment of the present invention can provide the user with a panoramic, high-resolution, seamless, three-dimensional conference experience, and, because color correction is performed on the video images, it can provide the user with a panoramic video stream of better brightness and color.
  • Histogram correction in the prior art can also correct the above-mentioned deviations in brightness and chromaticity.
  • However, correcting the brightness and chromaticity of video images with a histogram requires a large similarity between the video images; with multi-viewpoint shooting the correction result is therefore likely to be poor, or the correction fails. Moreover, since histogram-based correction requires real-time statistics for each image, the correction takes longer.
  • the embodiment of the present invention is an illustration of another embodiment of color correction in the process of generating a panoramic video stream.
  • This embodiment shows the process of correcting the video images captured by two cameras. When the color correction matrix is initially calculated, the two video images captured by the two cameras need to have an overlapping region; when the color correction matrix is subsequently used for correction, the images captured by the two cameras may have no overlapping region, although an overlapping region is required between the two adjacent video images used for the initial calculation.
  • FIG. 5 is a flowchart of a color correction method according to Embodiment 3 of the present invention. The method includes the following steps:
  • S501 Receive video images with overlapping areas captured by two cameras.
  • the two cameras are the source camera and the target camera respectively.
  • the source camera captures the source video image
  • the target camera captures the target video image.
  • the color of the source video image needs to be corrected to match the color of the target image.
  • the positions of the two cameras can be adjusted so that the captured source video image and the target video image have overlapping regions.
  • the size range of the overlap area is not limited, as long as the overlapping area is ensured.
  • S502 Perform image preprocessing on the two captured video images.
  • the preprocessing of the video image includes the smoothing noise reduction processing and the distortion correction processing which are usually employed, and this step is an optional step.
  • S503 Perform color space conversion on the preprocessed two video images.
  • the video image captured by the camera can be converted into a color space.
  • The format of the video image before and after conversion can be any of RGB (Red, Green, Blue), HSV, YUV, HSL, CIE-Lab, CIE-Luv, CMY, CMYK, or XYZ.
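A minimal sketch of the conversion in S503, assuming OpenCV and BGR input frames (OpenCV names the HSL space "HLS"); which target space is used in practice is left open by the text.

```python
import cv2

def to_working_space(frame_bgr, space="HLS"):
    """Convert a captured frame into the chosen working color space."""
    codes = {
        "HSV": cv2.COLOR_BGR2HSV,
        "HLS": cv2.COLOR_BGR2HLS,      # hue, lightness, saturation
        "YUV": cv2.COLOR_BGR2YUV,
        "Lab": cv2.COLOR_BGR2Lab,
        "Luv": cv2.COLOR_BGR2Luv,
        "XYZ": cv2.COLOR_BGR2XYZ,
    }
    return cv2.cvtColor(frame_bgr, codes[space])
```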
  • S504 Select a pairing feature point from an overlapping area of two video images.
  • There are many ways to acquire feature points from the two video images, such as the Harris feature point detection method, the SUSAN (Smallest Univalue Segment Assimilating Nucleus) feature point detection method, the wavelet-based feature point detection method, and the SIFT (Scale-Invariant Feature Transform) feature point detection method. To achieve a better effect, the embodiment of the present invention uses the SIFT method, which remains invariant to rotation, scaling, and brightness changes. Other feature extraction algorithms may also be used to obtain the paired feature points of the overlapping regions of the two video images; the embodiment of the present invention does not impose any limitation in this respect.
  • Method 1 SIFT feature point detection is performed on the overlapping area, and the detected feature points are matched to obtain multiple sets of paired feature points of two adjacent video images.
  • The SIFT feature point detection method is the most commonly used method in existing image processing, and SIFT feature point detection is robust to affine and brightness/chromaticity changes. It should be noted that, in addition to SIFT feature point detection, there are various other feature point detection methods in the prior art, such as the Harris detection method, the SUSAN corner detection method, and their improved algorithms, as long as the feature points can be detected from the overlapping area. Mismatched feature points can be removed using RANSAC (RANdom SAmple Consensus); other prior-art methods, such as methods based on probability statistics, may also be used to remove unmatched feature points, and the embodiment of the present invention is not limited in this respect.
  • Manner 2: Perform SIFT feature point detection on the overlapping area and match the detected feature points to obtain multiple sets of paired feature points of the two adjacent video images; then divide regions of the same size centered on the paired feature points and assign the average of the color features of each divided region to the corresponding paired feature point.
  • The detection of feature points and the removal of mismatched feature points are the same as described in Manner 1, and details are not described here again.
  • For example, regions of the same size centered on the paired feature points can be taken as the paired regions of the two video images, and the average value of each color channel over the region is used as the color value of the feature point. Take a video image in HSL format (H-Hue represents hue, S-Saturation represents saturation, L-Lightness represents lightness) as an example: each paired point has corresponding H, S and L values, and the corresponding paired region is an area composed of several points. The H, S and L values of all points in the region are averaged; if the averages are H', S' and L', then H', S' and L' are assigned to the feature point.
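A sketch of the averaging in Manner 2 follows: each paired feature point's color is replaced by the mean H, S, L values of a small window centred on it. The window radius is an assumption; the text does not specify the region size.

```python
import numpy as np

def neighbourhood_color(img_hsl, point, radius=4):
    """Average each channel over a (2*radius+1)^2 window around the point."""
    x, y = int(round(point[0])), int(round(point[1]))
    h, w = img_hsl.shape[:2]
    patch = img_hsl[max(0, y - radius):min(h, y + radius + 1),
                    max(0, x - radius):min(w, x + radius + 1)]
    return patch.reshape(-1, 3).mean(axis=0)          # (H', S', L')
```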
  • Manner 3 The overlapping area is divided, and the corresponding area of the divided two video images is used as a pairing feature point, and an average value of the color features of the corresponding area is assigned to the paired feature point.
  • The two video images can be divided into several sets of paired regions, and the range of each pair of regions can be different; each region contains several feature points, and the average of each color channel over the region is taken as the value of the paired feature point. The averaging process is similar to the example in Manner 2 and is not described here. After calculating the channel averages for the paired regions, several paired feature points are obtained.
  • Manner 4 receiving a region block that is manually selected from the overlapping region, and selecting a corresponding region block of the two video images as a pairing feature point, and assigning an average value of the color features of the corresponding region block to the The pairing feature points are described.
  • The difference between Manner 4 and Manner 3 is that in Manner 3 the division of the overlapping region can be completed automatically by the image correction device according to preset settings, whereas in Manner 4 multiple paired region blocks are selected manually from the overlapping region and the selection result is then input into the image correction device for subsequent processing.
  • S505 Create a color space matrix of two video images according to the paired feature points.
  • Mat_dst is the color space matrix of the target video image
  • Mat_src is the color space matrix of the source video image
  • Taking the first row of Mat_dst as an example to explain the meaning of the matrix: the first row represents the first of the m points, where h11 is the hue value of the first point, s12 is the saturation value of the first point, and l13 is the lightness of the first point; thus Mat_dst is an m x 3 matrix of the H, S, L values of the m target pixel points of the target video image among the paired points.
  • S506 Establish a transformation relationship between the two color space matrices, and obtain a color correction matrix according to the transformation relationship.
  • The transformation relationship is: the product of the color space matrix of the source video image and the color correction matrix, plus an error amount, equals the color space matrix of the target video image; Mat_ColorRectify, the color correction matrix to be solved, is obtained when the error amount is smallest.
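In numerical terms, S505-S506 amount to stacking the m paired points' (H, S, L) values into m x 3 matrices and solving Mat_src · Mat_ColorRectify ≈ Mat_dst in the least-squares sense. The sketch below assumes a plain 3 x 3 correction matrix; the text does not state whether an affine term is included.

```python
import numpy as np

def solve_color_correction(src_colors, dst_colors):
    """src_colors, dst_colors: (m, 3) arrays of paired H, S, L values."""
    mat_src = np.asarray(src_colors, dtype=np.float64)
    mat_dst = np.asarray(dst_colors, dtype=np.float64)
    # Least-squares solution minimises ||mat_src @ M - mat_dst||, i.e. the error amount.
    correction, *_ = np.linalg.lstsq(mat_src, mat_dst, rcond=None)
    return correction                                  # 3 x 3 Mat_ColorRectify
```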
  • S508 Perform color correction on the received video image to be stitched by using a color correction matrix.
  • A color space matrix of the video image to be corrected is generated; assume the video image to be corrected has been converted into the HSL format by the color space conversion.
  • If the video image is composed of Y pixels, a color space matrix of size (Y x 3) is generated, and each row of the matrix represents the H value, S value and L value of one pixel.
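Applying the matrix in S508 is then a single multiplication per frame, as in this sketch (value ranges and dtype handling are assumptions):

```python
import numpy as np

def apply_color_correction(img_hsl, correction):
    """Flatten to a (Y, 3) color matrix, multiply by Mat_ColorRectify, reshape back."""
    h, w, _ = img_hsl.shape
    pixels = img_hsl.reshape(-1, 3).astype(np.float64)
    corrected = pixels @ correction                    # one multiply per frame
    return np.clip(corrected, 0, 255).reshape(h, w, 3).astype(img_hsl.dtype)
```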
  • The above embodiment of the present invention performs color correction on the video image. Since an overlapping region between the two images is required only when calculating the color correction matrix, during the correction process color correction can be performed by the matrix regardless of whether there is an overlapping region between the video images; furthermore, the color correction matrix only needs to be generated once, saving the time for color correction of the video images.
  • the embodiment of the present invention is a description of still another embodiment of color correction in the process of generating a panoramic video stream.
  • This embodiment shows a process of correcting video images captured by N cameras. When the color correction matrix is initially calculated, the video images captured by two adjacent cameras among the N cameras need to have an overlapping region; N-1 groups of adjacent video images are input in total, and N-1 color correction matrices are generated correspondingly. When the video images are corrected by the N-1 color correction matrices, the images captured by the N cameras may have no common overlapping area.
  • FIG. 6 is a flowchart of a color correction method according to Embodiment 4 of the present invention. The method includes the following steps:
  • S601 Receive N video images captured by N cameras, and two adjacent video images of the N video images have overlapping regions.
  • Two cameras adjacent to each other in the N cameras are a group, for example, a camera 1 and a camera 2 are a group, a camera 2 and a camera 3 are a group, and thus a camera N-1 and a camera N are grouped together.
  • the two cameras in each group are the source camera and the target camera, where the source camera captures the source video image and the target camera captures the target video image.
  • The pre-processing of the video image includes the commonly used smoothing noise reduction processing and distortion correction processing; this processing is prior art, and details are not described here.
  • S603 Perform color space conversion on the preprocessed N video images.
  • the RGB format image can be color space converted.
  • the converted image can be HSV, or YUV, or HSL, or CIE-La b, or CIE-Luv, or Any of CMY, or CMYK, or XYZ.
  • S604 Select the paired feature points of the adjacent two-two video images of the N-1 group in sequence.
  • The process of obtaining the paired feature points of the N-1 groups of adjacent video images in step S604 is similar to step S504 of the previous embodiment, and details are not described here again.
  • S605 Create a color space matrix of two adjacent video images according to the pairing feature points.
  • S606 Establish a transformation relationship between two color space matrices, and obtain a color correction matrix according to the transformation relationship.
  • Steps S604 to S607 process one group of video images among the N-1 groups of adjacent video images; the process is consistent with the description of steps S504 to S507 in the foregoing embodiment and will not be repeated here.
  • Step S608: It is judged whether all N-1 groups of video images have been processed; if yes, step S609 is performed; otherwise, the process returns to step S604.
  • S609 Receive a video image to be corrected sent by the Kth camera.
  • N-1 color correction matrices have been obtained, where it is assumed that the first group of video images corresponds to the first color correction matrix Mat-1, the second group of video images corresponds to the second color correction matrix Mat-2, and so on, and the (N-1)th group of video images corresponds to the (N-1)th color correction matrix Mat-(N-1).
  • S611 Color-correct the video image captured by the Kth camera by using the color correction matrix of the Kth video image.
  • The above embodiment of the present invention performs color correction on the video image. Since an overlapping region between the two images is required only when calculating the color correction matrices, during the correction process color correction can be performed by the matrices regardless of whether there is an overlapping region between the video images; furthermore, the color correction matrices only need to be generated once, saving the time for color correction of the video images.
  • Embodiment 5
  • FIG. 7A is a structural diagram of a stereoscopic panoramic video stream generating device according to Embodiment 5 of the present invention.
  • The device includes: a depth information acquiring device 701, configured to acquire depth information of at least two video images to be stitched;
  • a layered image acquiring device 702, configured to acquire image data of a plurality of depth levels from the corresponding video image to be spliced according to the depth information of the video image to be spliced;
  • and a stereoscopic panoramic video stream generating device 703, configured to perform splicing between the video image data according to the acquired image data of the multiple depth levels to generate a stereoscopic panoramic video stream.
  • For example, the depth cameras (111B, 112B) obtain the video streams of site B and the depth information of each frame image from two perspectives, and depth images of different depth levels are obtained according to the depth information of the image.
  • The layered image obtaining means 702 acquires the person image data from the corresponding video image based on the depth information of each video image, and acquires the non-person image data from the corresponding video image based on the depth information of each video image.
  • As shown in the figure, the stereoscopic panoramic video stream generating device 703 includes an image splicing unit 7031 and an image detecting unit 7032. The image splicing unit 7031 splices the non-person image data to generate non-person stitched image data and splices the person image data to generate person stitched image data; it then pastes the person stitched image data together with the non-person stitched image data to generate a stereoscopic panoramic video stream.
  • The image detecting unit 7032 detects, for each video stream, the image change region of the person image data in the current frame relative to the corresponding person image data of the previous frame; when it is determined that the change region is smaller than the set threshold, the image splicing unit 7031 splices only the person image data in the changed region.
  • the layered image obtaining means 702 acquires foreground image data from the corresponding video image based on the depth information of each video image, and acquires background image data from the corresponding video image based on the depth information of each video image.
  • the stereoscopic panoramic video stream generating device 703 includes: an image splicing unit 7031 and an image detecting unit 7032; the image splicing unit 7031 splicing the acquired background image data to generate background panoramic image data, and splicing the acquired foreground image data to generate a foreground Splicing the image data; then pasting the foreground stitched image data to the background panoramic image data to generate a stereoscopic panoramic video stream.
  • The image detecting unit 7032 detects the video image change region of the foreground image data of the current frame of each video stream relative to the foreground image data of the previous frame; when it is determined that the changed region is smaller than the set threshold, the image splicing unit 7031 splices only the foreground image data belonging to the changed region.
  • the technical solution provided by the embodiment of the invention realizes fast and real-time video image splicing, reduces the complexity of video image splicing, and improves the efficiency of video image splicing. It provides users with a panoramic, high-resolution, seamless, three-dimensional conference experience. Achieve a higher level of real feeling than traditional telepresence. It solves the problem of ghosting caused by parallax when multi-view video is stitched, especially for scenes with near-view parallax.
  • FIG. 8A is a structural diagram of a stereoscopic panoramic video stream generating apparatus according to Embodiment 6 of the present invention, where the device includes: a depth information acquiring device 801, a layered image acquiring device 802, a video image correcting device 803, and a stereoscopic panoramic video stream generating device 804. The depth information acquiring device 801 and the layered image acquiring device 802 are similar to those in Embodiment 5 and will not be repeated here.
  • the video image correcting means 803 includes: an obtaining unit 8031, a selecting unit 8032, a generating unit 8033, and a correcting unit 8034 (as shown in Fig. 8B).
  • The acquiring unit 8031 is configured to acquire at least two corrected video images, where two adjacent ones of the at least two corrected video images have overlapping regions; the acquiring unit 8031 may be implemented by an imaging device such as a camera or video camera. The selecting unit 8032 is configured to select the paired feature points of the two adjacent corrected video images from the overlapping region; the selecting unit 8032 can be implemented by a dedicated processor chip integrating image feature point extraction and matching, or by a general-purpose processor chip combined with an image feature extraction and matching algorithm. The generating unit 8033 is configured to generate the color correction matrix of the two adjacent corrected video images according to the paired feature points; the generating unit 8033 may be implemented by a CPLD (Complex Programmable Logic Device) integrating a matrix processing function, or by an FPGA (Field Programmable Gate Array). The correcting unit 8034 is configured to correct the received video images to be spliced by means of the color correction matrix.
  • FIG. 9A A block diagram of a second embodiment of the video image correcting apparatus of the present invention is shown in Fig. 9A, which includes: an obtaining unit 910, a pre-processing unit 920, a converting unit 930, a selecting unit 940, a generating unit 950 and a correcting unit 960.
  • The acquiring unit 910 is configured to acquire at least two corrected video images, where two adjacent video images of the at least two corrected video images have overlapping regions; the acquiring unit 910 may be implemented by an imaging device such as a camera or video camera.
  • The pre-processing unit 920 is configured to pre-process the at least two corrected video images after the acquiring unit acquires them, where the pre-processing includes smoothing noise reduction processing and/or distortion correction processing; the pre-processing unit 920 is an optional unit. The conversion unit 930 is configured to perform color space conversion on the at least two corrected video images, where the format of the corrected video images before and after conversion includes RGB, HSV, YUV, HSL, CIE-Lab, CIE-Luv, CMY, CMYK, or XYZ. The selecting unit 940 is configured to select, from the overlapping area, the paired feature points of the two adjacent corrected video images; the generating unit 950 is configured to generate the color correction matrix of the two adjacent corrected video images according to the paired feature points; and the correcting unit 960 is configured to correct the received video images to be spliced using the color correction matrix.
  • The selecting unit 940 includes at least one of the following units: a first selecting unit 941, configured to perform SIFT feature point detection on the overlapping area and match the detected feature points to obtain multiple sets of paired feature points of the two adjacent corrected video images; a second selecting unit 942, configured to perform SIFT feature point detection on the overlapping area and match the detected feature points to obtain multiple sets of paired feature points of the two adjacent corrected video images, then divide regions of the same size centered on the paired feature points and assign the average value of the color features of each divided region to the paired feature points; a third selecting unit 943, configured to divide the overlapping area, use the corresponding regions of the two corrected video images as paired feature points, and assign the average value of the color features of the corresponding regions to the paired feature points; and a fourth selecting unit 944, configured to receive region blocks manually selected from the overlapping area, use the corresponding region blocks of the two corrected video images as paired feature points, and assign the average value of the color features of the corresponding region blocks to the paired feature points.
  • The generating unit 950 may include: a color matrix establishing unit 951, configured to establish, respectively, the color space matrix of the source video image and of the target video image, where each row of a color space matrix represents the color space attributes of one of the paired feature points; a transformation relationship establishing unit 952, configured to establish the transformation relationship between the color space matrix of the source video image and the color space matrix of the target video image, the transformation relationship being that the product of the color space matrix of the source video image and the color correction matrix, plus an error amount, equals the color space matrix of the target video image; and a solution correction matrix unit 953, configured to determine the color correction matrix according to the transformation relationship when the error amount is smallest.
  • The correcting unit 960 may include: a video image receiving unit 961, configured to receive a video image to be corrected sent by the input device, such as a video image to be stitched; a color matrix generating unit 962, configured to generate the color space matrix of the video image to be corrected; a color matrix transforming unit 963, configured to multiply the color correction matrix by the color space matrix of the video image to be corrected and use the multiplied result as the color space matrix of the corrected video image; and a correction result generating unit 964, configured to generate the corrected video image according to the color space matrix of the corrected video image.
  • Alternatively, the correcting unit 960 may include: a video image receiving unit 961', configured to receive a video image to be corrected transmitted by the input device, where the video image to be corrected is the Kth video image of the N video images; a first color matrix generating unit 962', configured to generate the color space matrix of the video image to be corrected; a correction matrix generating unit 963', configured to multiply the first color correction matrix through the (K-1)th color correction matrix to obtain the color correction matrix of the video image to be corrected; a second color matrix generating unit 964', configured to multiply this color correction matrix by the color space matrix of the video image to be corrected and use the multiplied result as the color space matrix of the corrected video image; and a correction result generating unit 965', configured to generate the corrected video image according to the color space matrix of the corrected video image.
  • Through steps S504 to S507, the color correction matrices of Z1, Z2, Z3 and Z4 are solved respectively and denoted Mat-1, Mat-2, Mat-3 and Mat-4.
  • The color correction matrix corresponding to the image captured by the 2nd camera is Mat-2' = Mat-1;
  • the color correction matrix corresponding to the image captured by the 3rd camera is Mat-3' = Mat-1 × Mat-2;
  • the color correction matrix corresponding to the image captured by the 4th camera is Mat-4' = Mat-1 × Mat-2 × Mat-3;
  • the color correction matrix corresponding to the image captured by the 5th camera is Mat-5' = Mat-1 × Mat-2 × Mat-3 × Mat-4.
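The chaining rule above can be written as a small helper; treating the 1st camera as the untouched reference is an assumption consistent with Mat-2' = Mat-1.

```python
import numpy as np
from functools import reduce

def chained_correction(pairwise, k):
    """pairwise: [Mat-1, ..., Mat-(N-1)]; returns the correction for camera k (1-based)."""
    if k == 1:
        return np.eye(3)                  # assumed reference camera, left uncorrected
    return reduce(np.matmul, pairwise[:k - 1])
```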
  • FIG. 10 is a flowchart of a method for a stereoscopic panoramic video conference according to Embodiment 7 of the present invention, where the method includes:
  • S1001 Obtain a video stream of the same site from at least two perspectives;
  • S1002 Acquire image data of multiple depth levels from the corresponding video stream according to depth information of each video stream;
  • S1003 performing depth information-based splicing on the acquired video streams of different views to generate a stereoscopic panoramic video stream
  • S1004 Display the video image of the stereoscopic panoramic video stream on the terminal display according to the category of the terminal display.
  • FIG. 11 is a flowchart of video splicing provided in Embodiment 7 of the present invention, which includes the following steps:
  • Step S1102: For the subsequent video sequence, first detect the change region of the current frame relative to the previous frame.
  • Step S1108: Determine whether the current frame is the last frame; if not, return to step S1102. Since image sequences generally have strong correlation and the changed regions are only a part of the scene content, this algorithm can significantly reduce the complexity of the video stitching algorithm; thus a more complex algorithm can be used for video stitching, obtaining a more accurate panoramic stitched video while still satisfying real-time stitching requirements.
  • In this embodiment the change region of the current frame is calculated relative to the previous frame; alternatively, the change region of the current frame may be detected relative to the initial frame.
  • The first and second image sequences shown in FIG. 12 can be obtained by using the cameras shown in FIG. 1, and video splicing is performed on the corresponding image pairs in the first image sequence and the second image sequence to obtain a stitched image for each image pair. The stitched images are stereo-coded and output. The terminal display category is then determined: if the terminal display device is a two-dimensional display, the two-dimensional image information of the composite video image is displayed; if the terminal display device is a three-dimensional stereoscopic display, the three-dimensional stereoscopic image information of the composite video image is displayed; and if the terminal display device is a multi-layer display, the image information of multiple depth levels of the composite video image is displayed.
  • An advantageous effect of the embodiments of the present invention is that a panoramic, high-resolution, seamless, three-dimensional conference experience can be provided to the user. It solves the problem of ghosting caused by parallax when multi-view video splicing, especially for scenes with near-view parallax.
  • a multi-display mode for different display devices is provided.
  • The multi-layer display can be used to realize separate display of the foreground and background, which also gives a better three-dimensional feeling.
  • stereoscopic displays and flat panel displays can be used to achieve a more accurate and better stereo experience.
  • FIG. 13 is a flowchart of a method for a stereoscopic panoramic video conference according to Embodiment 8 of the present invention, where the method includes:
  • S 1301 Obtain a video stream of the same site from at least two perspectives.
  • S1302 Acquire image data of a plurality of depth levels from the corresponding video stream according to depth information of each video stream.
  • Steps S1301 - S1302 are similar to those in the seventh embodiment, and will not be described again here.
  • S1303 Acquire at least two corrected video images, where every two adjacent corrected video images of the at least two corrected video images have overlapping regions.
  • S1304 Select the paired feature points of the two adjacent corrected video images from the overlapping area.
  • S1305 Generate the color correction matrix of the two adjacent corrected video images according to the paired feature points.
  • Steps S1303-S1306 perform color correction on the acquired video streams.
  • The color correction matrix only needs to be generated once. If color correction needs to be performed later on different video images to be spliced, steps S1303-S1305 need not be repeated and step S1306 can be executed directly.
  • Alternatively, steps S1303-S1305 may be performed before step S1301 to obtain the color correction matrix.
  • S 1307 Perform depth information-based splicing on the obtained corrected video streams of different views to generate a stereoscopic panoramic video stream.
  • FIG. 14 is a structural diagram of a stereoscopic video conference device according to Embodiment 9 of the present invention.
  • the device includes: a depth information acquiring device 1401, configured to acquire a video stream of the same site from at least two perspectives;
  • the layered image obtaining device 1402 is configured to acquire image data of a plurality of depth levels from the corresponding video stream according to the depth information of each video stream;
  • the stereoscopic panoramic video stream generating device 1403 is configured to perform depth-information-based splicing on the acquired video streams of different viewing angles to generate a stereoscopic panoramic video stream;
  • the video image display device 1404 is configured to display the video image of the stereoscopic panoramic video stream on the terminal display according to the category of the terminal display.
  • the depth camera (1501, 1502, 1503, 1504) is connected to the stereoscopic panoramic video conference device 1400, and the depth information acquisition device 1401 receives the video stream of the same site acquired from the four perspectives;
  • the acquiring device 1402 acquires image data of a plurality of depth levels from the corresponding video stream according to the depth information of each video stream;
  • the stereoscopic panoramic video stream generating device 1403 performs stitching based on video image depth information on the acquired video streams of different viewing angles to generate a stereoscopic panoramic video stream;
  • the video image display device 1404 is configured to display the video image of the stereoscopic panoramic video stream on the terminal display according to the category of the terminal display.
  • the stereoscopic panoramic video conferencing device 1400 further includes a gesture instruction storage device 1505 for storing a mapping relationship between the gesture information and the display control instruction.
  • The display instruction obtaining device 1506 is configured to obtain the corresponding display control instruction from the mapping relationship according to the acquired gesture information.
  • The video image display device 1404 includes a display category determining unit 14041 and a display 14042, and the display 14042 may be a two-dimensional display, a three-dimensional display or a multi-layer display.
  • The display category determining unit 14041 determines whether the terminal display is a two-dimensional, three-dimensional or multi-layer display. If the display 14042 is a two-dimensional display, the two-dimensional image information of the composite video image is displayed; if the display 14042 is a three-dimensional display, the three-dimensional stereoscopic image information of the composite video image is displayed; and if the display 14042 is a multi-layer display, the image information of the multiple depth levels of the composite video image is displayed.
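An illustrative dispatch for this display-category check is sketched below; the display interface (category(), show_2d(), show_3d(), show_layer()) is hypothetical, and only the three branches come from the text.

```python
def show_composite(display, composite):
    """Route the composite video image to a 2D, 3D or multi-layer display."""
    kind = display.category()                  # "2D", "3D" or "multi-layer"
    if kind == "2D":
        display.show_2d(composite.two_dimensional())
    elif kind == "3D":
        display.show_3d(composite.stereoscopic())
    else:                                      # multi-layer display
        for level, layer in enumerate(composite.depth_layers()):
            display.show_layer(level, layer)
```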
  • the fast video splicing method of the embodiment of the present invention is as follows:
  • S1601: The background of the conference scene is photographed in advance by two or more cameras, the background images of the unmanned conference site are stitched, and the resulting conference background panorama is pre-stored;
  • S1602: Two or more video streams are input, and the first and second pictures of the initial frame are stitched;
  • Step S1605: If the changed area is too large, perform complete panorama stitching and then go to step S1608; Step S1606: If the changed area is not too large, perform foreground image stitching of the changed area; Step S1607: Update the stitched result of the corresponding area of the previous frame and paste it onto the background panorama; Step S1608: Read the next frame;
  • Step S1609: Is it the last frame? If it is the last frame, the process ends; if it is not, go to step S1603.
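A hedged sketch of the S1601-S1609 loop is given below: the empty-scene background panorama is prepared once, and each subsequent frame either triggers a complete re-stitch (large change) or only a foreground patch of the changed area. The stitching callables and the area threshold are placeholders, not values given by the text.

```python
import numpy as np

def fast_splice(frames_per_view, background_pano, full_stitch, paste_region,
                area_thresh=0.3):
    """frames_per_view: one list of frames per camera, all the same length."""
    prev = [frames[0] for frames in frames_per_view]     # S1602: initial frames
    pano = full_stitch(prev, background_pano)
    for t in range(1, len(frames_per_view[0])):          # read subsequent frames
        cur = [frames[t] for frames in frames_per_view]
        changed = np.mean([np.mean(np.any(c != p, axis=-1))
                           for c, p in zip(cur, prev)])
        if changed > area_thresh:                        # S1605: change too large
            pano = full_stitch(cur, background_pano)     # complete panorama stitch
        else:                                            # S1606-S1607: patch only
            pano = paste_region(pano, prev, cur, background_pano)
        prev = cur                                       # S1608: next frame
        yield pano                                       # S1609 ends at the last frame
```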
  • The panoramic mosaic is stereo-coded and output. The terminal display category is determined: if the terminal display device is a two-dimensional display, the two-dimensional image information of the composite video image is displayed; if the terminal display device is a three-dimensional stereoscopic display, the three-dimensional stereoscopic image information of the composite video image is displayed; and if the terminal display device is a multi-layer display, the image information of multiple depth levels of the composite video image is displayed.
  • An advantageous effect of the embodiments of the present invention is that a panoramic, high-resolution, seamless, three-dimensional conference experience can be provided to the user. It solves the problem of ghosting caused by parallax when multi-view video splicing, especially for scenes with near-view parallax.
  • a multi-display mode for different display devices is provided.
  • The multi-layer display can be used to realize separate display of the foreground and background, which also gives a better three-dimensional feeling.
  • stereoscopic displays and flat panel displays can be used to achieve a more accurate and better stereo experience.
  • It provides a more friendly data cooperation method: gesture commands issued by different people in different venues can take effect and be displayed on the same display device, so that different people in different venues feel as if they are in the same venue and are controlling the data and the conference system at the same time.
  • This embodiment utilizes a depth camera to make remote terminal data cooperation and conference control of the video or telepresence conference system more convenient. Because of the depth camera, the hand, fingers and palm can be recognized, and in turn the instructions issued by the hand are recognized.
  • Gesture recognition proceeds in two steps. Step 1: Participants in different sites issue gesture commands, and the depth camera output is used to identify the command. Step 2: The action driven by the command is displayed on the remote terminal device.
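As a rough example of the first step, a hand can be isolated from a single depth frame by keeping only the pixels closest to the camera before the command is classified. The 120 mm depth band, the minimum pixel count, and the `classify_gesture` callback below are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def segment_hand(depth_map, band_mm=120, min_pixels=500):
    """Keep only pixels within a narrow depth band in front of the nearest object,
    which for a participant reaching toward the camera is typically the hand."""
    valid = depth_map > 0                      # zero commonly means "no measurement"
    if not valid.any():
        return None
    nearest = depth_map[valid].min()
    mask = valid & (depth_map < nearest + band_mm)
    return mask if mask.sum() >= min_pixels else None

def recognize_command(depth_map, classify_gesture):
    """classify_gesture(mask) -> gesture label; supplied by the caller (hypothetical)."""
    mask = segment_hand(depth_map)
    return None if mask is None else classify_gesture(mask)
```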
  • One application scenario presented here is a multi-site 2D/3D/Multi-Layer multi-view video conferencing system based on depth cameras, as shown in FIG. 17.
  • As shown in FIG. 18, the data of conference sites B, C, D, and E are displayed simultaneously on the data display of conference site A.
  • Site B can control how its data content is displayed through gestures, and sites C, D, and E can likewise control the display mode of their respective data content through gestures.
  • A participant in site A can control the displayed content of site C through gestures, so as to see what he or she wants to see.
  • Gestures for remotely controlling the data display mode can be reasonably defined, so that conference data content can be controlled and displayed between different sites in a user-friendly way.
  • For example, when site B controls the display of its data in site A, the gestures can be defined as common gesture models already used in local-site applications.
  • When a participant in site A controls the displayed content of site C through gestures in order to see the desired content, the mapping between gestures and display control commands can be defined as follows:
  • As shown in FIG. 19, raising the index finger indicates displaying the data of the first site and placing the control focus on the first site's data.
  • As shown in FIG. 20, raising the index finger and the middle finger indicates displaying the second site's data and placing the focus on the first site's data.
  • By analogy, raising the middle finger, ring finger, and little finger indicates displaying the third site's data and placing the focus on the third site's data; raising all fingers except the thumb indicates displaying the fourth site's data and placing the focus on the fourth site's data; keeping the thumb fixed while rotating the other fingers indicates displaying the fifth, sixth, and subsequent sites' data in turn, with the focus placed on the site data selected when the rotation stops; extending the palm with the arm vertical and pulling it back to the chest indicates displaying the focused site's data in full screen.
  • In this way, the mapping between gesture information and display control commands can be stored; the depth camera captures the gestures of participants in the site and generates gesture information, the corresponding display control command is looked up in the stored mapping, and the display of the terminal display device is controlled according to the obtained command.
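A minimal sketch of such a stored mapping and lookup is shown below. The gesture labels, command names, and the `display_controller` interface are hypothetical; the patent only requires that gesture information map to display control commands.

```python
# Hypothetical gesture labels -> display control commands, following the mapping described above.
GESTURE_TO_COMMAND = {
    "index_up":              ("show_site", 1),
    "index_middle_up":       ("show_site", 2),
    "middle_ring_little_up": ("show_site", 3),
    "four_fingers_up":       ("show_site", 4),
    "thumb_fixed_rotate":    ("cycle_sites", None),      # cycle through site 5, 6, ...
    "palm_pull_to_chest":    ("fullscreen_focus", None),  # show the focused site full screen
}

def handle_gesture(gesture_label, display_controller):
    """Look up the command for a recognized gesture and drive the terminal display.
    display_controller is assumed to expose show_site / cycle_sites / fullscreen_focus."""
    entry = GESTURE_TO_COMMAND.get(gesture_label)
    if entry is None:
        return  # unrecognized gesture: ignore
    command, arg = entry
    method = getattr(display_controller, command)
    if arg is None:
        method()
    else:
        method(arg)
```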
  • If the terminal display device is a 2D display, the two-dimensional image information of the composite video image is displayed; if the terminal display device is a 3D stereoscopic display, the three-dimensional stereoscopic image information of the composite video image is displayed; if the terminal display device is a multi-layer display, image information at multiple depth levels is displayed.
  • Embodiments of the present invention can provide the user with a panoramic, high-resolution, seamless, three-dimensional conference experience and achieve a more realistic feeling than traditional telepresence. They solve the ghosting problem caused by parallax in multi-view video stitching, especially for scenes where near-field parallax is pronounced, and provide a fast, real-time video stitching method that reduces the complexity of video stitching and improves its efficiency.
  • A multi-display mode for different display devices is also provided. A multi-layer display can be used to show the foreground and background separately and give a better sense of depth; similarly, a stereoscopic display can be used to achieve a more accurate and better stereoscopic experience. A friendlier data collaboration approach is also provided: gesture commands issued by different participants in different sites take effect on the same display device, so that participants in different sites feel as if they were at the same site, controlling the data and the conference system together.
  • Embodiment 10
  • FIG. 21 is a structural diagram of a stereoscopic panoramic video conference device according to Embodiment 10 of the present invention.
  • The device includes: a depth information acquiring device 2110, a layered image acquiring device 2120, a video image correcting device 2130, a stereoscopic panoramic video stream generating device 2140, and a video image display device 2150. The depth information acquiring device 2110, the layered image acquiring device 2120, and the video image display device 2150 are similar to those in Embodiment 9 and are not described again here.
  • The video image correcting device 2130 is configured to perform color correction on the video streams acquired by the depth information acquiring device 2110. In this embodiment it is connected to the layered image acquiring device 2120, that is, color correction is performed after the image data of the depth levels of the video streams has been acquired; of course, this embodiment does not exclude first performing color correction on the acquired video streams and then acquiring the image data of their depth levels.
  • The video image correcting device 2130 includes an obtaining unit 2131, a selecting unit 2132, a generating unit 2133, and a correcting unit 2134, which are connected in sequence:
  • The obtaining unit 2131 is configured to acquire at least two corrected video images, where every two adjacent images among the at least two corrected video images have an overlapping region;
  • The selecting unit 2132 is configured to select, from the overlapping region, the paired feature points of the two adjacent corrected video images;
  • The generating unit 2133 is configured to generate a color correction matrix of the two adjacent corrected video images according to the paired feature points;
  • The correcting unit 2134 is configured to correct the video streams using the color correction matrix.
  • It should be noted that, in order to perform color correction on the video streams, this embodiment relies on the color correction matrix generated by the generating unit 2133. In this embodiment, the color correction matrix may be generated after the layered image acquiring device 2120 has finished acquiring the image data of the depth levels of the video streams; however, this embodiment does not exclude the case where the color correction matrix has already been generated before the video streams are acquired, in which case the pre-generated color correction matrix only needs to be applied to correct the video streams.
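For illustration, the color correction matrix could be estimated from the paired feature points by least squares and then reused on whole frames, along the lines of the sketch below. The 3x3 matrix shape, the right-multiplication convention (the patent writes the relation as Mat_dst = Mat_ColorRectify · Mat_src plus an error term), and the NumPy-based code are assumptions made for this sketch.

```python
import numpy as np

def estimate_color_correction(src_points, dst_points):
    """src_points, dst_points: (m, 3) arrays holding the H, S, L (or R, G, B) values of the
    m paired feature points taken from the source and target images. Solves dst ~= src @ M
    in the least-squares sense, so M maps source colors toward the target colors."""
    src = np.asarray(src_points, dtype=np.float64)
    dst = np.asarray(dst_points, dtype=np.float64)
    matrix, *_ = np.linalg.lstsq(src, dst, rcond=None)   # minimizes the summed squared error
    return matrix                                        # 3 x 3 color correction matrix

def apply_color_correction(frame, matrix):
    """Apply a pre-generated correction matrix to every pixel of a frame
    given as an (H, W, 3) array in the same color space as the matrix."""
    h, w, c = frame.shape
    corrected = frame.reshape(-1, c).astype(np.float64) @ matrix
    return np.clip(corrected, 0, 255).reshape(h, w, c).astype(frame.dtype)
```

Because the matrix is computed once from a calibration pair and then applied to every incoming frame, no overlap between the frames being corrected is needed at run time, which matches the time saving noted below.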
  • The stereoscopic panoramic video stream generating device 2140 is configured to perform depth-information-based stitching on the corrected video streams of different views to generate a stereoscopic panoramic video stream.
  • This embodiment not only has the advantages described in Embodiment 9, but also performs color correction on the video images, so that a panoramic video conference experience with better luminance and chrominance can be obtained.
  • In addition, since this embodiment only requires overlapping regions between pairs of images when calculating the color correction matrix, during the correction process the color correction matrix can be used regardless of whether the video images overlap, and the matrix only needs to be generated once, which saves time in color-correcting the video images.
  • Those skilled in the art will appreciate that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention or in parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Description

立体全景视频流生成方法、 设备及视频会议方法和设备 本申请要求 2008年 12月 30日递交的申请号为 200810247531. 5、发明名 称为 "立体全景视频流生成方法、 设备及视频会议方法和设备" 以及 2009年 2月 26日递交的申请号为 200910118629. 5、 名称为 "多视点视频图像校正方 法、 装置及系统" 的中国专利申请的优先权, 其全部内容通过引用结合在本 申请中。
技术领域
本发明关于视频拼接技术, 特别是关于网真会议系统中的视频拼接技术, 具体的讲是一种立体全景视频流生成方法、 设备及视频会议方法和设备。
背景技术
现有的网真 (Telepresence)技术是一种将高质量的音频、 高清晰视频画 面和交互式组件结合在一起的技术, 旨在通过网络提供有如身临其境的独特 体验。 例如, 在网真会议系统中利用高级视频、 音频和协作技术能够为用户 提供实时的面对面交互体验。 网真会议系统甚至能提供房中套房的画面, 通 过实物大小的图像、 高清晰的分辨率以及立体和多声道音频, 围绕着一张虚 拟会议桌创造面对面的会议体验。 虽然现有的网真能够给用户带来相对于传 统会议系统的更好的、 更真实的房中套房会议体验, 但距离现实的面对面的 真人交流还差距, 那就是缺乏真实的立体感受。 人们得到的视频信息仅仅是 二维平面的信息, 并没有获得有深度层次感觉的交流信息。
现有的立体 (3D)视频技术可以提供符合立体视觉原理的深度信息, 从而 能够真实地重现客观世界景象, 表现出场景的纵深感、 层次感和真实性, 是 当前视频技术发展的重要方向。 但由于技术不成熟, 显示设备昂贵和缺乏标 准等原因一直没有大规模应用。
现有的图像拼接技术可以突破成像设备本身的物理限制, 得到大视场的 数字全景图像。 但是, 图像拼接中的 (一) 重构虚拟视点的遮挡与空洞问题; (二) 原始视点差距较大, 导致视差较大, 进而需要产生连续视点的中间虚拟 视点图的数目将陡然增加, 运算量将增大的问题; (三) 视差的计算问题; 仍然没有得到很好的解决。
由于存在上述的诸多问题, 所以现有技术的网真会议系统还无法向用户 提供全景的、 高分辨率的、 无缝的以及三维立体的会议体验。
发明内容
为了克服现有技术中的缺陷, 本发明实施例提供了一种立体全景视频流 生成方法、 设备及视频会议方法和设备, 用以能够基于不同显示设备的多显 示方式向用户提供全景的、 高分辨率的、 无缝的和三维立体的网真会议视频 图像。
本发明实施例的目的之一是, 提供一种立体全景视频流生成方法, 该方 法包括: 获取至少两个待拼接视频图像的深度信息; 根据每个待拼接视频图 像的深度信息从对应的待拼接视频图像中获取多个深度层次的图像数据; 根 据获取的多个深度层次的图像数据进行视频图像数据间的拼接, 生成立体全 景视频流。
本发明实施例的目的之一是, 提供一种立体全景视频会议方法, 该方法 包括: 至少从两个视角同歩获取同一会场的视频流; 根据每个视频流的深度 信息从对应的视频流中获取多个深度层次的图像数据; 对获取的不同视角的 视频流进行基于深度信息的拼接, 生成立体全景视频流; 根据终端显示器的 类别, 将所述立体全景视频流的视频图像显示在终端显示器上。
本发明实施例的目的之一是, 提供一种立体全景视频流生成设备, 该设 备包括: 深度信息获取装置, 用于获取至少两个待拼接视频图像的深度信息; 分层图像获取装置, 用于根据每个待拼接视频图像的深度信息从对应的待拼 接视频图像中获取多个深度层次的图像数据; 立体全景视频流生成装置, 用 于根据获取的多个深度层次的图像数据进行视频图像数据间的拼接, 生成立 体全景视频流。
本发明实施例的目的之一是, 提供一种立体全景视频会议设备, 所述的 设备包括: 深度信息获取装置, 至少从两个视角同歩获取同一会场的视频流; 分层图像获取装置, 用于根据每个视频流的深度信息从对应的视频流中获取 多个深度层次的图像数据; 立体全景视频流生成装置, 对获取的不同视角的 视频流进行基于深度信息的拼接, 生成立体全景视频流; 视频图像显示装置, 用于根据终端显示器的类别, 将所述立体全景视频流的视频图像显示在终端 显 器上 °
本发明实施例的有益效果在于, 通过本发明实施例提供的技术方案, 实 现了快速、 实时的视频图像拼接, 降低了视频图像拼接的复杂度, 提高了视 频图像拼接的效率。 可以向用户提供全景的、 亮色度效果好的、 无缝的、 三 维立体的会议体验, 能够获得比传统网真更高级的、 真实的感受。 附图说明
为了更清楚地说明本发明实施例的技术方案, 下面将对实施例中所需要 使用的附图作简单地介绍, 显而易见地, 下面描述中的附图仅仅是本发明的 一些实施例, 对于本领域普通技术人员来讲, 在不付出创造性劳动性的前提 下, 还可以根据这些附图获得其他的附图。
图 1 为本发明实施例提供的基于深度摄像机的多视点视频会议系统示意 图;
图 2为本发明实施例一提供的立体全景视频流生成流程图;
图 3为本发明实施例一提供的基于人物层和非人物层的视频拼接示意图; 图 4为本发明实施例二提供的一种立体全景视频流生成方法流程图; 图 5为本发明实施例三提供的一种颜色校正方法流程图;
图 6为本发明实施例四提供的一种颜色校正方法流程图;
图 7A为本发明实施例五提供的立体全景视频流生成设备结构图; 图 7B为本发明实施例五提供的的立体全景视频流生成装置结构框图; 图 8A为本发明实施例六提供的立体全景视频流生成设备结构图; 图 8B为本发明实施例六提供的视频图像校正装置结构图;
图 9A为本发明实施例六提供的另一种视频图像校正装置结构图; 图 9B为本发明实施例六提供的一种选择单元结构图;
图 9C为本发明实施例六提供的一种生成单元结构图;
图 9D为本发明实施例六提供的一种校正单元结构图;
图 9E为本发明实施例六提供的另一种校正单元结构图;
图 10为本发明实施例七提供的的立体全景视频会议方法流程图; 图 11为本发明实施例七提供的视频拼接流程图;
图 12为本发明实施例七提供的两个视频图像序列的示意图;
图 13为本发明实施例八提供的立体全景视频会议方法流程图; 图 14为本发明实施例九提供的的立体全景视频会议设备结构图; 图 15A为本发明实施例九提供的立体全景视频会议系统的结构框图; 图 15B本发明实施例九提供的视频图像显示装置的结构框图;
图 16为本发明实施例九提供的快速视频拼接方法流程图;
图 17为本发明实施例九提供的基于深度摄像机构成的多会场多视点视频 会议系统图;
图 18为本发明实施例九提供的的会场 A的示意图;
图 19和图 20为本发明实施例九提供的手势指令示意图;
图 21为本发明实施例十提供的的立体全景视频会议设备结构图。 具体实施方式
下面将结合本发明实施例中的附图, 对本发明实施例中的技术方案进行 清楚、 完整地描述, 显然, 所描述的实施例仅仅是本发明一部分实施例, 而 不是全部的实施例。 基于本发明中的实施例, 本领域普通技术人员在没有做 出创造性劳动前提下所获得的所有其他实施例, 都属于本发明保护的范围。
如图 1 所示, 本具体实施方式提出了基于深度摄像机构成的多会场、 二 维 (2D) /三维 (3D ) /多层 (Mul it-Layer) 、 以及多视点的视频会议系统。
其中会场 A包括: 深度摄像机 (101A, 102A) , 视频会议服务器 103A, 和终端显示设备 (104A, 105A) 。 深度摄像机 (101A, 102A) 通过视频会议 服务器 103A与终端显示设备 (104A, 105A) 相连接, 终端显示设备 (104A, 105A) 可以是 2D显示器、 3D显示器或多层显示器 (Mul it-Layer) 。
会场 B包括: 深度摄像机(111B, 112B) , 服务器 113B, 深度摄像机(1 11B, 112B) 与服务器 113B相连接。
会场 C包括: 深度摄像机(121C, 122C) , 服务器 123C, 深度摄像机(1 21C, 122C) 与服务器 123C相连接。
会场 D包括: 深度摄像机 (131D,132D) , 服务器 133D, 深度摄像机 (1 31D, 132D) 与服务器 133D相连接。
服务器 103A通过网络 142和传输设备 141分别与服务器 (113B, 123C, 133D) 相连接。 网络 142可以是电缆、 互联网或卫星网。
实施例一
如图 2所示, 本发明实施例的立体全景视频流生成方法包括以下歩骤: S201 : 获取至少两个待拼接视频图像的深度信息;
S202: 根据每个待拼接视频图像的深度信息从对应的待拼接视频图像中 获取多个深度层次的图像数据;
S203: 根据获取的多个深度层次的图像数据进行视频图像数据间的拼接, 生成立体全景视频流。 如图 1 所示, 由深度摄像机 (111B, 112B) 从两个视角同歩获取会场 B 的视频流和每帧图像的深度信息; 根据图像的深度信息获得不同深度层次的 深度图像, 对于深度层次变化很小的区域, 一般只作一次图像拼接; 而对于 运动变化的人和物, 需要实时进行逐帧图像拼接。
深度层次变化小的区域一般是指: 会议场景中固定的家具、 位置固定的 视讯通讯设备 (如: 摄像机、 大屏幕显示设备、 打印机等) , 这些区域基本 是不变化的, 深度层次也基本不变化或者变化很小, 这样就可以通过诸如深 度摄像机的方式把这些深度层次变化小的区域预先提取出来, 单独进行 2个 摄像机视频的无缝拼接。
深度层次变化大的区域一般是指, 运动的人或物 (如椅子) 。 与会的人 一般会做些动作, 同时椅子也会动。 如果人 (不伸手) 相对摄像机前后运动 较大, 将导致人物在时间轴中所体现的深度层次变化较大, 但在同一时刻不 同摄像机所拍摄图像中的人还是在同一深度层次, 这个很容易进行无缝图像 拼接, 利用传统的图像拼接技术就可以实现。 如果人 (伸手) 相对摄像机前 后运动较大, 在同一时刻不同摄像机所拍摄图像中的人则不在同一深度层次, 产生了不同层次的深度 /视差, 在进行图像拼接时, 需根据每个视频图像的深 度信息从对应的视频图像中获取人物图像数据和非人物图像数据。 对非人物 图像数据进行拼接, 生成非人物拼接图像数据; 对人物图像数据进行拼接, 生成人物拼接图像数据; 将所述的人物拼接图像数据与所述的非人物拼接图 像数据进行粘贴, 生成立体全景视频流。 对人物图像数据进行拼接时, 可以 检测每个视频流当前帧中每个人物图像数据相对上一帧对应人物图像数据的 图像变化区域, 确定所述的变化区域大于设定的阈值后, 则仅对变化区域的 人物图像数据进行拼接。
或者根据每个视频图像的深度信息从对应的视频图像中获取背景视频图 像数据和前景图像数据。 对获取的背景图像数据进行拼接, 生成背景全景图 像数据; 对获取的前景图像数据进行拼接, 生成前景拼接图像数据; 将所述 的前景拼接图像数据粘贴到所述的背景全景图像数据, 生成立体全景视频流。 对前景视频图像数据进行拼接时, 可以检测每个视频流当前帧的前景图像数 据相对上一帧前景图像数据的视频图像变化区域, 确定所述的变化区域大于 设定的阈值后, 则仅对变化区域的前景图像数据进行拼接。
如图 3所示, 根据图像的深度信息从视频图像 (301, 302 ) 中获取人物 图像数据 (306, 307 ) 和非人物图像数据 (303, 304 ) ; 对非人物图像数据 ( 303, 304 ) 进行拼接, 生成非人物拼接图像数据 305 ; 对人物图像数据 (30 6, 307 )进行拼接, 生成人物拼接图像数据 308 ; 将所述的人物拼接图像数据 308与所述的非人物拼接图像数据 305进行粘贴,生成合成视频图像 309并编 码输出。
通过本发明实施例提供的技术方案, 实现了快速、 实时的视频图像拼接, 降低了视频图像拼接的复杂度, 提高了视频图像拼接的效率。 可以向用户提 供全景的、 高分辨率的、 无缝的、 三维立体的会议体验。 能够获得比传统网 真更高级的、 真实的感受。 解决了在多视点视频拼接时, 出现的由于视差带 来的重影问题, 特别是对于近景视差较为明显的场景。 实施例二
如图 4所示为本发明实施例二提供的一种立体全景视频流生成方法流程 图。 该方法包括以下歩骤:
S401 : 获取至少两个待拼接视频图像的深度信息。
S402: 根据每个待拼接视频图像的深度信息从对应的待拼接视频图像中 获取多个深度层次的图像数据。
歩骤 S401和歩骤 S402和实施例一中歩骤 S201及歩骤 S202相类似, 在 此就不再进行赘述了。
S403 : 获取至少两个校正视频图像, 所述至少两个校正视频图像中两两 S404: 从所述重叠区域选择所述两两相邻的校正视频图像的配对特征点。 在实施歩骤 S404的过程中, 可以采用多种方法获取两两相邻的视频图像 的配对特征点, 如: Harris特征点检测方法、 SUSAN ( Smal lest Univalue S egment Assimi lating Nucleus )特征点检测方法、 基于小波的特征点检测方 法、 SIFT ( Scale-invariant feature transform, 尺度不变特征变换)特征 点检测方法等, 对此, 本发明的实施例对此不作任何限制。
S405 : 根据所述配对特征点生成所述两两相邻的校正视频图像的颜色校 正矩阵。
S406: 通过所述颜色校正矩阵对所述待拼接视频图像进行校正。
需要指出的是, 该颜色校正矩阵只需要生成一次, 以后如果需要对不同 的待拼接视频图像进行颜色校正, 歩骤 S403-S405可以不必重复进行, 而直 接执行歩骤 S406即可。
因此, 作为本发明的一个实施例, 歩骤 S403-S405可以在歩骤 S401前面 予以执行, 以得到颜色校正矩阵为目的。
S407: 根据获取的多个深度层次的图像数据对校正后的待拼接视频图像 进行拼接, 生成立体全景视频流。
本发明实施例不但可以向用户提供全景的、 高分辨率的、 无缝的、 三维 立体的会议体验, 而且由于本发明实施例对视频图像进行颜色校正, 还可以 向用户提供具有较好亮色度的全景视频流。
现有技术中的直方图校正虽然也可以校正上述亮色度的偏差, 但是, 由 于采用直方图对视频图像的亮色度进行校正时要求视频图像之间具有较大的 相似性, 因此在多视点拍摄图像的场景中, 当视频图像之间的重叠区域较小 或者视频图像之间没有重叠区域时, 由于视频图像存在较大差异, 容易导致 校正结果较差, 或者校正失败; 并且由于采用直方图进行校正时需要对每一 个图像进行实时统计校正, 校正时间较长。
而由于本发明实施例仅在计算颜色校正矩阵时需要两两图像之间具有重 叠区域, 在校正过程中无论视频图像之间是否有重叠区域, 均可以通过该颜 色校正矩阵进行颜色校正, 并且颜色校正矩阵只需要生成一次, 可以节省对 视频图像进行颜色校正的时间。 实施例三
本发明实施例是对全景视频流生成过程中颜色校正的另一种实施方式的 说明。 该实施例示出了对两台摄像机所拍摄的视频图像进行校正的过程, 其 中在初始计算颜色校正矩阵时两台摄像机拍摄的两个视频图像需要有重叠区 域, 后续通过颜色校正矩阵对不相邻的两个视频图像进行校正时, 两台摄像 机所拍摄的图像可以没有重叠区域, 但是, 所拍摄的相邻的两个视频图像之 间需要具有重叠区域。
如图 5所示为本发明实施例三提供的一种颜色校正方法流程图。 该方法 包括以下歩骤:
S501 : 接收两台摄像机拍摄的具有重叠区域的视频图像。
假设两台摄像机分别为源摄像机和目标摄像机, 其中源摄像机拍摄的是 源视频图像, 目标摄像机拍摄的是目标视频图像, 需要将源视频图像的颜色 校正为与目标图像的颜色一致。
初始可以调整两台摄像机的位置, 使得所拍摄的源视频图像和目标视频 图像具有重叠区域。 与现有采用直方图校正需要有大范围重叠区域相比, 对 该重叠区域的大小范围不做限制, 只要保证具有重叠区域即可。
S502: 对所拍摄的两个视频图像进行图像预处理。
对视频图像的预处理包括通常采用的平滑降噪处理和畸变校正处理, 该 歩骤为可选歩骤。
S503: 对预处理后的两个视频图像进行颜色空间转换。
可以对摄像机所拍摄的视频图像进行颜色空间转换, 转换前后的视频图 像的格式可以为 RGB (Red Green Blue, 红绿蓝三原色) , 或 HSV、 或 YUV、 或 HS 或 CIE-Lab、 或 CIE_Luv、 或 CMY、 或 CMYK、 或 XYZ中的任意一种格 式。
S504: 从两个视频图像的重叠区域选择配对特征点。
对于本领域技术人员可以理解, 从两个视频图像获取特征点的方式具有 多禾中, 譬如: Harris特征点检测方法、 SUSAN ( Smal lest Univalue Segment Assimi lating Nucleus )特征点检测方法、 基于小波的特征点检测方法、 SIF T ( Scale-invariant feature transform, 尺度不变特征变换) 特征点检测 方法等, 为了达到更好的效果, 本发明的实施例采用的是对旋转、 尺度縮放、 亮度变化保持不变性, 且对视角变化、 仿射变换、 噪声也保持一定程度稳定 性的 SIFT ( Scale-invariant feature transform,尺度不变特征变换)算法, 正如本领域技术人员能够理解的那样, 还可以采用上述的以及其他的特征提 取算法获得两个视频图像的重叠区域的配对特征点, 对此, 本发明的实施例 不作任何限制。
在选择重叠区域的配对特征点时, 可以采用如下四种方式:
方式一:对重叠区域进行 SIFT特征点检测,对检测到的特征点进行匹配, 得到相邻两个视频图像的多组配对特征点。
其中, SIFT特征点检测方式是现有图像处理中最常用的方式, 采用 SIFT 特征点检测可以满足仿射和亮色度不变。 需要指出的是, 除了 SIFT特征点检 测, 现有技术中还存在多种特征点检测方式, 例如 Harris检测方式、 Susan 角检测方式、 及其改进算法等, 只要能够从重叠区域检测出特征点即可。
对于检测出的特征点, 可以利用 RANSAC (RANdom SAmple Consensus , 随 机采样一致性)方法去除不匹配的特征点, 从而获得稳定可靠的配对特征点。 在去除不匹配的特征点时, 可以采用现有技术 (譬如: 基于概率统计的方法 等) , 对此本发明实施例不做限制。
方式二:对重叠区域进行 SIFT特征点检测,对检测到的特征点进行匹配, 得到相邻两个视频图像的多组配对特征点, 以配对特征点为中心划分范围相 同的区域, 将划分的区域的颜色特征的平均值赋值给配对特征点。 其中, 检测特征点和去除不匹配的特征点与方式一中的描述一致, 在此 不再赘述。
对于每一组配对特征点, 可以分别以特征点为中心划分面积相同的区域 作为两个视频图像的配对区域, 将该区域各个颜色通道的平均值作为特征点 的颜色值, 例如对于 HSL (H-Hue表示色度, S_Saturation表示饱和度, L_L ightness表示亮度)格式的视频图像, 每个配对点都有对应的 H值、 S值和 L 值, 而对应的配对区域是由若干点组成的区域, 将该区域内所有点的 H值、 S 值和 L值分别取平均值, 假设平均值为 H' 、 S ' 和 L' , 则将 H' 、 S ' 和 L' 赋值给该特征点。
方式三: 对所述重叠区域进行分割, 所述分割后的两个视频图像的对应 区域作为配对特征点, 将所述对应的区域的颜色特征的平均值赋值给所述配 对特征点。
对重叠区域进行分割时, 两个视频图像可以分割出若干组配对区域, 每 一组配对区域的范围可以不同, 每个区域内都分别包含若干特征点, 将每个 区域颜色通道取平均值作为特征点的值, 取平均值的过程与方式二中的示例 类似, 在此不再赘述。 通过对配对区域分别计算颜色通道的平均值后, 得到 若干配对特征点。
方式四: 接收通过手动从所述重叠区域中选取的区域块, 所述选取的两 个视频图像的对应区域块作为配对特征点, 将所述对应的区域块的颜色特征 的平均值赋值给所述配对特征点。
方式四与方式三的不同在于, 方式三在对重叠区域进行分割时, 可以按 照预先设置由图像校正装置自动完成, 而方式四通过手动方式从重叠区域选 取若干配对区域, 然后将选取的结果输入到图像校正装置中, 进行后续处理。
S505: 根据配对特征点建立两个视频图像的颜色空间矩阵。
假设两个视频图像转换颜色空间后为 HSL格式, 且选择了 m (m为大于 1 的自然数) 个配对点, 则 m个配对点对应的源视频图像和目标视频图像的颜色空间矩阵可分别建立为:

$$\mathrm{Mat\_dst}=\begin{bmatrix} h_{11} & s_{12} & l_{13}\\ h_{21} & s_{22} & l_{23}\\ \vdots & \vdots & \vdots\\ h_{m1} & s_{m2} & l_{m3}\end{bmatrix},\qquad \mathrm{Mat\_src}=\begin{bmatrix} h'_{11} & s'_{12} & l'_{13}\\ h'_{21} & s'_{22} & l'_{23}\\ \vdots & \vdots & \vdots\\ h'_{m1} & s'_{m2} & l'_{m3}\end{bmatrix}$$
其中, Mat— dst为目标视频图像的颜色空间矩阵, Mat— src为源视频图像 的颜色空间矩阵, 以 Mat— dst 中的第一行为例说明矩阵的含义, 该第一行表 示了 m个点中的第一个点, 其中 hl l为该第一个点的饱和度值, sl2为该第一 个点的色度值, 113为该第一个点的亮度, 因此 Mat— dst为 m个配对点中目标 视频图像的 m个目标像素点的^ S、 L值的矩阵。
S506: 建立两个颜色空间矩阵的变换关系, 并根据变换关系求出颜色校 正矩阵。
假设待求的颜色校正矩阵为 Mat— ColorRectify, 建立颜色校正的变换关 系式如下:
Mat_dst = Mat_ColorRectify * Mat_src + error
其中, error表示颜色空间矩阵之间的误差, 基于上述变换关系式求解 error, 求解 error的公式如下:

$$error=\sum_{i=1}^{m}\left(\mathrm{Mat\_dst}-\mathrm{Mat\_ColorRectify}\cdot\mathrm{Mat\_src}\right)^{2}$$
当上述公式中的 error值最小时的 Mat— ColorRectify, 即为所求解的颜 色校正矩阵。
S507: 保存颜色校正矩阵。
S508: 用颜色校正矩阵对接收到的待拼接视频图像进行颜色校正。
后续无论源摄像机和目标摄像机的位置如何变换, 所拍摄的视频图像是 否有交集, 都可应用求解出的颜色校正矩阵进行颜色校正, 过程如下:
当源摄像机输入待校正的视频图像后, 生成待校正的视频图像的颜色空 间矩阵, 假设待校正的视频图像经过颜色空间转换为 HSL格式的视频图像, 该视频图像由 Y个像素点组成, 则生成颜色空间矩阵为一个(ΥΧ 3 ) 的矩阵, 矩阵的每一行表示一个像素点的 Η值、 S值、 L值。
将颜色校正矩阵 Mat— ColorRectify与待校正的视频图像的颜色空间矩阵 相乘, 将所述相乘的结果作为校正后的视频图像的颜色空间矩阵, 根据所述 校正后的视频图像的颜色空间矩阵生成所述校正后的视频图像。
上述本发明实施例对视频图像进行颜色校正, 由于仅在计算颜色校正矩 阵时需要两两图像之间具有重叠区域, 在校正过程中无论视频图像之间是否 有重叠区域, 均可以通过该颜色校正矩阵进行颜色校正, 并且颜色校正矩阵 只需要生成一次, 节省了对视频图像进行颜色校正的时间。 实施例四
本发明实施例是对全景视频流生成过程中颜色校正的再一种实施方式的 说明。 该实施例示出了对 N台摄像机所拍摄的视频图像进行校正的过程, 其 中在初始计算颜色校正矩阵时, N台摄像机中两两相邻的摄像机所拍摄的视频 图像需要有重叠区域, 共输入 N-1组两两相邻的视频图像, 相应生成 N-1个 颜色校正矩阵, 通过 N-1个颜色校正矩阵对视频图像进行校正时, N台摄像机 所拍摄的图像可以没有共同的重叠区域。
如图 6所示为本发明实施例四提供的一种颜色校正方法流程图。 该方法 包括以下歩骤:
S601 : 接收 N台摄像机拍摄的 N个视频图像, N个视频图像中两两相邻的 视频图像具有重叠区域。
N台摄像机中两两相邻的摄像机为一组,例如摄像机 1和摄像机 2为一组, 摄像机 2和摄像机 3为一组, 以此类推摄像机 N-1和摄像机 N为一组。 每一 组中的两台摄像机分别为源摄像机和目标摄像机, 其中源摄像机拍摄的是源 视频图像, 目标摄像机拍摄的是目标视频图像。
初始可以调整 N 台摄像机的位置, 使得每一组的两两摄像机所拍摄的源 视频图像和目标视频图像具有重叠区域。 与现有采用直方图校正相比, 对该 重叠区域的大小范围不做限制。
S602 : 预处理 N个视频图像。
对视频图像的预处理包括通常采用的平滑降噪处理和畸变校正处理, 该 歩骤为可选歩骤, 其处理过程为现有技术, 在此不再赘述。
S603 : 对预处理后的 N个视频图像进行颜色空间转换。
假设摄像机所拍摄的视频图像为 RGB格式的图像, 可以将 RGB格式的图 像进行颜色空间转换, 转换后的图像可以为 HSV、 或 YUV、 或 HSL、 或 CIE-La b、 或 CIE-Luv、 或 CMY、 或 CMYK、 或 XYZ中的任意一种格式。
S604: 顺序选择 N-1组两两相邻的视频图像的配对特征点。
在实施歩骤 304的过程中, 获取 N-1组两两相邻的视频图像的匹配特征 点的过程, 类似于上一实施例的歩骤 204, 在此不再赘述。
S605 : 根据配对特征点建立两两相邻的视频图像的颜色空间矩阵。
类似于上一实施例的 S505 , 在此不再赘述。
S606 : 建立两个颜色空间矩阵的变换关系, 并根据变换关系求出颜色校 正矩阵。
S607: 保存当前的颜色校正矩阵。
上述歩骤 S604至歩骤 S607为处理 N-1组两两相邻的视频图像中的一组 视频图像的过程, 该过程与前述第二实施例中的歩骤 S504至歩骤 S507的描 述一致, 在此不再赘述。
S608: 判断是否处理完 N-1组视频图像, 若是, 则执行歩骤 S609; 否则, 返回歩骤 S604。
S609: 接收由第 K台摄像机发送的待校正的视频图像。
根据前述歩骤, 已经求出了 N-1个颜色校正矩阵, 其中假设第 1组视频 图像对应第 1个颜色校正矩阵 Mat— 1,第 2组视频图像对应第 2个颜色校正矩 阵 Mat— 2, 以此类推第 N-1组视频图像对应第 N-1个颜色校正矩阵 Mat— N_ l。 S610: 将保存的前 K-l个颜色校正矩阵进行矩阵变换得到第 Κ个视频图 像的颜色校正矩阵。
顺序将第一个颜色校正矩阵至第 K-1个颜色校正矩阵相乘得到第 Κ个摄 像机输入的视频图像的颜色校正矩阵 Mat— (k), 变换公式如下所示: Mat— (k) = Mat— 1 X Mat— 2'" X Mat— (k_l)。
S611 : 用第 K个视频图像的颜色校正矩阵对第 K台摄像机拍摄的视频图 像进行颜色校正。
将上述颜色校正矩阵 Mat— (k)与待校正的视频的颜色空间矩阵相乘,将相 乘的结果作为校正后的视频图像的颜色空间矩阵, 根据校正后的视频图像的 颜色空间矩阵生成校正后的视频图像。
上述本发明实施例对视频图像进行颜色校正, 由于仅在计算颜色校正矩 阵时需要两两图像之间具有重叠区域, 在校正过程中无论视频图像之间是否 有重叠区域, 均可以通过该颜色校正矩阵进行颜色校正, 并且颜色校正矩阵 只需要生成一次, 节省了对视频图像进行颜色校正的时间。 实施例五
如图 7A所示为本发明实施例五提供的立体全景视频流生成设备结构图, 该设备包括: 深度信息获取装置 701 用于获取至少两个待拼接视频图像的深 度信息; 分层图像获取装置 702用于根据每个待拼接视频图像的深度信息从 对应的待拼接视频图像中获取多个深度层次的图像数据; 立体全景视频流生 成装置 703用于根据获取的多个深度层次的图像数据进行视频图像数据间的 拼接, 生成立体全景视频流。
如图 1 所示, 由深度摄像机 (111B, 112B) 从两个视角同歩获取会场 B 的视频流和每帧图像的深度信息; 根据图像的深度信息获得不同深度层次的 深度图像, 对于深度层次变化很小的区域, 一般只作一次图像拼接; 而对于 运动变化的人和物, 需要实时进行逐帧图像拼接。 分层图像获取装置 702根据每个视频图像的深度信息从对应的视频图像 中获取人物图像数据, 根据每个视频图像的深度信息从对应的视频图像中获 取非人物图像数据。 如图 7B所示, 立体全景视频流生成装置 703包括: 图像 拼接单元 7031和图像检测单元 7032 ; 图像拼接单元 7031对非人物图像数据 进行拼接, 生成非人物拼接图像数据, 并对人物图像数据进行拼接, 生成人 物拼接图像数据; 然后将人物拼接图像数据与非人物拼接图像数据进行粘贴, 生成立体全景视频流。 图像检测单元 7032检测每个视频流当前帧中每个人物 图像数据相对上一帧对应人物图像数据的图像变化区域, 确定变化区域小于 设定的阈值后, 图像拼接单元 7031仅对变化区域的人物图像数据进行拼接。
分层图像获取装置 702根据每个视频图像的深度信息从对应的视频图像 中获取前景图像数据, 并根据每个视频图像的深度信息从对应的视频图像中 获取背景图像数据。 立体全景视频流生成装置 703包括: 图像拼接单元 7031 和图像检测单元 7032 ; 图像拼接单元 7031对获取的背景图像数据进行拼接, 生成背景全景图像数据, 并对获取的前景图像数据进行拼接, 生成前景拼接 图像数据; 然后将所述的前景拼接图像数据粘贴到所述的背景全景图像数据, 生成立体全景视频流。 图像检测单元 7032检测每个视频流当前帧的前景图像 数据相对上一帧前景图像数据的视频图像变化区域, 确定所述的变化区域小 于设定的阈值后,图像拼接单元 7031仅对变化区域的前景图像数据进行拼接。
通过本发明实施例提供的技术方案, 实现了快速、 实时的视频图像拼接, 降低了视频图像拼接的复杂度, 提高了视频图像拼接的效率。 可以向用户提 供全景的、 高分辨率的、 无缝的、 三维立体的会议体验。 能够获得比传统网 真更高级的、 真实的感受。 解决了在多视点视频拼接时, 出现的由于视差带 来的重影问题, 特别是对于近景视差较为明显的场景。 实施例六
如图 8A所示为本发明实施例六提供的立体全景视频流生成设备结构图, 该设备包括: 深度信息获取装置 801、 分层图像获取装置 802、 视频图像校正 装置 803和立体全景视频流生成装置 804,其中深度信息获取装置 801和分层 图像获取装置 802与实施例五中相类似, 在此不再赘述。
视频图像校正装置 803包括: 获取单元 8031、 选择单元 8032、 生成单元 8033和校正单元 8034 (如图 8B所示) 。
其中, 获取单元 8031用于获取至少两个校正视频图像, 所述至少两个校 正视频图像中两两相邻的校正视频图像具有重叠区域, 所述获取单元 510具 体可以采用如下方式: 摄像机、 摄像头等其他的摄像设备实现; 选择单元 80 32用于从所述重叠区域选择所述两两相邻的校正视频图像的配对特征点, 选 择单元 8032可以采用如下方式实现: 集成了图像特征点提取与匹配的处理器 专用芯片或采用通用的处理器芯片, 结合图像特征提取以及匹配的算法来实 现; 生成单元 8033用于根据所述配对特征点生成所述两两相邻的校正视频图 像的颜色校正矩阵, 所述生成单元 8033可以采用: 集成矩阵处理功能的 CPL D ( Complex Programmable Logic Device , 复杂可编程逻辑器件) 完成, 或 利用 FPGA (Field Programmable Gate Array, 现场可编程门阵列) 来实现; 校正单元 8034用于通过所述颜色校正矩阵对接收到的待拼接视频图像进行校 正。
本发明中视频图像校正装置的第二实施例框图如图 9A所示,该装置包括: 获取单元 910、 预处理单元 920、 转换单元 930、 选择单元 940、 生成单元 95 0和校正单元 960。
其中, 获取单元 910用于获取至少两个校正视频图像, 所述至少两个校 正视频图像中两两相邻的视频图像具有重叠区域, 所述获取单元 910具体可 以采用如下方式: 摄像机、 摄像头等其他的摄像设备实现; 预处理单元 920 用于当所述获取单元获取到至少两个校正视频图像后, 对所述至少两个校正 视频图像进行预处理, 所述预处理包括平滑降噪处理, 和 /或畸变校正处理, 其中, 预处理单元 920为可选单元; 转换单元 930用于对所述至少两个校正 视频图像进行颜色空间转换, 所述转换前后的校正视频图像的格式包括 RGB、 或 HSV、 或 YUV、 或 HSL、 或 CIE-Lab、 或 CIE_Luv、 或 CMY、 或 CMYK、 或 XYZ; 选择单元 940用于从所述重叠区域选择所述两两相邻的校正视频图像的配对 特征点; 生成单元 950用于根据所述配对特征点生成所述两两相邻的校正视 频图像的颜色校正矩阵; 校正单元 960用于通过所述颜色校正矩阵对接收到 的待拼接视频图像进行校正。
具体的, 如图 9B所示, 所述选择单元 940至少包括下述一个单元: 第一 选择单元 941, 用于对所述重叠区域进行 SIFT特征点检测, 对所述检测到的 特征点进行匹配, 得到相邻两个校正视频图像的多组配对特征点; 第二选择 单元 942, 用于对所述重叠区域进行 SIFT特征点检测, 对所述检测到的特征 点进行匹配, 得到相邻两个校正视频图像的多组配对特征点, 以所述配对特 征点为中心划分范围相同的区域, 将所述划分的区域的颜色特征的平均值赋 值给所述配对特征点; 第三选择单元 943, 用于对所述重叠区域进行分割, 所 述分割后的两个校正视频图像的对应区域作为配对特征点, 将所述对应的区 域的颜色特征的平均值赋值给所述配对特征点; 第四选择单元 944, 用于接收 通过手动从所述重叠区域中选取的区域块, 所述选取的两个校正视频图像的 对应区域块作为配对特征点, 将所述对应的区域块的颜色特征的平均值赋值 给所述配对特征点。 需要说明的是, 为了示例清楚, 图 9B中的选择单元 940 包含了上述所有四个单元, 但是在实际应用过程中, 选择单元 940可以根据 需要包含其中至少一个单元即可。
具体的, 如图 9C所示, 假设所述两两相邻的校正视频图像中一个作为源 视频图像, 另一个作为目标视频图像, 所述生成单元 950可以包括: 颜色矩 阵建立单元 951,用于分别建立所述源视频图像和目标视频图像的颜色空间矩 阵, 所述颜色空间矩阵的每一行表示所述配对特征点中一个特征点的颜色空 间属性; 变换关系建立单元 952, 用于建立所述源视频图像的颜色空间矩阵和 目标视频图像的颜色空间矩阵变换关系, 所述变换关系为: 所述源视频图像 的颜色空间矩阵与所述颜色校正矩阵的乘积加上误差量等于所述目标视频图 像的颜色空间矩阵; 求解校正矩阵单元 953, 用于根据所述变换关系求出当所 述误差量最小时的颜色校正矩阵。
具体的, 如图 9D所示, 当所述获取单元 910获取两个校正视频图像时, 所述校正单元 960可以包括: 视频图像接收单元 961, 用于接收输入所述源视 频图像的输入装置传输的待校正的视频图像, 比如待拼接视频图像; 颜色矩 阵生成单元 962, 用于生成所述待校正的视频图像的颜色空间矩阵; 颜色矩阵 变换单元 963,用于将所述颜色校正矩阵与所述待校正的视频图像的颜色空间 矩阵相乘, 将所述相乘的结果作为校正后的视频图像的颜色空间矩阵; 校正 结果生成单元 964,用于根据所述校正后的视频图像的颜色空间矩阵生成所述 校正后的视频图像。
具体的, 如图 9E所示, 当所述获取单元 910获取到 N个两两相邻的校正 视频图像时, 所述 N为大于 2的自然数, 所述 N个视频图像包括 N-1组两两 相邻的校正视频图像, 每组校正视频图像对应一个颜色校正矩阵; 所述校正 单元 660可以包括: 视频图像接收单元 96Γ , 用于接收输入装置传输的待校 正的视频图像, 所述待校正的视频图像为所述 N个视频图像中的第 K个视频 图像; 第一颜色矩阵生成单元 962 ' , 用于生成所述待校正的视频图像的颜色 空间矩阵; 校正矩阵生成单元 963 ' , 用于顺序将第一个颜色校正矩阵至第 K -1个颜色校正矩阵相乘得到所述待校正的视频图像的颜色校正矩阵; 第二颜 色矩阵生成单元 964' , 用于将所述颜色校正矩阵与所述待校正的视频的颜色 空间矩阵相乘, 将所述相乘的结果作为校正后的视频图像的颜色空间矩阵; 校正结果生成单元 965,用于根据所述校正后的视频图像的颜色空间矩阵生成 所述校正后的视频图像。
下面以对五台摄像机传输的五个两两相邻的视频图像为例说明本发明实 施例中颜色校正的过程, 假设这五个视频图像分别用 Fl、 F2、 F3、 F4、 F5表 示, 其中每两个相邻的视频图像为一组, 共分为四组, 即 F1和 F2—组, 表 示为 Zl, F2和 F3—组, 表示为 Z2, F3和 F4—组, 表示为 Z3, F4和 F5— 组, 表示为 Z4。
按照前述方法第三实施例中歩骤 S504至歩骤 S507, 分别求解 Zl、 Z2、 Z 3和 Z4的颜色校正矩阵, 表示为 Mat— 1、 Mat— 2、 Mat— 3、 Mat— 4。
根据将第一个颜色校正矩阵至第 K-1 个颜色校正矩阵相乘能够得到第 K 个摄像机输入的视频图像的颜色校正矩阵可知, 第二台摄像机所拍摄图像对 应的颜色校正矩阵 Mat— 2' = Mat— 1; 第三台摄像机所拍摄图像对应的颜色校 正矩阵 Mat— 3' = Mat— lXMat— 2; 第四台摄像机所拍摄图像对应的颜色校正矩 阵 Mat— 4' = Mat— IX Mat— 2 X Mat— 3; 第 5台摄像机所拍摄图像对应的颜色校 正矩阵 Mat— 5' = Mat— 1 X Mat— 2 X Mat— 3 X Mat— 4。
这样, 对第二台摄像机拍摄图像进行校正时, 只要用颜色校正矩阵 Mat— 2'与所拍摄图像的颜色空间矩阵相乘;对第三台摄像机拍摄图像进行校正时, 用颜色校正矩阵 Mat— 3' 与所拍摄图像的颜色空间矩阵相乘; 对第四台摄像机 拍摄图像进行校正时, 用颜色校正矩阵 Mat— 4' 与所拍摄图像的颜色空间矩阵 相乘; 对第五台摄像机拍摄图像进行校正时, 用颜色校正矩阵 Mat— 5' 与所拍 摄图像的颜色空间矩阵相乘。 实施例七
如图 10所示为本发明实施例七提供的的立体全景视频会议方法流程图, 该方法包括:
S1001: 至少从两个视角同歩获取同一会场的视频流;
S1002: 根据每个视频流的深度信息从对应的视频流中获取多个深度层次 的图像数据;
S1003: 对获取的不同视角的视频流进行基于深度信息的拼接, 生成立体 全景视频流;
S1004: 根据终端显示器的类别, 将所述立体全景视频流的视频图像显示 在终端显示器上。
如图 11所示为本发明实施例七提供的视频拼接流程图, 其包括: 歩骤 s
1101: 对初始帧进行视频拼接从而计算得到完整的拼接图, 拼接初始帧第一 图和第二图; 歩骤 S1102: 对后续的视频序列首先检测当前帧第一图与前一帧 的变化区域; 歩骤 S1103 : 如果变化区域较小; 歩骤 S1105: 则仅对变化的区 域进行视频拼接计算得到局部拼接图; 歩骤 S1106: 并利用该局部区域更新前 一帧或初始帧之间的对应变化区域, 生成当前帧的拼接图; 歩骤 S1104: 若变 化区域过大, 则完整计算当前帧的拼接; 歩骤 S1107、 读取下一帧图像; 歩骤 S1108: 判断是否为最后一帧? 如果是则结束, 如果否则转到歩骤 S1102。 由 于图像序列之间一般具有较强的相关性, 变化的区域仅为场景内容的一部分, 所以采用该算法可以显著降低视频拼接的算法复杂度, 如此, 在进行视频拼 接时可以采用较复杂的算法, 在满足视频拼接实时的同时获得较准确的全景 拼接视频。
上述方案中, 采用的是当前帧参考前一帧计算变化区域, 对于场景相对 固定的会议等其它场景, 也可以采用当前帧相对初始帧检测变化区域。
利用如图 1所示的摄像机即可获得如图 12所示的第一、 第二图像序列, 视频序列即对第一图像序列和第二图像序列中对应的图像对进行拼接, 获得 每一图像对的拼接图, 对拼接图进行立体编码并输出。 对终端显示类别进行 判断, 如果终端显示设备是二维显示器, 则显示合成视频图像的二维图像信 息; 如果终端显示设备是三维立体显示器, 则显示合成视频图像的三维立体 图像信息; 如果终端显示设备是多层显示器, 则显示合成视频图像的多个深 度层次的图像信息。
本发明实施例的有益效果在于, 可以向用户提供全景的、 高分辨率的、 无缝的、 三维立体的会议体验。 解决了在多视点视频拼接时, 出现的由于视 差带来的重影问题, 特别是对于近景视差较为明显的场景。 提供了一种针对 不同显示设备的多显示方式。 可以利用多层显示器, 实现前后景的分别显示, 也能够有较好的立体感受。 同样, 可以利用立体显示器和平面显示器, 实现 更精确、 更好的立体感受。 实施例八
如图 13所示为本发明实施例八提供的立体全景视频会议方法流程图, 该 方法包括:
S 1301 : 至少从两个视角同歩获取同一会场的视频流。
S 1302 : 根据每个视频流的深度信息从对应的视频流中获取多个深度层次 的图像数据。
歩骤 S1301— S1302和实施例七中相类似, 在此就不再进行赘述了。
S 1303 : 获取至少两个校正视频图像, 所述至少两个校正视频图像中两两 相邻的校正视频图像具有重叠区域。
S 1304: 从所述重叠区域选择所述两两相邻的校正视频图像的配对特征 点。
S 1305 : 根据所述配对特征点生成所述两两相邻的校正视频图像的颜色校 正矩阵。
S 1306: 通过所述颜色校正矩阵对所述视频流进行校正。
歩骤 S1303-S1306是对获取的视频流进行颜色校正。
需要指出的是, 该颜色校正矩阵只需要生成一次, 以后如果需要对不同 的待拼接视频图像进行颜色校正, 歩骤 S1303-S1305可以不必重复进行, 而 直接执行歩骤 S1306即可。
因此, 作为本发明的一个实施例, 歩骤 S1303—S1305可以在歩骤 S1301 前面予以执行, 以得到颜色校正矩阵为目的。
S 1307: 对获取的经过校正后的不同视角的视频流进行基于深度信息的拼 接, 生成立体全景视频流。
S 1308: 根据终端显示器的类别, 将所述立体全景视频流的视频图像显示 在终端显示器上。 实施例九
如图 14所示为本发明实施例九提供的的立体全景视频会议设备结构图, 该设备包括: 深度信息获取装置 1401用于至少从两个视角同歩获取同一会场 的视频流; 分层图像获取装置 1402用于根据每个视频流的深度信息从对应的 视频流中获取多个深度层次的图像数据; 立体全景视频流生成装置 1403用于 对获取的不同视角的视频流进行基于深度信息的拼接, 生成立体全景视频流; 视频图像显示装置 1404用于根据终端显示器的类别, 将所述立体全景视频流 的视频图像显示在终端显示器上。
如图 15A所示, 深度摄像机 (1501, 1502 , 1503 , 1504) 与立体全景视 频会议设备 1400相连接, 深度信息获取装置 1401接收从四个视角同歩获取 的同一会场的视频流; 分层图像获取装置 1402根据每个视频流的深度信息从 对应的视频流中获取多个深度层次的图像数据; 立体全景视频流生成装置 14 03对获取的不同视角的视频流进行基于视频图像深度信息的拼接, 获得拼接 立体视频序列, 对所述的拼接立体视频序列进行立体视频编码, 生成传输立 体全景视频流。 视频图像显示装置 1404用于根据终端显示器的类别, 将所述 立体全景视频流的视频图像显示在终端显示器上。
立体全景视频会议设备 1400还包括手势指令存储装置 1505用于存储手 势信息与显示控制指令的映射关系; 显示指令获取装置 1506用于根据获取的 手势信息从所述的映射关系中获取对应的显示控制指令;显示指令获取装置 1 507用于根据获取的手势信息从所述的映射关系中获取对应的显示控制指令; 显示器控制装置 1508用于根据获取的显示控制指令控制所述终端显示器的显 如图 15B所示, 视频图像显示装置 1404包括: 显示器类别确定单元 140 41和显示器 14042, 显示器 14042包括: 二维显示器或三维立体显示器或多 层显示器; 显示器类别确定单元 14041 确定所述的终端显示器是二维、 三维 或多维显示器后, 如果显示器 14042 是二维显示器, 则显示合成视频图像的 二维图像信息; 如果显示器 14042 是三维立体显示器, 则显示合成视频图像 的三维立体图像信息; 如果显示器 14042 是多层显示器, 则显示合成视频图 像的多个深度层次的图像信息。
如图 16所示, 本发明实施例的快速视频拼接方法流程如下:
S 1601 : 预先通过两台或多台摄像机拍摄会议场景背景, 拼接该无人会场 背景图, 预存该全景图和会议背景图;
S 1602 : 输入两个或多个视频流, 拼接初始帧第一图和第二图;
S 1603 : 检测当前帧第一图相对上一帧的变化区域;
S 1604: 变化区域是否过大?
S 1605 : 若变化区域过大, 则进行完整的全景图拼接; 歩骤 S1608: 再读 取下一帧; 歩骤 S1606: 若不过大, 则进行变化区域的前景图像拼接; 歩骤 S 1607: 更新上一帧对应区域的拼接图, 加入背景全景图; 歩骤 S1608: 再读取 下一帧;
歩骤 S1609: 判断是否是最后一帧?若是最后一帧, 则结束; 若不是最后 一帧转至歩骤 S 1603。
对全景拼接图进行立体编码并输出。 对终端显示类别进行判断, 如果终 端显示设备是二维显示器, 则显示合成视频图像的二维图像信息; 如果终端 显示设备是三维立体显示器, 则显示合成视频图像的三维立体图像信息; 如 果终端显示设备是多层显示器, 则显示合成视频图像的多个深度层次的图像 信息。
本发明实施例的有益效果在于, 可以向用户提供全景的、 高分辨率的、 无缝的、 三维立体的会议体验。 解决了在多视点视频拼接时, 出现的由于视 差带来的重影问题, 特别是对于近景视差较为明显的场景。 提供了一种针对 不同显示设备的多显示方式。 可以利用多层显示器, 实现前后景的分别显示, 也能够有较好的立体感受。 同样, 可以利用立体显示器和平面显示器, 实现 更精确、 更好的立体感受。 并且提供了一种更为友好的数据协作方式, 可以 实现不同会场不同人员发出的手势指令, 产生作用显示在同一个显示设备上, 实现不同会场不同人员有同一会场地点同时控制数据、 及会议系统的感受。
本实施例利用了深度摄像机, 使得视讯或网真会议系统的远程终端数据 协作、 会议控制变得更加方便快捷。 由于深度摄像机的存在, 可根据深度摄 像机识别出手、 手指、 手心。 进而识别出手所发出的指令。
手势识别所采取的歩骤如下:
歩骤 1、不同会场参与人员发出手势指令,并由深度摄像机做出指令判定; 歩骤 2、指令示意的驱动作用显示在远程终端设备上。这里所呈现的一种 应用场景就是: 如图 17所示的基于深度摄像机构成的多会场 2D/3D/Mul it-L ayer多视点视频会议系统。
如图 18所示, 把会场 B、 C、 D、 E中的数据同时显示到会场 A的显示数 据的显示器中。
会场 B可以通过手势来控制其数据内容显示方式, 同样 C、 D、 E也可以 通过手势来控制各自的数据内容显示方式。
会场 A中的人通过手势控制会场 C的数据显示内容, 看自己想看的内容。 在这里, 可以合理定义一些远程控制数据显示方式的手势, 来友好的进 行不同会场间的会议数据内容控制与显示。例如: 会场 B控制其数据在会场 A 的显示, 手势可以定义为一些常见的在本地会场应用中的手势模型;
会场 A中的人通过手势控制会场 C的数据显示内容, 看自己想看内容, 则, 手势与显示控制指令的映射关系可以定义为:
如图 19所示, 竖起一个食指表示显示第一个会场的数据, 并将控制焦点 放到第一个会场数据。 如图 20所示, 竖起食指和中指, 表示显示第二个会场 数据, 并将焦点放到第一个会场数据。
依此类推, 竖起中指、 无名指、 小拇指表示显示第三个会场数据, 并将 焦点放到第三个会场数据; 竖起除大拇指外手指, 表示显示第四个会场数据, 并将焦点放到第四个会场数据; 固定大拇指, 其他手指旋转, 表示轮流显示 第五个、 第六个……会场数据, 焦点定位到随旋转停止时定位的会场数据; 手掌伸开, 垂直手臂, 拉回胸前, 表示满屏幕显示焦点会场数据。
这样, 可以通过存储手势信息与显示控制指令的映射关系; 根据深度摄 像机对会场内人的手势进行摄像而并生成手势信息, 从手势信息与显示控制 指令的映射关系中获取对应的显示控制指令; 并根据获取的显示控制指令控 制所述终端显示设备的显示。 如果终端显示设备是 2D显示器, 则显示合成视 频图像的二维图像信息; 如果所述的终端显示设备是 3D立体显示器, 则显示 合成视频图像的三维立体图像信息; 如果所述的终端显示设备是多层显示器, 则显示多个深度层次的图像信息。
本发明实施例能够向用户提供全景的、 高分辨率的、 无缝的、 三维立体 的会议体验。 能够获得比传统网真更高级的、 真实的感受。 解决了在多视点 视频拼接时, 出现由于视差带来的重影问题, 特别是对于近景视差较为明显 的场景。 提供了一种快速、 实时的视频拼接方法。 可以降低视频拼接的复杂 度, 提高视频拼接的效率。 同时还提供了一种针对不同显示设备的多显示方 式。 我们可以利用多层显示器, 实现前后景的分别显示, 也能够有较好的立 体感受。 同样, 可以利用立体显示器, 实现更精确、 更好的立体感受。 也提 供了一种更为友好的数据协作方式。 可以实现不同会场不同人员发出的手势 指令, 产生作用显示在同一个显示设备上, 实现不同会场不同人员有同一会 场地点同时控制数据、 会议系统的感受。 实施例十
如图 21所示为本发明实施例十提供的的立体全景视频会议设备结构图, 该设备包括: 深度信息获取装置 2110、 分层图像获取装置 2120、 视频图像校 正装置 2130、 立体全景视频流生成装置 2140和视频图像显示装置 2150。 其 中深度信息获取装置 2110、 分层图像获取装置 2120和视频图像显示装置 215 0与实施例九相类似, 在此就不再进行赘述了。
视频图像校正装置 2130用于对深度信息获取装置 2110所获取的视频流 进行颜色校正, 在本实施例中, 其和分层图像获取装置 2120相连, 即在完成 了视频流深度层次的图像数据的获取后再对其进行颜色校正, 当然本实施例 也不排除先对获取的视频流进行颜色校正, 再获取其深度层次的图像数据。
在本实施例中, 视频图像校正装置 2130包括: 获取单元 2131、 选择单元 2132、 生成单元 2133和校正单元 2134, 它们之间依次相连:
获取单元 2131用于获取至少两个校正视频图像, 所述至少两个校正视频 图像中两两相邻的校正视频图像具有重叠区域;
选择单元 2132用于从所述重叠区域选择所述两两相邻的校正视频图像的 配对特征点;
生成单元 2133用于根据所述配对特征点生成所述两两相邻的校正视频图 像的颜色校正矩阵;
校正单元 2134用于通过所述颜色校正矩阵对所述视频流进行校正。
需要指出的是, 本实施例为了对视频流进行颜色校正, 需要依靠生成单 元 2133所生成的颜色校正矩阵, 在本实施例中, 该颜色校正矩阵可以是在分 层图像获取装置 2120 完成了对视频流深度层次的图像数据的获取后再生成 的, 但是, 本实施例也不排除在获取视频流前就已经生成好该颜色校正矩阵, 而在本实施例中仅需利用该预先生成好的颜色校正矩阵对视频流进行校正即 可。
在本实施例中, 立体全景视频流生成装置 2140用于对获取的经过校正后 的不同视角的视频流进行基于深度信息的拼接, 生成立体全景视频流。
本实施例不但具有实施例九所述的优点, 而且本发明实施例对视频图像 进行颜色校正, 因此可以获得亮色度较好的全景视频会议体验。 另外由于本 实施例仅在计算颜色校正矩阵时需要两两图像之间具有重叠区域, 在校正过 程中无论视频图像之间是否有重叠区域, 均可以通过该颜色校正矩阵进行颜 色校正, 并且颜色校正矩阵只需要生成一次, 节省了对视频图像进行颜色校 正的时间。
本领域的技术人员可以清楚地了解到本发明可借助软件加必需的通用硬 件平台的方式来实现。 基于这样的理解, 本发明的技术方案本质上或者说对 现有技术做出贡献的部分可以以软件产品的形式体现出来, 该计算机软件产 品可以存储在存储介质中, 如 R0M/RAM、 磁碟、 光盘等, 包括若干指令用以使 得一台计算机设备 (可以是个人计算机, 服务器, 或者网络设备等) 执行本 发明各个实施例或者实施例的某些部分所述的方法。
虽然通过实施例描绘了本发明, 本领域普通技术人员知道, 本发明有许 多变形和变化而不脱离本发明的精神, 希望所附的权利要求包括这些变形和 变化而不脱离本发明的精神。

Claims

权 利 要 求 书
1、 一种立体全景视频流生成方法, 其特征在于, 所述方法包括: 获取至少两个待拼接视频图像的深度信息;
根据每个待拼接视频图像的深度信息从对应的待拼接视频图像中获取多 个深度层次的图像数据;
根据获取的多个深度层次的图像数据进行视频图像数据间的拼接, 生成 立体全景视频流。
2、 根据权利要求 1所述的方法, 其特征在于, 所述的获取至少两个待拼 接视频图像的深度信息是指: 由至少两个深度摄像机获取视频流, 并从每个 视频流中同歩获取每帧待拼接视频图像的深度信息。
3、 根据权利要求 1所述的方法, 其特征在于, 所述的根据获取的多个深 度层次的图像数据进行视频图像数据间的拼接包括: 对相同深度层次的图像 数据进行视频图像数据间的拼接; 或
对不同深度层次的图像数据进行视频图像数据间的拼接。
4、 根据权利要求 3所述的方法, 其特征在于, 所述的对相同深度层次的 图像数据进行拼接包括: 检测每个视频流当前帧中每个深度层次的图像数据 相对上一帧对应深度层次图像数据的图像变化区域, 确定所述的变化区域大 于设定的阈值后, 则仅对所述变化区域的图像数据进行拼接。
5、 根据权利要求 1所述的方法, 其特征在于, 所述根据获取的多个深度 层次的图像数据进行视频图像数据间的拼接之后还包括对所述视频图像进行 校正, 所述对视频图像进行校正包括:
获取至少两个校正视频图像, 所述至少两个校正视频图像中两两相邻的 校正视频图像具有重叠区域;
从所述重叠区域选择所述两两相邻的校正所述视频图像的配对特征点; 根据所述配对特征点生成所述两两相邻的校正视频图像的颜色校正矩 阵;
通过所述颜色校正矩阵对所述待拼接视频图像进行校正。
6、 根据权利要求 5所述的方法, 其特征在于, 所述获取至少两个校正视 频图像后, 还包括: 对所述至少两个校正视频图像进行预处理;
所述预处理包括: 平滑降噪处理、 和 /或畸变校正处理。
7、 根据权利要求 5所述的方法, 其特征在于, 所述获取至少两个校正视 频图像后, 还包括: 对所述至少两个校正视频图像进行颜色空间转换;
所述颜色空间转换前后的所述视频图像的格式包括: RGB、 或 HSV、 或 YU V、 或 HSL、 或 CIE-Lab、 或 CIE_Luv、 或 CMY、 或 CMYK、 或 XYZ。
8、 根据权利要求 5所述的方法, 其特征在于, 所述从重叠区域选择所述 两两相邻的校正视频图像的配对特征点包括:
对所述重叠区域进行 SIFT特征点检测,对所述检测到的特征点进行匹配, 得到相邻两个校正视频图像的多组配对特征点; 或
对所述重叠区域进行 SIFT特征点检测,对所述检测到的特征点进行匹配, 得到相邻两个校正视频图像的多组配对特征点, 以所述配对特征点为中心划 分范围相同的区域, 将所述划分的区域的颜色特征的平均值赋值给所述配对 特征点; 或
对所述重叠区域进行分割, 所述分割后的两个校正视频图像的对应区域 作为配对特征点, 将所述对应的区域的颜色特征的平均值赋值给所述配对特 征点; 或
接收通过手动从所述重叠区域中选取的区域块, 所述选取的两个校正视 频图像的对应区域块作为配对特征点, 将所述对应的区域块的颜色特征的平 均值赋值给所述配对特征点。
9、 根据权利要求 5所述的方法, 其特征在于, 所述两两相邻的校正视频 图像中一个作为源视频图像, 另一个作为目标视频图像,
所述根据配对特征点生成所述两两相邻的校正视频图像的颜色校正矩阵 包括:
分别建立所述源视频图像和目标视频图像的颜色空间矩阵, 所述颜色空 间矩阵的每一行表示所述配对特征点中一个特征点的颜色空间属性;
建立所述源视频图像的颜色空间矩阵和目标视频图像的颜色空间矩阵变 换关系, 所述变换关系为: 所述源视频图像的颜色空间矩阵与所述颜色校正 矩阵的乘积加上误差量等于所述目标视频图像的颜色空间矩阵;
根据所述变换关系求出当所述误差量最小时的颜色校正矩阵。
10、 根据权利要求 5所述的方法, 其特征在于, 当获取两个校正视频图 像时,
所述通过颜色校正矩阵对所述待拼接视频图像进行校正包括:
接收输入所述源视频图像的输入装置传输的待拼接视频图像;
生成所述待拼接频图像的颜色空间矩阵;
将所述颜色校正矩阵与所述待拼接视频图像的颜色空间矩阵相乘, 将所 述相乘的结果作为校正后的待拼接视频图像的颜色空间矩阵;
根据所述校正后的待拼接视频图像的颜色空间矩阵生成所述校正后的待 拼接视频图像。
11、 根据权利要求 5所述的方法, 其特征在于, 当获取到 N个两两相邻 的校正视频图像时, 所述 N为大于 2的自然数, 所述 N个校正视频图像包括 N -1组两两相邻的校正视频图像, 每组校正视频图像对应一个颜色校正矩阵, 所述通过颜色校正矩阵对所述待拼接视频图像进行校正包括:
接收输入装置传输的待拼接视频图像, 所述待拼接视频图像为所述 N个 视频图像中的第 K个视频图像 ·'
生成所述待拼接视频图像的颜色空间矩阵;
顺序将第一个颜色校正矩阵至第 K-1 个颜色校正矩阵相乘得到所述待拼 接视频图像的颜色校正矩阵;
将所述颜色校正矩阵与所述待拼接视频图像的颜色空间矩阵相乘, 将所 述相乘的结果作为校正后的待拼接视频图像的颜色空间矩阵;
根据所述校正后的待拼接视频图像的颜色空间矩阵生成所述校正后的待 拼接视频图像。
12、 一种立体全景视频会议方法, 其特征在于, 所述的方法包括: 至少从两个视角同歩获取同一会场的视频流;
根据每个视频流的深度信息从对应的视频流中获取多个深度层次的图像 数据;
对获取的不同视角的视频流进行基于深度信息的拼接, 生成立体全景视 频流;
根据终端显示器的类别, 将所述立体全景视频流的视频图像显示在终端 显 器上 °
13、 根据权利要求 12所述的方法, 其特征在于, 所述根据每个视频流的 深度信息从对应的视频流中获取多个深度层次的图像数据之后还包括:
获取至少两个校正视频图像, 所述至少两个校正视频图像中两两相邻的 校正视频图像具有重叠区域;
从所述重叠区域选择所述两两相邻的校正视频图像的配对特征点; 根据所述配对特征点生成所述两两相邻的校正视频图像的颜色校正矩 阵;
通过所述颜色校正矩阵对所述视频流进行校正;
所述对获取的不同视角的视频流进行基于深度信息的拼接包括: 对获取的经过校正后的不同视角的视频流进行基于深度信息的拼接。
14、 根据权利要求 12所述的方法, 其特征在于, 所述的根据终端显示设 备类别将所述传输立体视频流的视频图像显示在终端显示设备上包括: 如果 确定所述的终端显示设备是二维显示器, 则显示所述视频图像的二维图像信 息;
如果确定所述的终端显示设备是三维立体显示器, 则显示所述视频图像 的三维立体图像信息;
如果确定所述的终端显示设备是多层显示器, 则显示所述视频图像的多 个深度层次的图像信息。
15、 根据权利要求 12所述的方法, 其特征在于, 所述的方法还包括: 建立手势信息与显示控制指令的映射关系;
获取所述会场内的人的手势视频图像, 获取手势信息;
根据获取的手势信息从所述的映射关系中获取对应的显示控制指令; 根据获取的显示控制指令控制所述终端显示设备的显示。
16、 一种立体全景视频流生成设备, 其特征在于, 所述的设备包括: 深度信息获取装置, 用于获取至少两个待拼接视频图像的深度信息; 分层图像获取装置, 用于根据每个待拼接视频图像的深度信息从对应的 待拼接视频图像中获取多个深度层次的图像数据;
立体全景视频流生成装置, 用于根据获取的多个深度层次的图像数据进 行视频图像数据间的拼接, 生成立体全景视频流。
17、 根据权利要求 16所述的设备, 其特征在于, 所述的深度信息获取装 置包括: 至少两个深度摄像机; 所述的深度摄像机从摄取的视频流中同歩获 取每帧待拼接视频图像的深度信息。
18、 根据权利要求 16所述的设备, 其特征在于, 所述的立体全景视频流 生成装置包括: 图像拼接单元, 用于对相同深度层次的图像数据进行视频图 像数据间的拼接或对不同深度层次的图像数据进行视频图像数据间的拼接。
19、 根据权利要求 18所述的设备, 其特征在于, 所述的立体全景视频流 生成装置包括: 图像检测单元, 用于检测每个视频流当前帧中每个深度层次 的图像数据相对上一帧对应深度层次图像数据的图像变化区域, 确定所述的 变化区域大于设定的阈值后输出图像拼接指令;
所述的图像拼接单元根据所述的图像拼接指令对所述变化区域的图像数 据进行图像数据间的拼接。
20、 根据权利要求 16所述的设备, 其特征在于, 所述设备还包括一视频 图像校正装置, 所述视频图像校正装置包括:
获取单元, 用于获取至少两个校正视频图像, 所述至少两个校正视频图 像中两两相邻的校正视频图像具有重叠区域;
选择单元, 用于从所述重叠区域选择所述两两相邻的校正视频图像的配 对特征点;
生成单元, 用于根据所述配对特征点生成所述两两相邻的校正视频图像 的颜色校正矩阵;
校正单元, 用于通过所述颜色校正矩阵对所述待拼接视频图像进行校正; 所述立体全景视频流生成装置, 用于根据获取的多个深度层次的图像数 据对校正后的待拼接视频图像进行拼接, 生成立体全景视频流。
21、 根据权利要求 20所述的设备, 其特征在于, 所述视频图像校正装置 还包括:
预处理单元, 用于当所述获取单元获取到至少两个校正视频图像后, 对 所述至少两个校正视频图像进行预处理, 所述预处理包括平滑降噪处理, 和 / 或畸变校正处理。
22、 根据权利要求 20所述的设备, 其特征在于, 所述视频图像校正装置 还包括:
转换单元, 用于对所述至少两个校正视频图像进行颜色空间转换, 所述 转换前后的视频图像的格式包括 RGB、 或 HSV、 或 YUV、 或 HSL、 或 CIE_Lab、 或 CIE-Luv、 或 CMY、 或 CMYK、 或 XYZ。
23、 根据权利要求 20所述的设备, 其特征在于, 所述选择单元至少包括 一个下述单元:
第一选择单元, 用于对所述重叠区域进行 SIFT特征点检测, 对所述检测 到的特征点进行匹配, 得到相邻两个校正视频图像的多组配对特征点;
第二选择单元, 用于对所述重叠区域进行 SIFT特征点检测, 对所述检测 到的特征点进行匹配, 得到相邻两个校正视频图像的多组配对特征点, 以所 述配对特征点为中心划分范围相同的区域, 将所述划分的区域的颜色特征的 平均值赋值给所述配对特征点;
第三选择单元, 用于对所述重叠区域进行分割, 所述分割后的两个校正 视频图像的对应区域作为配对特征点, 将所述对应的区域的颜色特征的平均 值赋值给所述配对特征点;
第四选择单元, 用于接收通过手动从所述重叠区域中选取的区域块, 所 述选取的两个校正视频图像的对应区域块作为配对特征点, 将所述对应的区 域块的颜色特征的平均值赋值给所述配对特征点。
24、 根据权利要求 20所述的设备, 其特征在于, 所述两两相邻的校正视 频图像中一个作为源视频图像, 另一个作为目标视频图像,
所述生成单元包括:
颜色矩阵建立单元, 用于分别建立所述源视频图像和目标视频图像的颜 色空间矩阵, 所述颜色空间矩阵的每一行表示所述配对特征点中一个特征点 的颜色空间属性;
变换关系建立单元, 用于建立所述源视频图像的颜色空间矩阵和目标视 频图像的颜色空间矩阵变换关系, 所述变换关系为: 所述源视频图像的颜色 空间矩阵与所述颜色校正矩阵的乘积加上误差量等于所述目标视频图像的颜 色空间矩阵;
校正矩阵求解单元, 用于根据所述变换关系求出当所述误差量最小时的 颜色校正矩阵。
25、 根据权利要求 20所述的设备, 其特征在于, 当所述获取单元获取两 个校正视频图像时,
所述校正单元包括:
视频图像接收单元, 用于接收输入所述源视频图像的输入装置传输的待 拼接视频图像; 颜色矩阵生成单元, 用于生成所述待拼接视频图像的颜色空间矩阵; 颜色矩阵变换单元, 用于将所述颜色校正矩阵与所述待拼接视频图像的 颜色空间矩阵相乘, 将所述相乘的结果作为校正后的待拼接视频图像的颜色 空间矩阵;
校正结果生成单元, 用于根据所述校正后的待拼接视频图像的颜色空间 矩阵生成所述校正后的待拼接视频图像。
26、 根据权利要求 20所述的设备, 其特征在于, 当所述获取单元获取到 N个两两相邻的校正视频图像时, 所述 N为大于 2的自然数, 所述 N个视频图 像包括 N-1 组两两相邻的校正视频图像, 每组校正视频图像对应一个颜色校 正矩阵;
所述校正单元包括:
视频图像接收单元, 用于接收输入装置传输的待拼接视频图像, 所述待 拼接视频图像为所述 N个视频图像中的第 K个视频图像;
第一颜色矩阵生成单元, 用于生成所述待拼接视频图像的颜色空间矩阵; 校正矩阵生成单元, 用于顺序将第一个颜色校正矩阵至第 K-1 个颜色校 正矩阵相乘得到所述待拼接视频图像的颜色校正矩阵;
第二颜色矩阵生成单元, 用于将所述颜色校正矩阵与所述待拼接视频的 颜色空间矩阵相乘, 将所述相乘的结果作为校正后的待拼接视频图像的颜色 空间矩阵;
校正结果生成单元, 用于根据所述校正后的待拼接视频图像的颜色空间 矩阵生成所述校正后的待拼接视频图像。
27、 一种立体全景视频会议设备, 其特征在于, 所述的设备包括: 深度信息获取装置, 至少从两个视角同歩获取同一会场的视频流; 分层图像获取装置, 用于根据每个视频流的深度信息从对应的视频流中 获取多个深度层次的图像数据;
立体全景视频流生成装置, 对获取的不同视角的视频流进行基于深度信 息的拼接, 生成立体全景视频流;
视频图像显示装置, 用于根据终端显示器的类别, 将所述立体全景视频 流的视频图像显示在终端显示器上。
28、 根据权利要求 27所述的设备, 其特征在于, 所述的设备还包括一视 频图像校正装置, 所述视频图像校正装置包括:
获取单元, 用于获取至少两个校正视频图像, 所述至少两个校正视频图 像中两两相邻的校正视频图像具有重叠区域;
选择单元, 用于从所述重叠区域选择所述两两相邻的校正视频图像的配 对特征点;
生成单元, 用于根据所述配对特征点生成所述两两相邻的校正视频图像 的颜色校正矩阵;
校正单元, 用于通过所述颜色校正矩阵对所述视频流进行校正; 所述立体全景视频流生成装置, 用于对获取的经过校正后的不同视角的 视频流进行基于深度信息的拼接, 生成立体全景视频流。
29、 如权利要求 27或 28所述设备, 其特征在于, 所述设备还包括显示 器类别确定单元及显示器, 所述显示器类别确定单元用于确定所述显示器的 类别, 并根据确认结果输出显示指令, 所述显示器根据所述显示指令显示所 述立体全景视频流。
30、 如权利要求 27或 28所述设备, 其特征在于, 所述设备还包括: 手势指令存储装置, 用于存储手势信息与显示控制指令的映射关系; 手势信息获取装置, 用于获取所述会场内的人的手势视频图像和手势信 息;
显示指令获取装置, 用于根据获取的手势信息从所述的映射关系中获取 对应的显示控制指令;
显示器控制装置, 用于根据获取的显示控制指令控制所述终端显示器的
PCT/CN2009/075383 2008-12-30 2009-12-08 立体全景视频流生成方法、设备及视频会议方法和设备 Ceased WO2010075726A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP09836013A EP2385705A4 (en) 2008-12-30 2009-12-08 METHOD AND DEVICE FOR GENERATING STEREOSCOPIC PANORAMIC VIDEO FLOW AND METHOD AND DEVICE FOR VISIOCONFERENCE
US13/172,193 US8717405B2 (en) 2008-12-30 2011-06-29 Method and device for generating 3D panoramic video streams, and videoconference method and device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN200810247531.5 2008-12-30
CN200810247531A CN101771830B (zh) 2008-12-30 2008-12-30 立体全景视频流生成方法、设备及视频会议方法和设备
CN200910118629.5 2009-02-26
CN2009101186295A CN101820550B (zh) 2009-02-26 2009-02-26 多视点视频图像校正方法、装置及系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/172,193 Continuation US8717405B2 (en) 2008-12-30 2011-06-29 Method and device for generating 3D panoramic video streams, and videoconference method and device

Publications (1)

Publication Number Publication Date
WO2010075726A1 true WO2010075726A1 (zh) 2010-07-08

Family

ID=42309794

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/075383 Ceased WO2010075726A1 (zh) 2008-12-30 2009-12-08 立体全景视频流生成方法、设备及视频会议方法和设备

Country Status (3)

Country Link
US (1) US8717405B2 (zh)
EP (1) EP2385705A4 (zh)
WO (1) WO2010075726A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075773A (zh) * 2010-11-25 2011-05-25 深圳市创凯电子有限公司 立体与平面图像混合信号在超大屏幕上显像的同步方法
WO2012164148A1 (en) * 2011-05-31 2012-12-06 Nokia Corporation Methods, apparatuses and computer program products for generating panoramic images using depth map data
WO2013061334A1 (en) * 2011-10-25 2013-05-02 Mohan Devaraj 3d stereoscopic imaging device with auto parallax
CN111526323A (zh) * 2020-03-24 2020-08-11 视联动力信息技术股份有限公司 一种全景视频的处理方法和装置
CN115965526A (zh) * 2022-12-07 2023-04-14 瑞芯微电子股份有限公司 图像拼接方法、片上系统,存储介质和电子设备
CN117425000A (zh) * 2023-10-31 2024-01-19 清研灵智信息咨询(北京)有限公司 基于全景摄像的沉浸式视频巡检监测系统

Families Citing this family (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984463A (zh) * 2010-11-02 2011-03-09 中兴通讯股份有限公司 全景图合成方法及装置
CN102812497B (zh) * 2011-03-03 2016-06-08 松下知识产权经营株式会社 能够提供随后体验影像的影像提供装置、影像提供方法
US9491445B2 (en) * 2011-05-05 2016-11-08 Empire Technology Development Llc Lenticular directional display
TWI449408B (zh) * 2011-08-31 2014-08-11 Altek Corp 三維影像擷取方法與裝置及三維影像顯示裝置
KR101917764B1 (ko) * 2011-09-08 2018-11-14 삼성디스플레이 주식회사 입체 영상 표시 장치 및 입체 영상 표시 방법
JP2013118468A (ja) * 2011-12-02 2013-06-13 Sony Corp 画像処理装置および画像処理方法
EP2810247B1 (en) * 2012-01-31 2018-04-25 Sony Mobile Communications Inc. Method and electronic device for creating a combined image
US11202003B1 (en) * 2012-05-25 2021-12-14 Altia Systems Inc. Switchable cloud-optimized real-time stitching multiple imager method and system
US20130329985A1 (en) * 2012-06-07 2013-12-12 Microsoft Corporation Generating a three-dimensional image
US9870504B1 (en) * 2012-07-12 2018-01-16 The United States Of America, As Represented By The Secretary Of The Army Stitched image
EP4221187A3 (en) 2012-09-10 2023-08-09 Aemass, Inc. Multi-dimensional data capture of an environment using plural devices
US9058683B2 (en) * 2013-02-21 2015-06-16 Qualcomm Incorporated Automatic image rectification for visual search
US9600703B2 (en) * 2013-03-15 2017-03-21 Cognex Corporation Systems and methods for sorting image acquisition settings for pattern stitching and decoding using multiple captured images
JP6266761B2 (ja) 2013-05-10 2018-01-24 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. マルチビューレンダリング装置とともに使用するためのビデオデータ信号の符号化方法
EP2806401A1 (en) * 2013-05-23 2014-11-26 Thomson Licensing Method and device for processing a picture
US20150002636A1 (en) * 2013-06-28 2015-01-01 Cable Television Laboratories, Inc. Capturing Full Motion Live Events Using Spatially Distributed Depth Sensing Cameras
EP3686754A1 (en) * 2013-07-30 2020-07-29 Kodak Alaris Inc. System and method for creating navigable views of ordered images
WO2015026874A1 (en) 2013-08-19 2015-02-26 Nant Holdings Ip, Llc Metric based recognition, systems and methods
US9686479B2 (en) 2013-09-16 2017-06-20 Duke University Method for combining multiple image fields
TWI537767B (zh) * 2013-10-04 2016-06-11 財團法人工業技術研究院 可調體感範圍之多人指引系統與其方法
KR102265109B1 (ko) 2014-01-24 2021-06-15 삼성전자주식회사 영상 처리 방법 및 장치
US9742995B2 (en) 2014-03-21 2017-08-22 Microsoft Technology Licensing, Llc Receiver-controlled panoramic view video share
US10187569B2 (en) 2014-07-28 2019-01-22 Mediatek Inc. Portable device capable of generating panoramic file
KR102332752B1 (ko) * 2014-11-24 2021-11-30 삼성전자주식회사 지도 서비스를 제공하는 전자 장치 및 방법
CN105812649B (zh) * 2014-12-31 2019-03-29 联想(北京)有限公司 一种摄像方法和装置
WO2016145625A1 (en) * 2015-03-18 2016-09-22 Xiaoou Tang 3d hand pose recovery from binocular imaging system
US11095869B2 (en) * 2015-09-22 2021-08-17 Fyusion, Inc. System and method for generating combined embedded multi-view interactive digital media representations
US10147211B2 (en) 2015-07-15 2018-12-04 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
US10222932B2 (en) 2015-07-15 2019-03-05 Fyusion, Inc. Virtual reality environment based manipulation of multilayered multi-view interactive digital media representations
US12261990B2 (en) 2015-07-15 2025-03-25 Fyusion, Inc. System and method for generating combined embedded multi-view interactive digital media representations
US10242474B2 (en) 2015-07-15 2019-03-26 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
US11006095B2 (en) 2015-07-15 2021-05-11 Fyusion, Inc. Drone based capture of a multi-view interactive digital media
US12495134B2 (en) 2015-07-15 2025-12-09 Fyusion, Inc. Drone based capture of multi-view interactive digital media
US10609438B2 (en) * 2015-08-13 2020-03-31 International Business Machines Corporation Immersive cognitive reality system with real time surrounding media
WO2017030985A1 (en) 2015-08-14 2017-02-23 Pcms Holdings, Inc. System and method for augmented reality multi-view telepresence
KR20170025058A (ko) 2015-08-27 2017-03-08 삼성전자주식회사 영상 처리 장치 및 이를 포함하는 전자 시스템
US11783864B2 (en) 2015-09-22 2023-10-10 Fyusion, Inc. Integration of audio into a multi-view interactive digital media representation
CN106683071B (zh) 2015-11-06 2020-10-30 杭州海康威视数字技术股份有限公司 图像的拼接方法和装置
KR101773929B1 (ko) * 2016-02-29 2017-09-01 (주)에프엑스기어 광 시야각 영상 처리 시스템, 광 시야각 영상의 전송 및 재생 방법, 및 이를 위한 컴퓨터 프로그램
CN108702498A (zh) 2016-03-10 2018-10-23 索尼公司 信息处理器和信息处理方法
EP3223524A1 (en) * 2016-03-22 2017-09-27 Thomson Licensing Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
US10762712B2 (en) 2016-04-01 2020-09-01 Pcms Holdings, Inc. Apparatus and method for supporting interactive augmented reality functionalities
EP3249929A1 (en) 2016-05-25 2017-11-29 Thomson Licensing Method and network equipment for establishing a manifest
US11202017B2 (en) 2016-10-06 2021-12-14 Fyusion, Inc. Live style transfer on a mobile device
KR102561860B1 (ko) * 2016-10-25 2023-08-02 삼성전자주식회사 전자장치 및 그 제어방법
CN106534825B (zh) * 2016-11-29 2018-09-07 宁波易维视显示技术有限公司 基于中线边缘特征投影的自动检测全景视频、图片的方法
WO2018106211A1 (en) * 2016-12-05 2018-06-14 Hewlett-Packard Development Company, L.P. Audiovisual transmissions adjustments via omnidirectional cameras
US10038894B1 (en) * 2017-01-17 2018-07-31 Facebook, Inc. Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality
US10437879B2 (en) 2017-01-18 2019-10-08 Fyusion, Inc. Visual search using multi-view interactive digital media representations
US20180227482A1 (en) 2017-02-07 2018-08-09 Fyusion, Inc. Scene-aware selection of filters and effects for visual digital media content
WO2018147329A1 (ja) * 2017-02-10 2018-08-16 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 自由視点映像生成方法及び自由視点映像生成システム
US11184599B2 (en) * 2017-03-15 2021-11-23 Pcms Holdings, Inc. Enabling motion parallax with multilayer 360-degree video
US11218683B2 (en) 2017-03-22 2022-01-04 Nokia Technologies Oy Method and an apparatus and a computer program product for adaptive streaming
US10313651B2 (en) 2017-05-22 2019-06-04 Fyusion, Inc. Snapshots at predefined intervals or angles
US11069147B2 (en) 2017-06-26 2021-07-20 Fyusion, Inc. Modification of multi-view interactive digital media representation
CN107360354B (zh) * 2017-07-31 2020-06-26 Oppo广东移动通信有限公司 拍照方法、装置、移动终端和计算机可读存储介质
KR102150847B1 (ko) * 2017-08-22 2020-09-02 미쓰비시덴키 가부시키가이샤 화상 처리 장치 및 화상 처리 방법
CN108317954B (zh) * 2017-10-27 2020-06-12 广东康云多维视觉智能科技有限公司 一种激光引导扫描系统和方法
CN107743222B (zh) * 2017-11-22 2023-12-01 中国安全生产科学研究院 一种基于采集器的图像数据处理方法及三维全景vr采集器
KR102431488B1 (ko) * 2018-03-05 2022-08-12 삼성전자주식회사 전자 장치 및 이미지 처리 방법
TWI667529B (zh) * 2018-04-24 2019-08-01 財團法人工業技術研究院 環景點雲資料的建立方法與建立系統
US10694103B2 (en) * 2018-04-24 2020-06-23 Industrial Technology Research Institute Building system and building method for panorama point cloud
US10592747B2 (en) 2018-04-26 2020-03-17 Fyusion, Inc. Method and apparatus for 3-D auto tagging
CN110795052B (zh) * 2018-08-01 2023-06-02 北京大麦文化传媒发展有限公司 显示控制方法、显示控制装置、显示系统和电子设备
US11323754B2 (en) 2018-11-20 2022-05-03 At&T Intellectual Property I, L.P. Methods, devices, and systems for updating streaming panoramic video content due to a change in user viewpoint
WO2020181065A1 (en) * 2019-03-07 2020-09-10 Alibaba Group Holding Limited Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
CN110930310B (zh) * 2019-12-09 2023-04-07 中国科学技术大学 全景图像拼接方法
CN114155254B (zh) * 2021-12-09 2022-11-08 成都智元汇信息技术股份有限公司 基于图像校正的切图方法、电子设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040100565A1 (en) * 2002-11-22 2004-05-27 Eastman Kodak Company Method and system for generating images used in extended range panorama composition
CN1715987A (zh) * 2005-06-16 2006-01-04 武汉理工大学 显微镜下全景深大幅图片的拼接方法
CN101015220A (zh) * 2004-09-10 2007-08-08 江良一成 三维图像再现设备
CN101277454A (zh) * 2008-04-28 2008-10-01 清华大学 一种基于双目摄像机的实时立体视频生成方法
CN101577795A (zh) * 2009-06-17 2009-11-11 深圳华为通信技术有限公司 一种实现全景图像的实时预览的方法和装置

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5649032A (en) 1994-11-14 1997-07-15 David Sarnoff Research Center, Inc. System for automatically aligning images to form a mosaic image
US5850352A (en) 1995-03-31 1998-12-15 The Regents Of The University Of California Immersive video, including video hypermosaicing to generate from multiple video views of a scene a three-dimensional video mosaic from which diverse virtual video scene images are synthesized, including panoramic, scene interactive and stereoscopic images
US6249613B1 (en) 1997-03-31 2001-06-19 Sharp Laboratories Of America, Inc. Mosaic generation and sprite-based coding with automatic foreground and background separation
US5986668A (en) 1997-08-01 1999-11-16 Microsoft Corporation Deghosting method and apparatus for construction of image mosaics
JPH11113028A (ja) 1997-09-30 1999-04-23 Toshiba Corp 3次元映像表示装置
US6208373B1 (en) * 1999-08-02 2001-03-27 Timothy Lo Fong Method and apparatus for enabling a videoconferencing participant to appear focused on camera to corresponding users
US7015954B1 (en) 1999-08-09 2006-03-21 Fuji Xerox Co., Ltd. Automatic video system using multiple cameras
US6724417B1 (en) * 2000-11-29 2004-04-20 Applied Minds, Inc. Method and apparatus maintaining eye contact in video delivery systems using view morphing
KR100591616B1 (ko) 2001-05-16 2006-06-20 에스케이씨 주식회사 임피던스 특성이 개선된 고분자 전해질, 이의 제조방법 및이를 채용한 리튬 전지
US20030107646A1 (en) * 2001-08-17 2003-06-12 Byoungyi Yoon Method and system for adjusting display angles of a stereoscopic image based on a camera location
US7006709B2 (en) * 2002-06-15 2006-02-28 Microsoft Corporation System and method deghosting mosaics using multiperspective plane sweep
US7298392B2 (en) * 2003-06-26 2007-11-20 Microsoft Corp. Omni-directional camera design for video conferencing
JP3962676B2 (ja) 2002-11-29 2007-08-22 キヤノン株式会社 画像処理方法及び装置
US20050185047A1 (en) 2004-02-19 2005-08-25 Hii Desmond Toh O. Method and apparatus for providing a combined image
KR100653200B1 (ko) 2006-01-09 2006-12-05 삼성전자주식회사 기하 정보를 교정하여 파노라마 영상을 제공하는 방법 및장치
WO2008111080A1 (en) * 2007-03-15 2008-09-18 Yissum Research Development Company Of The Hebrew University Of Jerusalem Method and system for forming a panoramic image of a scene having minimal aspect distortion
CN101051386B (zh) 2007-05-23 2010-12-08 北京航空航天大学 多幅深度图像的精确配准方法
CN101771830B (zh) 2008-12-30 2012-09-19 华为终端有限公司 立体全景视频流生成方法、设备及视频会议方法和设备
TWI483612B (zh) * 2011-12-22 2015-05-01 Nat Univ Chung Cheng Converting the video plane is a perspective view of the video system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040100565A1 (en) * 2002-11-22 2004-05-27 Eastman Kodak Company Method and system for generating images used in extended range panorama composition
CN101015220A (zh) * 2004-09-10 2007-08-08 江良一成 三维图像再现设备
CN1715987A (zh) * 2005-06-16 2006-01-04 武汉理工大学 显微镜下全景深大幅图片的拼接方法
CN101277454A (zh) * 2008-04-28 2008-10-01 清华大学 一种基于双目摄像机的实时立体视频生成方法
CN101577795A (zh) * 2009-06-17 2009-11-11 深圳华为通信技术有限公司 一种实现全景图像的实时预览的方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2385705A4 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075773A (zh) * 2010-11-25 2011-05-25 深圳市创凯电子有限公司 立体与平面图像混合信号在超大屏幕上显像的同步方法
WO2012164148A1 (en) * 2011-05-31 2012-12-06 Nokia Corporation Methods, apparatuses and computer program products for generating panoramic images using depth map data
US10102827B2 (en) 2011-05-31 2018-10-16 Nokia Technologies Oy Methods, apparatuses and computer program products for generating panoramic images using depth map data
WO2013061334A1 (en) * 2011-10-25 2013-05-02 Mohan Devaraj 3d stereoscopic imaging device with auto parallax
CN111526323A (zh) * 2020-03-24 2020-08-11 视联动力信息技术股份有限公司 一种全景视频的处理方法和装置
CN111526323B (zh) * 2020-03-24 2023-05-23 视联动力信息技术股份有限公司 一种全景视频的处理方法和装置
CN115965526A (zh) * 2022-12-07 2023-04-14 瑞芯微电子股份有限公司 图像拼接方法、片上系统,存储介质和电子设备
CN117425000A (zh) * 2023-10-31 2024-01-19 清研灵智信息咨询(北京)有限公司 基于全景摄像的沉浸式视频巡检监测系统
CN117425000B (zh) * 2023-10-31 2024-04-26 清研灵智信息咨询(北京)有限公司 基于全景摄像的沉浸式视频巡检监测系统

Also Published As

Publication number Publication date
US8717405B2 (en) 2014-05-06
EP2385705A4 (en) 2011-12-21
US20110316963A1 (en) 2011-12-29
EP2385705A1 (en) 2011-11-09

Similar Documents

Publication Publication Date Title
WO2010075726A1 (zh) 立体全景视频流生成方法、设备及视频会议方法和设备
CN101453662B (zh) 立体视频通信终端、系统及方法
US6724417B1 (en) Method and apparatus maintaining eye contact in video delivery systems using view morphing
US20090033737A1 (en) Method and System for Video Conferencing in a Virtual Environment
CN104756489B (zh) 一种虚拟视点合成方法及系统
US20080253685A1 (en) Image and video stitching and viewing method and system
CN110798673A (zh) 基于深度卷积神经网络的自由视点视频生成及交互方法
CN113963094B (zh) 深度图及视频处理、重建方法、装置、设备及存储介质
US10791313B2 (en) Method and apparatus for providing 6DoF omni-directional stereoscopic image based on layer projection
TW200824427A (en) Arrangement and method for the recording and display of images of a scene and/or an object
CN116962745A (zh) 视频图像的混画方法、装置及直播系统
WO2022022348A1 (zh) 视频压缩方法、解压方法、装置、电子设备及存储介质
CN107067452A (zh) 一种基于全卷积神经网络的电影2d转3d方法
US10893258B1 (en) Displacement-oriented view synthesis system and method
JP4188224B2 (ja) 画像処理方法
EP4246988A1 (en) Image synthesis
CN116016977A (zh) 基于直播的虚拟同台连麦互动方法、计算机设备及介质
CN114339120A (zh) 沉浸式视频会议系统
CN109862262A (zh) 图像虚化方法、装置、终端及存储介质
CN104052990B (zh) 一种基于融合深度线索的全自动二维转三维方法和装置
CN102802003A (zh) 基于gpu与网络摄像机的实时拍摄与实时自由立体显示系统
CN117478819A (zh) 远程会议下的实物展示方法、装置及系统
EP4107694A1 (en) Method and device for processing image content
CN119011871B (zh) 一种实时任意视角、自由视角视频生成方法及系统
CN114071074B (zh) 北斗三号短报文信道的图片采集、处理及显示方法和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09836013

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2009836013

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2009836013

Country of ref document: EP