CN107622495A - Image processing method and device, electronic device, and computer-readable storage medium
- Publication number: CN107622495A
- Authority: CN (China)
- Legal status: Pending
Description
Technical Field
The present invention relates to the field of image processing technology, and in particular to an image processing method and device, an electronic device, and a computer-readable storage medium.
Background Art
Existing image fusion typically fuses the user's portrait with a background image, but this kind of fusion offers limited interest.
Summary of the Invention
Embodiments of the present invention provide an image processing method, an image processing device, an electronic device, and a computer-readable storage medium.
The image processing method of the embodiments of the present invention is used in an electronic device, and the image processing method includes:
collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user;
processing the multiple frames of scene images and the multiple frames of depth images to segment, in each frame of scene image, the person region and the background region other than the person region so as to obtain multiple frames of background region images, the multiple frames of background region images corresponding to multiple frames of predetermined three-dimensional images; and
fusing each frame of the predetermined three-dimensional images with the corresponding background region image to obtain multiple frames of merged images so as to output a video image.
The image processing device of the embodiments of the present invention is used in an electronic device, and the image processing device includes an imaging device and a processor. The imaging device is configured to collect, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user. The processor is configured to process the multiple frames of scene images and the multiple frames of depth images to segment, in each frame of scene image, the person region and the background region other than the person region so as to obtain multiple frames of background region images, the multiple frames of background region images corresponding to multiple frames of predetermined three-dimensional images, and to fuse each frame of the predetermined three-dimensional images with the corresponding background region image to obtain multiple frames of merged images so as to output a video image.
The electronic device of the embodiments of the present invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the programs include instructions for executing the above image processing method.
The computer-readable storage medium of the embodiments of the present invention includes a computer program used in combination with an electronic device capable of capturing images, and the computer program can be executed by a processor to implement the above image processing method.
After acquiring the three-dimensional scene images and the depth images, the image processing method, image processing device, electronic device, and computer-readable storage medium of the embodiments of the present invention segment the person and the background in each frame of scene image based on the depth information, so that the segmented three-dimensional person region and three-dimensional background region are more accurate. The three-dimensional background region image segmented from each frame is then fused with the corresponding predetermined three-dimensional image, that is, the predetermined three-dimensional image replaces the person region of the current user in the scene image, yielding multiple frames of three-dimensional merged images in which the predetermined three-dimensional images are fused with the three-dimensional background region images. The multiple frames of three-dimensional merged images can further form a video image for output. In this way, image fusion becomes more entertaining and the user experience is improved.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 2 is a schematic structural diagram of an electronic device according to some embodiments of the present invention.
Fig. 3 is a schematic diagram of an image processing device according to some embodiments of the present invention.
Fig. 4 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 5 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 6(a) to Fig. 6(e) are schematic diagrams of structured light measurement scenes according to an embodiment of the present invention.
Fig. 7(a) and Fig. 7(b) are schematic diagrams of structured light measurement scenes according to an embodiment of the present invention.
Fig. 8 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 9 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 10 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 11 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 12 is a schematic flowchart of an image processing method according to some embodiments of the present invention.
Fig. 13 is a schematic diagram of an image processing device according to some embodiments of the present invention.
Fig. 14 is a schematic diagram of an electronic device according to some embodiments of the present invention.
Detailed Description of the Embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention; they should not be construed as limiting the present invention.
Referring to Fig. 1 and Fig. 2 together, the image processing method of the embodiments of the present invention is used in an electronic device 1000. The image processing method includes:
03: collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user;
05: processing the multiple frames of scene images and the multiple frames of depth images to segment, in each frame of scene image, the person region and the background region other than the person region so as to obtain multiple frames of background region images, the multiple frames of background region images corresponding to multiple frames of predetermined three-dimensional images; and
07: fusing each frame of the predetermined three-dimensional images with the corresponding background region image to obtain multiple frames of merged images so as to output a video image.
Referring to Fig. 3, the image processing method of the embodiments of the present invention can be implemented by the image processing device 100 of the embodiments of the present invention. The image processing device 100 of the embodiments of the present invention is used in the electronic device 1000. The image processing device 100 includes an imaging device 10 and a processor 20. Step 03 can be implemented by the imaging device 10, and step 05 and step 07 can be implemented by the processor 20.
That is to say, the imaging device 10 can be used to collect, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user; the processor 20 can be used to process the multiple frames of scene images and the multiple frames of depth images to segment, in each frame of scene image, the person region and the background region other than the person region so as to obtain multiple frames of background region images, the multiple frames of background region images corresponding to multiple frames of predetermined three-dimensional images, and to fuse each frame of the predetermined three-dimensional images with the corresponding background region image to obtain multiple frames of merged images so as to output a video image.
The preset frequency refers to the frame rate at which the imaging device 10 collects images, and the frame rate may be, for example, 30 frames per second, 60 frames per second, or 120 frames per second. The higher the frame rate, the smoother the video image.
The background region image is obtained by segmenting the three-dimensional scene image into the person region and the background region; therefore, the background region image is also a three-dimensional image.
In some embodiments, the predetermined three-dimensional image includes at least one of a three-dimensional virtual character, a three-dimensional real person, and a three-dimensional animal or plant. The three-dimensional real person excludes the current user himself or herself. The three-dimensional virtual character may be a three-dimensional animated character, such as Mario, Conan, Big Head Son, or Crayon Shin-chan; the three-dimensional real person may be a three-dimensional image of a famous figure, such as Audrey Hepburn, Mr. Bean, or Harry Potter; the three-dimensional animal or plant may be a three-dimensional animated animal or plant, such as Mickey Mouse, Donald Duck, or a pea shooter.
The image processing device 100 of the embodiments of the present invention can be applied to the electronic device 1000 of the embodiments of the present invention. That is to say, the electronic device 1000 of the embodiments of the present invention includes the image processing device 100 of the embodiments of the present invention.
In some embodiments, the electronic device 1000 includes a mobile phone, a tablet computer, a laptop computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, and the like.
After acquiring the three-dimensional scene images and the depth images, the image processing method, image processing device 100, and electronic device 1000 of the embodiments of the present invention segment the person and the background in each frame of scene image based on the depth information, so that the segmented three-dimensional person region and three-dimensional background region are more accurate. The three-dimensional background region image segmented from each frame is then fused with the corresponding predetermined three-dimensional image, that is, the predetermined three-dimensional image replaces the person region of the current user in the scene image, yielding multiple frames of three-dimensional merged images in which the predetermined three-dimensional images are fused with the three-dimensional background region images. The multiple frames of three-dimensional merged images can further form a video image for output. In this way, image fusion becomes more entertaining and the user experience is improved. In addition, since the merged images do not contain the user's actual portrait, the user's privacy can be protected to a certain extent.
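For orientation only, the following minimal Python sketch illustrates the per-frame flow of steps 03, 05, and 07 described above. The helper functions (`capture_rgbd_frame`, `segment_person`, `render_character`) are hypothetical stand-ins that fabricate data so the loop can run end to end; they are not part of the patent.

```python
# A minimal, hypothetical sketch of the per-frame pipeline described above.
import numpy as np

FRAME_RATE = 30          # preset frequency (frames per second), assumed value
NUM_FRAMES = 5
H, W = 480, 640

def capture_rgbd_frame():
    """Stand-in for the imaging device: returns a color scene image and a depth map."""
    scene = np.random.randint(0, 256, (H, W, 3), dtype=np.uint8)
    depth = np.random.uniform(0.5, 4.0, (H, W)).astype(np.float32)
    return scene, depth

def segment_person(scene, depth):
    """Stand-in for step 05: a boolean mask that is True inside the person region."""
    return depth < 1.5   # e.g. everything closer than 1.5 m is treated as the person

def render_character(frame_index):
    """Stand-in for the predetermined 3D image rendered for this frame."""
    return np.full((H, W, 3), (0, 128, 255), dtype=np.uint8)

video_frames = []
for i in range(NUM_FRAMES):
    scene, depth = capture_rgbd_frame()           # step 03
    person_mask = segment_person(scene, depth)    # step 05: person vs. background
    merged = scene.copy()                         # background region image
    merged[person_mask] = render_character(i)[person_mask]   # step 07: replace the person
    video_frames.append(merged)

print(len(video_frames), "merged frames ready for video output")
```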
Referring to Fig. 4, in some embodiments, step 03 of collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user includes:
031: photographing the current user at the preset frequency to obtain multiple frames of two-dimensional images;
032: projecting structured light to the current user;
033: capturing, at the preset frequency, multiple frames of structured light images modulated by the current user;
034: demodulating the phase information corresponding to each pixel of each frame of structured light image to obtain multiple frames of depth images; and
035: processing the multiple frames of two-dimensional images and the multiple frames of depth images to obtain multiple frames of three-dimensional scene images.
Referring again to Fig. 3, in some embodiments, the image processing device 100 includes the imaging device 10. The imaging device 10 includes a visible light camera 11 and a depth image acquisition component 12. The depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122. Step 031 can be implemented by the visible light camera 11, step 032 can be implemented by the structured light projector 121, and step 033, step 034, and step 035 can be implemented by the structured light camera 122.
That is to say, the visible light camera 11 can be used to photograph the current user at the preset frequency to obtain multiple frames of two-dimensional images; the structured light projector 121 can be used to project structured light to the current user; and the structured light camera 122 can be used to capture, at the preset frequency, multiple frames of structured light images modulated by the current user, to demodulate the phase information corresponding to each pixel of each frame of structured light image to obtain multiple frames of depth images, and to process the multiple frames of two-dimensional images and the multiple frames of depth images to obtain multiple frames of three-dimensional scene images.
Specifically, the visible light camera 11 photographs a two-dimensional image of the current user, and the two-dimensional image is a grayscale image or a color image. After the structured light projector 121 projects structured light of a certain pattern onto the face and body of the current user, a structured light image modulated by the current user is formed on the surface of the current user's face and body. The structured light camera 122 captures multiple frames of modulated structured light images at the preset frame rate and then demodulates each frame of structured light image to obtain the depth image corresponding to that frame; in this way, after the multiple frames of structured light images are demodulated, multiple frames of depth images are obtained. The pattern of the structured light may be laser stripes, Gray codes, sinusoidal stripes, non-uniform speckle, or the like. The depth image represents the depth information of each person or object in the scene where the current user is located. The scene range of the two-dimensional image is consistent with the scene range of the depth image, and each pixel in the two-dimensional image can find the depth information corresponding to that pixel in the depth image. In this way, the processor 20 can build a three-dimensional model of the scene captured by the structured light camera 122 based on the depth information in the depth image, and then fill the three-dimensional model with color using the color information of the two-dimensional image to obtain a three-dimensional color scene image.
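One way such a depth map and color image could be combined into a colored three-dimensional representation is by back-projecting every pixel through a pinhole camera model. The sketch below is illustrative only; the intrinsic parameters (fx, fy, cx, cy) are assumed values, not calibration data from the patent.

```python
# A hedged sketch of back-projecting a depth image into a colored point cloud.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5      # assumed pinhole intrinsics
H, W = 480, 640
depth = np.random.uniform(0.5, 4.0, (H, W)).astype(np.float32)   # depth image (meters)
color = np.random.randint(0, 256, (H, W, 3), dtype=np.uint8)     # 2D color image

u, v = np.meshgrid(np.arange(W), np.arange(H))
z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy

points = np.stack([x, y, z], axis=-1).reshape(-1, 3)   # 3D coordinates per pixel
colors = color.reshape(-1, 3)                          # color fill taken from the 2D image
print(points.shape, colors.shape)                      # (307200, 3) (307200, 3)
```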
It should be noted that, in specific embodiments of the present invention, the visible light camera 11 and the depth image acquisition component 12 should collect the two-dimensional images and the depth images at the same preset frequency; in this way, the multiple frames of three-dimensional scene images correspond one-to-one to the multiple frames of depth images, which facilitates the fusion of the predetermined three-dimensional images with the background region images in step 07.
Referring to Fig. 5, in some embodiments, step 034 of demodulating the phase information corresponding to each pixel of each frame of structured light image to obtain multiple frames of depth images includes:
0341: demodulating the phase information corresponding to each pixel in each frame of structured light image;
0342: converting the phase information into depth information; and
0343: generating the depth image according to the depth information.
Referring again to Fig. 2, in some embodiments, step 0341, step 0342, and step 0343 can all be implemented by the structured light camera 122.
That is to say, the structured light camera 122 can further be used to demodulate the phase information corresponding to each pixel in each frame of structured light image, to convert the phase information into depth information, and to generate the depth image according to the depth information.
Specifically, compared with unmodulated structured light, the phase information of the modulated structured light has changed, and the structured light presented in the structured light image is distorted structured light, where the changed phase information can represent the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in each frame of structured light image, and then calculates the depth information from the phase information, thereby obtaining the depth image corresponding to that frame of structured light image.
In order to enable those skilled in the art to understand more clearly the process of collecting the depth image of the current user's face and body by means of structured light, a widely used grating projection technique (fringe projection technique) is taken below as an example to explain its specific principle. Grating projection belongs to surface structured light in the broad sense.
As shown in Fig. 6(a), when surface structured light projection is used, sinusoidal fringes are first generated by computer programming and projected onto the measured object by the structured light projector 121; the structured light camera 122 then captures the degree to which the fringes are bent after being modulated by the object, the bent fringes are demodulated to obtain the phase, and the phase is converted into depth information to obtain the depth image. To avoid errors or error coupling, the parameters of the depth image acquisition component 12 need to be calibrated before structured light is used to collect depth information; the calibration includes calibration of geometric parameters (for example, the relative position parameters between the structured light camera 122 and the structured light projector 121), the internal parameters of the structured light camera 122, the internal parameters of the structured light projector 121, and so on.
Specifically, in the first step, sinusoidal fringes are generated by computer programming. Since the distorted fringes need to be used later to obtain the phase, for example by the four-step phase-shifting method, four fringe patterns with a phase difference of π/2 are generated here, and the structured light projector 121 then projects the four fringe patterns onto the measured object (the mask shown in Fig. 6(a)) in a time-sharing manner. The structured light camera 122 collects the image shown on the left of Fig. 6(b), and at the same time reads the fringes of the reference plane shown on the right of Fig. 6(b).
In the second step, phase recovery is performed. The structured light camera 122 calculates the modulated phase from the four collected modulated fringe patterns (i.e., the structured light images), and the phase map obtained at this point is a truncated (wrapped) phase map. Since the result of the four-step phase-shifting algorithm is calculated by an arctangent function, the phase of the modulated structured light is limited to [-π, π]; that is, whenever the modulated phase exceeds [-π, π], it starts over again. The resulting principal phase values are shown in Fig. 6(c).
During the phase recovery process, de-jump processing (phase unwrapping) is required, that is, the truncated phase is restored to a continuous phase. As shown in Fig. 6(d), the modulated continuous phase map is on the left and the reference continuous phase map is on the right.
In the third step, the reference continuous phase is subtracted from the modulated continuous phase to obtain the phase difference (i.e., the phase information), which represents the depth information of the measured object relative to the reference plane. The phase difference is then substituted into the phase-to-depth conversion formula (the parameters involved in the formula have been calibrated) to obtain the three-dimensional model of the object to be measured as shown in Fig. 6(e).
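The following numerical sketch illustrates the four-step phase-shifting recovery described in these three steps: four fringe images with a π/2 phase step give a wrapped phase via an arctangent, which is unwrapped and compared against a reference phase. The fringe data and the phase-to-depth constant are synthetic assumptions, not calibrated values from the patent.

```python
# A hedged sketch of four-step phase-shifting phase recovery and phase-to-depth conversion.
import numpy as np

H, W = 64, 64
true_phase = np.linspace(0, 6 * np.pi, W)[None, :].repeat(H, axis=0)   # stand-in scene phase

# Four captured fringe images I_k = A + B*cos(phase + k*pi/2), k = 0..3
A, B = 128.0, 100.0
I = [A + B * np.cos(true_phase + k * np.pi / 2) for k in range(4)]

# Wrapped (truncated) phase, limited to [-pi, pi] by the arctangent
wrapped = np.arctan2(I[3] - I[1], I[0] - I[2])

# Row-wise unwrapping restores the continuous phase
unwrapped = np.unwrap(wrapped, axis=1)

# Phase difference against a flat reference plane, then an assumed linear phase-to-depth scale
reference = np.zeros_like(unwrapped)
phase_diff = unwrapped - reference
depth = 0.01 * phase_diff        # assumed calibration constant, for illustration only
print(float(depth.min()), float(depth.max()))
```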
It should be understood that, in practical applications, depending on the specific application scenario, the structured light used in the embodiments of the present invention may be any other pattern besides the above grating.
As a possible implementation, the present invention may also use speckle structured light to collect the depth information of the current user.
Specifically, the method of obtaining depth information with speckle structured light uses a substantially flat diffractive element having an embossed diffractive structure with a specific phase distribution, whose cross section is a stepped relief structure with two or more concave-convex steps. The thickness of the substrate in the diffractive element is approximately 1 micrometer, the heights of the steps are non-uniform, and the heights may range from 0.7 micrometer to 0.9 micrometer. The structure shown in Fig. 7(a) is a partial diffractive structure of the collimating beam-splitting element of this embodiment. Fig. 7(b) is a cross-sectional side view along section A-A, with both the abscissa and the ordinate in micrometers. The speckle pattern generated by speckle structured light is highly random, and the pattern changes with distance. Therefore, before speckle structured light is used to obtain depth information, the speckle patterns in space first need to be calibrated; for example, within a range of 0 to 4 meters from the structured light camera 122, a reference plane is taken every 1 centimeter, so that 400 speckle images are saved after the calibration is completed. The smaller the calibration spacing, the higher the accuracy of the acquired depth information. Subsequently, the structured light projector 121 projects the speckle structured light onto the measured object (i.e., the current user), and the height differences on the surface of the measured object change the speckle pattern of the speckle structured light projected onto it. After the structured light camera 122 captures the speckle pattern projected onto the measured object (i.e., the structured light image), a cross-correlation operation is performed between the captured speckle pattern and each of the 400 speckle images saved during calibration, thereby obtaining 400 correlation images. The position of the measured object in space shows a peak in the correlation images, and the depth information of the measured object can be obtained by superimposing these peaks and performing interpolation.
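The matching idea described above can be illustrated with the simplified sketch below: a captured speckle image is correlated against a stack of reference speckle images calibrated at known distances, and the best-matching reference indicates the depth. The reference stack, noise level, and normalized cross-correlation score are illustrative assumptions.

```python
# A hedged sketch of matching a captured speckle image against calibrated reference planes.
import numpy as np

rng = np.random.default_rng(0)
num_refs, H, W = 50, 64, 64                      # e.g. reference planes taken at known spacings
reference_stack = rng.random((num_refs, H, W))   # stand-in calibrated speckle images
true_index = 23
captured = reference_stack[true_index] + 0.05 * rng.random((H, W))  # observed speckle image

def ncc(a, b):
    """Normalized cross-correlation score between two images."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

scores = np.array([ncc(captured, ref) for ref in reference_stack])
best = int(np.argmax(scores))                    # peak of the correlation curve
print("matched reference plane:", best)          # expected: 23
```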
An ordinary diffractive element diffracts a light beam into multiple diffracted beams, but the intensity of each diffracted beam differs greatly, so the risk of harming human eyes is also large. Even if the diffracted light is diffracted a second time, the uniformity of the resulting beams is still low. Therefore, the effect of projecting beams diffracted by an ordinary diffractive element onto the measured object is poor. In this embodiment, a collimating beam-splitting element is used. This element not only collimates the non-collimated beam but also splits the light; that is, the non-collimated light reflected by the mirror, after passing through the collimating beam-splitting element, exits as multiple collimated beams at different angles, and the cross-sectional areas and energy fluxes of the emitted collimated beams are approximately equal, so that the projection effect of the scattered spot light diffracted from these beams is better. At the same time, the emitted laser light is dispersed among the beams, which further reduces the risk of harming human eyes. Moreover, compared with other uniformly arranged structured light, speckle structured light consumes less power while achieving the same collection effect.
Referring to Fig. 8, in some embodiments, step 05 of processing the multiple frames of scene images and the multiple frames of depth images to segment, in each frame of scene image, the person region and the background region other than the person region so as to obtain the background region images includes:
051: identifying the face region in each frame of scene image;
052: acquiring the depth information corresponding to the face region from that frame of scene image or from the depth image corresponding to that frame of scene image;
053: determining the depth range of the person region according to the depth information of the face region;
054: determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range; and
055: determining the background region image according to the person region and that frame of scene image.
Referring again to Fig. 2, in some embodiments, step 051, step 052, step 053, step 054, and step 055 can all be implemented by the processor 20.
That is to say, the processor 20 can further be used to identify the face region in each frame of scene image, to acquire the depth information corresponding to the face region from that frame of scene image or from the depth image corresponding to that frame of scene image, to determine the depth range of the person region according to the depth information of the face region, to determine, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, and to determine the background region image according to the person region and that frame of scene image.
Specifically, a trained deep learning model can first be used to identify the face region in each frame of scene image. Then, since each frame of three-dimensional scene image contains depth information in addition to color information, the depth information of the face region can be obtained directly from that frame of three-dimensional scene image; alternatively, the depth information of the face region in each frame of scene image can be determined according to the correspondence between the depth image and the two-dimensional image. Since the face region includes features such as the nose, eyes, ears, and lips, the depth data corresponding to each feature of the face region in the three-dimensional scene image or depth image differ. For example, when the face is facing the depth image acquisition component 12, in the depth image captured by the depth image acquisition component 12, the depth data corresponding to the nose may be small while the depth data corresponding to the ears may be large. Therefore, the above depth information of the face region may be a single value or a range of values. When the depth information of the face region is a single value, the value may be obtained by averaging the depth data of the face region, or by taking the median of the depth data of the face region.
Since the person region contains the face region, that is, the person region and the face region lie within the same depth range, after the processor 20 determines the depth information of the face region, it can set the depth range of the person region according to the depth information of the face region, and then extract, according to that depth range, the person region that falls within the depth range and is connected to the face region. After the person region is determined, the part of the scene image other than the person region is the background region; the processor 20 extracts the part of the scene image other than the person region to obtain the background region image.
In this way, the person region and the background region can be segmented from each frame of scene image according to the depth information to obtain multiple frames of background region images. Since the acquisition of depth information is not affected by factors such as illumination and color temperature in the environment, the extracted background region images are more accurate.
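The sketch below illustrates one way steps 051-055 could be realized: take the median depth inside a detected face box, grow a depth range around it, keep the connected pixels that fall in that range as the person region, and treat the rest as the background region image. The face box, the depth-range margin, and the synthetic depth map are illustrative assumptions, not values from the patent.

```python
# A hedged sketch of depth-based person/background segmentation (steps 051-055).
import cv2
import numpy as np

H, W = 480, 640
scene = np.random.randint(0, 256, (H, W, 3), dtype=np.uint8)   # 3D scene image (color part)
depth = np.full((H, W), 3.0, dtype=np.float32)                  # background wall at ~3 m
depth[120:420, 200:440] = 1.2                                   # a person-shaped blob at ~1.2 m

face_box = (140, 240, 260, 380)           # (top, bottom, left, right), assumed detector output
top, bottom, left, right = face_box
face_depth = float(np.median(depth[top:bottom, left:right]))    # depth info of the face region

margin = 0.5                              # assumed person depth range: face depth +/- 0.5 m
in_range = (depth > face_depth - margin) & (depth < face_depth + margin)

# Keep only the connected component that contains the face region
_, labels = cv2.connectedComponents(in_range.astype(np.uint8))
face_label = labels[(top + bottom) // 2, (left + right) // 2]
person_mask = (labels == face_label) & in_range

background = scene.copy()
background[person_mask] = 0               # background region image (person pixels blanked)
print("person pixels:", int(person_mask.sum()))
```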
Referring to Fig. 9, in some embodiments, the image processing method further includes the following steps:
061: processing each frame of scene image to obtain a full-field edge image of that frame of scene image; and
062: correcting, according to each frame of full-field edge image, the background region image corresponding to that frame of full-field edge image.
Referring again to Fig. 2, in some embodiments, both step 061 and step 062 can be implemented by the processor 20.
That is to say, the processor 20 can also be used to process each frame of scene image to obtain a full-field edge image of that frame of scene image, and to correct, according to each frame of full-field edge image, the background region image corresponding to that frame of full-field edge image.
The processor 20 first performs edge extraction on each frame of scene image to obtain multiple frames of full-field edge images, where the edge lines in a full-field edge image include the edge lines of the current user and of the background objects in the scene where the current user is located. Specifically, edge extraction can be performed on each frame of scene image using the Canny operator. The core of the Canny edge extraction algorithm mainly includes the following steps: first, the scene image is convolved with a 2D Gaussian filter template to eliminate noise; then, the gradient magnitude of the gray level of each pixel is obtained using a differential operator, the gradient direction of each pixel's gray level is calculated from the gradient values, and the neighboring pixels of the corresponding pixel along the gradient direction can be found through the gradient direction; then, every pixel is traversed, and if the gray value of a pixel is not the maximum compared with the gray values of the two adjacent pixels before and after it along its gradient direction, the pixel is considered not to be an edge point. In this way, the pixels at edge positions in the scene image can be determined, thereby obtaining the full-field edge image after edge extraction.
Each frame of scene image corresponds to one frame of full-field edge image, and likewise each frame of scene image corresponds to one frame of background region image; therefore, the full-field edge images and the background region images correspond one-to-one. After the processor 20 obtains a full-field edge image, it corrects the background region image corresponding to that full-field edge image. Specifically, the full-field edge image is first used to correct the person region in the scene image, and the final background region image is then determined according to the corrected person region. It can be understood that the person region is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range; in some scenes, there may be objects that are connected to the face region and also fall within the depth range. Therefore, the full-field edge image can be used to correct the person region to obtain a more accurate person region, and the background region is then determined according to the more accurate person region. In this way, the final background region image is also more accurate.
Further, the processor 20 may also perform a secondary correction on the corrected person region; for example, it may apply dilation processing to the corrected person region, expanding the person region to preserve its edge details. Subsequently, the processor 20 further determines the background region according to this more accurate person region, and the final background region image obtained is more precise.
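A hedged OpenCV sketch of steps 061-062 follows: extract a full-field edge image with the Canny operator, use it to trim the depth-based person mask, then dilate the corrected mask to recover edge detail. The thresholds, kernel sizes, and the particular way the edge image is applied to the mask are assumptions for illustration.

```python
# A hedged sketch of edge extraction and mask correction (steps 061-062).
import cv2
import numpy as np

scene = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)   # stand-in scene image
person_mask = np.zeros((480, 640), dtype=np.uint8)
person_mask[120:420, 200:440] = 255                                 # stand-in depth-based person region

gray = cv2.cvtColor(scene, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)                  # full-field edge image (thresholds assumed)

# Illustrative correction: trim edge pixels out of the depth-based mask so it follows the contours
closed_edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
corrected = cv2.bitwise_and(person_mask, cv2.bitwise_not(closed_edges))

# Secondary correction: dilate to recover edge detail lost by the trimming
kernel = np.ones((7, 7), np.uint8)
refined = cv2.dilate(corrected, kernel, iterations=1)

background_mask = cv2.bitwise_not(refined)        # the background region follows from the person region
print(int(refined.sum() / 255), "person pixels after refinement")
```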
Referring to Fig. 10, in some embodiments, the image processing method of the embodiments of the present invention further includes:
063: processing each frame of scene image and depth image to extract motion information of the current user; and
064: rendering the predetermined three-dimensional images according to the motion information so that each frame of predetermined three-dimensional image follows the current user's motion.
Step 07 of fusing each frame of the predetermined three-dimensional images with the corresponding background region image to obtain the merged images so as to output a video image includes:
071: fusing each rendered frame of the predetermined three-dimensional images with the corresponding background region image to obtain the merged images so as to output a video image.
Referring again to Fig. 2, in some embodiments, step 063, step 064, and step 071 can be implemented by the processor 20. That is to say, the processor 20 can be used to process each frame of scene image and depth image to extract the motion information of the current user, to render the predetermined three-dimensional images according to the motion information so that each frame of predetermined three-dimensional image follows the current user's motion, and to fuse each rendered frame of the predetermined three-dimensional images with the corresponding background region image to obtain the merged images so as to output a video image.
The motion information includes at least one of the current user's facial expression and body movement. That is to say, the motion information may be the current user's facial expression, the current user's body movement, or both the current user's facial expression and body movement.
Specifically, in step 05 the processor 20 has already identified the face region of each frame of scene image and has already segmented the person region and the background region. Therefore, when executing step 063, the processor 20 processes each frame of face region to identify the current user's facial expression, and processes the person region in each frame of scene image to obtain information on the current user's body movement. The information on the current user's body movement can be obtained by template matching. The processor 20 matches the person region against a plurality of person templates. First, the head of the person region is matched; after the head matching is completed, the next body part, namely the upper torso, is matched against the remaining person templates whose heads matched; after the upper torso matching is completed, the next body parts, namely the upper limbs and lower limbs, are matched against the remaining person templates whose heads and upper torsos matched, so that the information on the current user's body movement is determined by the template matching method. Subsequently, the processor 20 renders the predetermined three-dimensional image according to the identified facial expression and body movement of the current user, so that the character, animal, or plant in the predetermined three-dimensional image can follow and imitate the current user's facial expression and body movement. Finally, the processor 20 fuses the rendered predetermined three-dimensional image with the background region image to obtain the merged image.
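The coarse-to-fine template matching described above can be sketched as follows, with the head matched first and the torso searched only below the matched head. Both the person mask and the templates here are synthetic stand-ins; a real system would use a calibrated library of person templates as the patent describes.

```python
# A hedged sketch of hierarchical template matching on the segmented person region (step 063).
import cv2
import numpy as np

person_mask = np.zeros((480, 640), dtype=np.uint8)
cv2.circle(person_mask, (320, 120), 40, 255, -1)               # stand-in head
cv2.rectangle(person_mask, (280, 160), (360, 320), 255, -1)    # stand-in torso

head_template = np.zeros((90, 90), dtype=np.uint8)
cv2.circle(head_template, (45, 45), 40, 255, -1)

torso_template = np.zeros((180, 120), dtype=np.uint8)
cv2.rectangle(torso_template, (20, 10), (100, 170), 255, -1)

# Match the head first
res = cv2.matchTemplate(person_mask, head_template, cv2.TM_CCOEFF_NORMED)
_, head_score, _, head_loc = cv2.minMaxLoc(res)
head_x, head_y = head_loc

# Then search for the torso only below the matched head
torso_region = person_mask[head_y + 80:, :]
res2 = cv2.matchTemplate(torso_region, torso_template, cv2.TM_CCOEFF_NORMED)
_, torso_score, _, torso_loc = cv2.minMaxLoc(res2)

print("head score %.2f at %s, torso score %.2f" % (head_score, head_loc, torso_score))
```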
Since the person region where the current user is located in each frame of scene image can be replaced by a three-dimensional character, animal, or plant, and the character, animal, or plant in each frame of predetermined three-dimensional image can follow the current user's motion in the scene image corresponding to that frame, after the multiple frames of merged images form a video image and the video image is played, the video shows a three-dimensional character, animal, or plant following and imitating the current user's motion. This greatly improves the fun of image fusion and brings the user a better visual experience.
Referring to Fig. 11, in some embodiments, step 07 of fusing each frame of the predetermined three-dimensional images with the corresponding background region image to obtain the merged images so as to output a video image includes:
072: comparing the size of each frame of predetermined three-dimensional image with the size of the person region in the corresponding scene image;
073: when the size of that frame of predetermined three-dimensional image is larger than the size of the person region, shrinking that frame of predetermined three-dimensional image and filling it into the person region in the scene image to obtain the merged image by fusion;
074: when the size of that frame of predetermined three-dimensional image is smaller than the size of the person region, enlarging that frame of predetermined three-dimensional image and filling it into the person region in the scene image to obtain the merged image by fusion, or filling that frame of predetermined three-dimensional image into the person region in the scene image and filling the gap between that frame of predetermined three-dimensional image and the person region with pixels neighboring the person region to obtain the merged image; and
075: processing the multiple frames of merged images to output a video image.
Referring again to Fig. 2, in some embodiments, step 072, step 073, step 074, and step 075 can all be implemented by the processor 20. That is to say, the processor 20 can also be used to compare the size of each frame of predetermined three-dimensional image with the size of the person region in the corresponding scene image; when the size of that frame of predetermined three-dimensional image is larger than the size of the person region, to shrink that frame of predetermined three-dimensional image and fill it into the person region in the scene image to obtain the merged image by fusion; when the size of that frame of predetermined three-dimensional image is smaller than the size of the person region, to enlarge that frame of predetermined three-dimensional image and fill it into the person region in the scene image to obtain the merged image by fusion, or to fill that frame of predetermined three-dimensional image into the person region in the scene image and fill the gap between that frame of predetermined three-dimensional image and the person region with pixels neighboring the person region to obtain the merged image; and to process the multiple frames of merged images to output a video image.
Specifically, since the collection distance between the current user and the visible light camera 11 is not fixed, the size of the person region in each frame of scene image is not fixed either. Thus, before the background region image is fused with the predetermined three-dimensional image, the size of the predetermined three-dimensional image first needs to be compared with the size of the person region corresponding to the background region image to be fused with it. The size includes the height and width of the person region and of the predetermined three-dimensional image. When both the width and the height of the predetermined three-dimensional image are larger than the size of the person region, suitable reduction values for the height and width can be determined according to the size of the person region, and the predetermined three-dimensional image is shrunk according to the reduction values so that it fills the part of the scene image where the person region is located. When both the height and the width of the predetermined three-dimensional image are smaller than the height and width of the person region, suitable enlargement values for the height and width can be determined according to the size of the person region, and the predetermined three-dimensional image is enlarged according to the enlargement values to fill the part of the scene image where the person region is located; alternatively, the predetermined three-dimensional image is filled at its original size into the part of the scene image where the person region is located, and the gap between the predetermined three-dimensional image and the person region is then filled using the pixels around the person region. When the width of the predetermined three-dimensional image is larger than the width of the person region and its height is smaller than the height of the person region, the width of the predetermined three-dimensional image can be appropriately reduced according to the width of the person region and its height appropriately enlarged according to the height of the person region, and the resized predetermined three-dimensional image is filled into the part of the scene image where the person region is located. When the height of the predetermined three-dimensional image is larger than the height of the person region and its width is smaller than the width of the person region, the height of the predetermined three-dimensional image can be appropriately reduced according to the height of the person region and its width appropriately enlarged according to the width of the person region, and the resized predetermined three-dimensional image is filled into the part of the scene image where the person region is located.
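One way the size comparison, rescaling, and gap filling of steps 072-075 could be sketched is shown below. The bounding-box-based sizing, the uniform scale factor, and the use of inpainting to fill the gap from neighboring pixels are illustrative assumptions rather than the patent's prescribed implementation.

```python
# A hedged sketch of fitting the rendered character into the person region and filling the gap.
import cv2
import numpy as np

scene = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
person_mask = np.zeros((480, 640), dtype=np.uint8)
person_mask[120:420, 200:440] = 255                                  # person region to be replaced

character = np.full((360, 300, 3), (0, 128, 255), dtype=np.uint8)    # rendered predetermined 3D image

ys, xs = np.nonzero(person_mask)
top, bottom, left, right = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
region_h, region_w = bottom - top, right - left

# Scale the character so it fits inside the person region (shrink or enlarge as needed)
scale = min(region_h / character.shape[0], region_w / character.shape[1])
new_h, new_w = int(character.shape[0] * scale), int(character.shape[1] * scale)
resized = cv2.resize(character, (new_w, new_h))

merged = scene.copy()
merged[person_mask > 0] = 0                                # blank the person region first
merged[top:top + new_h, left:left + new_w] = resized       # paste the scaled character

# Fill any leftover gap between the character and the person region from neighboring pixels
gap = (person_mask > 0) & (merged.sum(axis=2) == 0)
merged = cv2.inpaint(merged, gap.astype(np.uint8) * 255, 3, cv2.INPAINT_TELEA)
print("gap pixels filled:", int(gap.sum()))
```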
In some embodiments, the predetermined three-dimensional image may be selected randomly by the processor 20 or selected by the current user.
After the processor 20 obtains the multiple frames of merged images, the merged frames are arranged in order and stored. The multiple frames of merged images can be stored by the processor 20 in a video format to form a video image; when the video image is displayed at a certain frame rate on the display 50 (shown in Fig. 13) of the electronic device 1000, the user can watch a smooth video picture.
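Assembling the ordered merged frames into a playable video file could, for example, be done with OpenCV's VideoWriter as sketched below; the output path, codec, and frame rate are assumed values for illustration.

```python
# A hedged sketch of writing the merged frames to a video file in capture order.
import cv2
import numpy as np

frame_rate = 30                     # the preset frequency used during capture (assumed)
H, W = 480, 640
merged_frames = [np.random.randint(0, 256, (H, W, 3), dtype=np.uint8) for _ in range(90)]

fourcc = cv2.VideoWriter_fourcc(*"mp4v")            # codec assumed to be available
writer = cv2.VideoWriter("merged_output.mp4", fourcc, frame_rate, (W, H))
for frame in merged_frames:                         # frames are written in capture order
    writer.write(frame)
writer.release()
print("wrote", len(merged_frames), "frames")
```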
Referring to Fig. 12, in some embodiments, the image processing method of the embodiments of the present invention further includes:
081: collecting the current user's sound information; and
082: fusing the video image with the sound information to output a video with sound.
Referring again to Fig. 3, in some embodiments, the image processing device 100 further includes an acoustoelectric element 70. Step 081 can be implemented by the acoustoelectric element 70, and step 082 can be implemented by the processor 20. That is to say, the acoustoelectric element 70 can be used to collect the current user's sound information, and the processor 20 can be used to fuse the video image with the sound information to output a video with sound.
Specifically, when the imaging device is turned on to collect the three-dimensional scene images and the depth images, the acoustoelectric element 70 is also turned on at the same time to collect the current user's sound information. In this way, the sound information collected by the acoustoelectric element 70 stays synchronized with the video image formed from the multiple frames of merged images. Subsequently, the processor 20 fuses the sound information with the video image and outputs a video with sound. When the video with sound is played on the display 50 (shown in Fig. 13) of the electronic device 1000, the picture and the sound in the video remain synchronized during playback.
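The patent does not specify how the audio track is multiplexed with the video; one common approach on a desktop system is to call ffmpeg, as in the hedged sketch below. The file names are placeholders and ffmpeg must be installed for this to run.

```python
# A hedged sketch of merging the silent video with the recorded sound track via ffmpeg.
import subprocess

subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "merged_output.mp4",   # video image assembled from the merged frames
        "-i", "user_voice.wav",      # sound information captured alongside the frames
        "-c:v", "copy",              # keep the video stream as-is
        "-c:a", "aac",               # encode the audio stream
        "-shortest",                 # stop at the shorter stream to keep picture and sound in sync
        "video_with_sound.mp4",
    ],
    check=True,
)
```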
Referring to Fig. 2 and Fig. 13 together, an embodiment of the present invention further provides an electronic device 1000. The electronic device 1000 includes the image processing device 100. The image processing device 100 can be implemented using hardware and/or software. The image processing device 100 includes the imaging device 10 and the processor 20.
The imaging device 10 includes a visible light camera 11 and a depth image acquisition component 12.
Specifically, the visible light camera 11 includes an image sensor 111 and a lens 112. The visible light camera 11 can be used to capture the color information of the current user to obtain multiple frames of two-dimensional images, where the image sensor 111 includes a color filter array (such as a Bayer filter array), and the number of lenses 112 may be one or more. When the visible light camera 11 acquires each frame of two-dimensional image, each imaging pixel in the image sensor 111 senses the light intensity and wavelength information from the captured scene to generate a set of raw image data; the image sensor 111 sends this set of raw image data to the processor 20, and the processor 20 performs operations such as denoising and interpolation on the raw image data to obtain a color two-dimensional image. The processor 20 can process each image pixel in the raw image data one by one in a variety of formats; for example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the processor 20 can process each image pixel at the same or different bit depths.
深度图像采集组件12包括结构光投射器121和结构光摄像头122,深度图像采集组件12可用于捕捉当前用户的深度信息以得到深度图像。结构光投射器121用于将结构光投射至当前用户,其中,结构光图案可以是激光条纹、格雷码、正弦条纹或者随机排列的散斑图案等。结构光摄像头122包括图像传感器1221和透镜1222,透镜1222的个数可为一个或多个。图像传感器1221用于捕捉结构光投射器121投射至当前用户上的多帧结构光图像。每帧结构光图像均可由深度采集组件12发送至处理器20进行解调、相位恢复、相位信息计算等处理以获取当前用户的深度信息。The depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122 , and the depth image acquisition component 12 can be used to capture depth information of a current user to obtain a depth image. The structured light projector 121 is used to project the structured light to the current user, wherein the structured light pattern may be laser stripes, gray codes, sinusoidal stripes or randomly arranged speckle patterns and the like. The structured light camera 122 includes an image sensor 1221 and a lens 1222, and the number of the lens 1222 may be one or more. The image sensor 1221 is used to capture multiple frames of structured light images projected onto the current user by the structured light projector 121 . Each frame of structured light image can be sent by the depth acquisition component 12 to the processor 20 for processing such as demodulation, phase recovery, and phase information calculation to obtain the depth information of the current user.
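The demodulation and phase-recovery processing mentioned above can be illustrated, under the assumption of a four-step phase-shifted sinusoidal fringe pattern, by the following sketch; the phase-to-depth calibration factor and the simple row-wise unwrapping are simplifications chosen for the example rather than parameters of the device.

```python
import numpy as np

def phase_to_depth(fringes, phase_to_mm=50.0):
    """Recover depth from four sinusoidal fringe images shifted by 90 degrees.
    `fringes` holds the four captured frames I0..I3; `phase_to_mm` is an
    assumed calibration factor mapping unwrapped phase to millimetres."""
    i0, i1, i2, i3 = (f.astype(np.float64) for f in fringes)
    # wrapped phase of the projected pattern at every pixel
    wrapped = np.arctan2(i3 - i1, i0 - i2)
    # naive unwrapping along rows (real systems use multi-frequency patterns)
    unwrapped = np.unwrap(wrapped, axis=1)
    return unwrapped * phase_to_mm
```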
在某些实施方式中,可见光摄像头11与结构光摄像头122的功能可由一个摄像头实现,也即是说,成像设备10仅包括一个摄像头和一个结构光投射器121,上述摄像头不仅可以拍摄二维图像,还可拍摄结构光图像。In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 may be implemented by a single camera; that is to say, the imaging device 10 includes only one camera and one structured light projector 121, and this camera can capture not only two-dimensional images but also structured light images.
除了采用结构光获取深度图像外,还可通过双目视觉方法、基于飞行时间差(Time of Flight, TOF)等深度图像获取方法来获取当前用户的深度图像。In addition to using structured light, the depth image of the current user can also be obtained through depth image acquisition methods such as binocular vision or Time of Flight (TOF).
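For the binocular-vision alternative, depth follows from triangulation as Z = f·B/d, where f is the focal length, B is the baseline between the two cameras and d is the disparity. The sketch below applies this relation; the calibration values are illustrative and are not taken from this disclosure.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px=1400.0, baseline_mm=40.0):
    """Triangulate depth (mm) from a disparity map; the focal length and
    baseline here are illustrative values, not parameters of the device."""
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(d, np.inf)     # zero disparity means "too far to measure"
    valid = d > 0
    depth[valid] = focal_px * baseline_mm / d[valid]
    return depth
```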
此外,图像处理装置100还包括存储器30。存储器30可内嵌在电子装置1000中,也可以是独立于电子装置1000外的存储器,并可包括直接存储器存取(Direct Memory Access, DMA)特征。可见光摄像头11采集的原始图像数据或深度图像采集组件12采集的结构光图像相关数据均可传送至存储器30中进行存储或缓存。处理器20可从存储器30中读取原始图像数据以进行处理得到二维图像,也可从存储器30中读取结构光图像相关数据以进行处理得到深度图像,还可从存储器30中读取原始图像数据和结构光图像相关数据进行处理以得到三维的彩色的场景图像。另外,二维图像、场景图像和深度图像还可存储在存储器30中,以供处理器20随时调用处理,例如,处理器20调用多帧场景图像和多帧深度图像进行背景区域提取,并将提取后得到的多帧背景区域图像与对应的预定三维图像进行融合处理以得到多帧合并图像,多帧合并图像顺序排列或存储形成视频图像。其中,预定三维图像、合并图像、视频图像也可存储在存储器30中。In addition, the image processing device 100 further includes a memory 30. The memory 30 may be embedded in the electronic device 1000 or may be a memory independent of the electronic device 1000, and may include a direct memory access (Direct Memory Access, DMA) feature. The raw image data collected by the visible light camera 11 and the structured-light image data collected by the depth image acquisition component 12 can both be transmitted to the memory 30 for storage or buffering. The processor 20 can read the raw image data from the memory 30 and process it to obtain a two-dimensional image, can read the structured-light image data from the memory 30 and process it to obtain a depth image, and can also read both the raw image data and the structured-light image data from the memory 30 and process them to obtain a three-dimensional color scene image. In addition, the two-dimensional images, scene images and depth images can also be stored in the memory 30 so that the processor 20 can retrieve and process them at any time. For example, the processor 20 retrieves multiple frames of scene images and multiple frames of depth images to extract background regions, fuses the extracted multiple frames of background region images with the corresponding predetermined three-dimensional images to obtain multiple frames of merged images, and the multiple frames of merged images are arranged in sequence or stored to form a video image. The predetermined three-dimensional images, merged images and video images may also be stored in the memory 30.
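The data flow described in this paragraph can be summarised by the following sketch; every function name is a hypothetical stand-in for a stage described elsewhere in this disclosure rather than an actual interface of the device.

```python
def build_video(raw_frames, light_frames, predetermined_3d_frames, processor):
    """Hypothetical end-to-end pipeline: memory-buffered raw data in,
    an ordered sequence of merged frames (the video image) out."""
    merged_frames = []
    for raw, light, pre3d in zip(raw_frames, light_frames, predetermined_3d_frames):
        scene = processor.to_scene_image(raw, light)      # 3D color scene image
        depth = processor.to_depth_image(light)           # depth image
        background = processor.extract_background(scene, depth)
        merged_frames.append(processor.fuse(pre3d, background))
    return merged_frames                                  # ordered frames form the video
```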
图像处理装置100还可包括显示器50。显示器50可直接从处理器20中获取视频图像,还可从存储器30中获取视频图像。显示器50显示视频图像以供用户观看,或者由图形引擎或图形处理器(Graphics Processing Unit,GPU)进行进一步的处理。图像处理装置100还包括编码器/解码器60,编码器/解码器60可编解码二维图像、场景图像、深度图像、合并图像、视频图像等的图像数据,编码的图像数据可被保存在存储器30中,并可以在图像显示在显示器50上之前由解码器解压缩以进行显示。编码器/解码器60可由中央处理器(Central Processing Unit,CPU)、GPU或协处理器实现。换言之,编码器/解码器60可以是中央处理器(Central Processing Unit,CPU)、GPU、及协处理器中的任意一种或多种。The image processing device 100 may further include a display 50. The display 50 can obtain video images directly from the processor 20 or from the memory 30. The display 50 displays the video images for the user to watch, or the video images may be further processed by a graphics engine or a graphics processing unit (Graphics Processing Unit, GPU). The image processing device 100 further includes an encoder/decoder 60, which can encode and decode the image data of two-dimensional images, scene images, depth images, merged images, video images and the like. The encoded image data can be saved in the memory 30 and decompressed by the decoder before the image is shown on the display 50. The encoder/decoder 60 may be implemented by a central processing unit (Central Processing Unit, CPU), a GPU or a coprocessor. In other words, the encoder/decoder 60 may be any one or more of a CPU, a GPU and a coprocessor.
图像处理装置100还包括控制逻辑器40。成像设备10在成像时,处理器20会根据成像设备获取的数据进行分析以确定成像设备10的一个或多个控制参数(例如,曝光时间等)的图像统计信息。处理器20将图像统计信息发送至控制逻辑器40,控制逻辑器40控制成像设备10以确定好的控制参数进行成像。控制逻辑器40可包括执行一个或多个例程(如固件)的处理器和/或微控制器。一个或多个例程可根据接收的图像统计信息确定成像设备10的控制参数。The image processing device 100 further includes a control logic 40. When the imaging device 10 is imaging, the processor 20 analyzes the data acquired by the imaging device to determine image statistics for one or more control parameters (for example, exposure time) of the imaging device 10. The processor 20 sends the image statistics to the control logic 40, and the control logic 40 controls the imaging device 10 to perform imaging with the determined control parameters. The control logic 40 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines may determine the control parameters of the imaging device 10 based on the received image statistics.
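As an illustration of how image statistics might drive a control parameter such as exposure time, the sketch below nudges the exposure toward a target mean brightness; the target value, gain and limits are assumptions chosen for the example and do not come from this disclosure.

```python
import numpy as np

def update_exposure(frame_gray, current_exposure_us,
                    target_mean=118.0, gain=0.6,
                    min_us=100, max_us=33000):
    """Simple proportional auto-exposure update driven by frame statistics."""
    mean_luma = float(np.mean(frame_gray))
    # scale exposure toward the target brightness, damped by `gain`
    ratio = 1.0 + gain * (target_mean - mean_luma) / target_mean
    new_exposure = current_exposure_us * ratio
    return int(np.clip(new_exposure, min_us, max_us))
```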
图像处理装置100还包括声电元件70。声电元件70利用电磁感应原理将声音转化为电流输出。当前用户发声时会带动声电元件70内部的空气振动,使得声电元件70内部的线圈与磁芯之间出现微小位移,从而切割磁感线产生电流。声电元件70将电流传送至处理器20,处理器20处理电流以生成声音信息。声音信息可送至存储器30进行存储。当处理器20将视频图像与声音信息融合得到有声视频时,处理器20可将有声视频发送至显示器50及电声元件(图未示)中,显示器50显示有声视频中的视频画面,电声元件同步播放声音信息。The image processing device 100 further includes an acoustoelectric element 70. The acoustoelectric element 70 converts sound into an electric current output based on the principle of electromagnetic induction. When the current user speaks, the air inside the acoustoelectric element 70 is driven to vibrate, producing a slight displacement between the coil and the magnetic core inside the acoustoelectric element 70; the coil thereby cuts magnetic field lines and generates a current. The acoustoelectric element 70 transmits the current to the processor 20, and the processor 20 processes the current to generate sound information. The sound information can be sent to the memory 30 for storage. When the processor 20 fuses the video image with the sound information to obtain a video with sound, the processor 20 can send the video with sound to the display 50 and an electroacoustic element (not shown); the display 50 displays the video pictures of the video with sound, and the electroacoustic element plays the sound information synchronously.
请参阅图14,本发明实施方式的电子装置1000包括一个或多个处理器20、存储器30和一个或多个程序31。其中一个或多个程序31被存储在存储器30中,并且被配置成由一个或多个处理器20执行。程序31包括用于执行上述任意一项实施方式的图像处理方法的指令。Referring to FIG. 14 , an electronic device 1000 according to an embodiment of the present invention includes one or more processors 20 , a memory 30 and one or more programs 31 . One or more programs 31 are stored in memory 30 and configured to be executed by one or more processors 20 . The program 31 includes instructions for executing the image processing method of any one of the above-mentioned embodiments.
例如,程序31包括用于执行以下步骤所述的图像处理方法的指令:For example, the program 31 includes instructions for performing the image processing method described in the following steps:
03:以预设频率采集多帧当前用户的三维的场景图像和深度图像;03: Collect multi-frame 3D scene images and depth images of the current user at a preset frequency;
05:处理多帧场景图像和多帧深度图像以分割每帧场景图像中的人物区域及除却人物区域以外的背景区域以获得多帧背景区域图像,多帧所述背景区域图像对应多帧预定三维图像;和05: Process multiple frames of scene images and multiple frames of depth images to segment the character area in each frame of the scene image and the background area except the character area to obtain multiple frames of background area images, and the multiple frames of the background area images correspond to multiple frames of predetermined three-dimensional images; and
07:将每帧预定三维图像与对应的背景区域图像融合得到多帧合并图像以输出视频图像。07: Merge each frame of the predetermined three-dimensional image with the corresponding background area image to obtain a multi-frame merged image to output a video image.
再例如,程序31还包括用于执行以下步骤所述的图像处理方法的指令:For another example, the program 31 also includes instructions for performing the image processing method described in the following steps:
051:识别每帧场景图像中的人脸区域;051: Identify the face area in each frame of the scene image;
052:从该帧场景图像或与该帧场景图像对应的深度图像中获取与人脸区域对应的深度信息;052: Obtain the depth information corresponding to the face area from the scene image of the frame or the depth image corresponding to the scene image of the frame;
053:根据人脸区域的深度信息确定人物区域的深度范围;053: Determine the depth range of the character area according to the depth information of the face area;
054:根据人物区域的深度范围确定与人脸区域连接且落入深度范围内的人物区域;054: Determine the character area connected to the face area and falling within the depth range according to the depth range of the character area;
055:根据人物区域及该帧场景图像确定背景区域图像。055: Determine the background area image according to the character area and the frame scene image.
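A compact, purely illustrative version of steps 051 to 055 might look like the following; the Haar-cascade face detector, the fixed depth margin and the connected-component test are placeholders standing in for the face recognition, depth-range determination and connectivity checks described above, not the specific algorithms of this disclosure.

```python
import numpy as np
import cv2

def background_from_frame(scene_bgr, depth_map, margin_mm=400.0):
    """Illustrative version of steps 051-055: detect the face, derive a depth
    range for the person, keep the in-range pixels connected to the face,
    and return the scene with the person region blanked out (the background image)."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(scene_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.1, 5)            # step 051
    if len(faces) == 0:
        return scene_bgr.copy()                # no person found: whole frame is background
    x, y, w, h = faces[0]
    face_depth = np.median(depth_map[y:y + h, x:x + w])        # step 052
    lo, hi = face_depth - margin_mm, face_depth + margin_mm    # step 053 (assumed margin)
    in_range = ((depth_map >= lo) & (depth_map <= hi)).astype(np.uint8)
    # step 054: keep only the in-range component connected to the face
    _, labels = cv2.connectedComponents(in_range)
    person = labels == labels[y + h // 2, x + w // 2]
    background = scene_bgr.copy()
    background[person] = 0                                     # step 055: remove person region
    return background
```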
本发明实施方式的计算机可读存储介质包括与能够摄像的电子装置1000结合使用的计算机程序。计算机程序可被处理器20执行以完成上述任意一项实施方式的图像处理方法。A computer-readable storage medium according to an embodiment of the present invention includes a computer program used in conjunction with the electronic device 1000 capable of imaging. The computer program can be executed by the processor 20 to complete the image processing method in any one of the above-mentioned embodiments.
例如,计算机程序可被处理器20执行以完成以下步骤所述的图像处理方法:For example, the computer program can be executed by the processor 20 to complete the image processing method described in the following steps:
03:以预设频率采集多帧当前用户的三维的场景图像和深度图像;03: Collect multi-frame 3D scene images and depth images of the current user at a preset frequency;
05:处理多帧场景图像和多帧深度图像以分割每帧场景图像中的人物区域及除却人物区域以外的背景区域以获得多帧背景区域图像,多帧所述背景区域图像对应多帧预定三维图像;和05: Process multiple frames of scene images and multiple frames of depth images to segment the character area in each frame of the scene image and the background area except the character area to obtain multiple frames of background area images, and the multiple frames of the background area images correspond to multiple frames of predetermined three-dimensional images; and
07:将每帧预定三维图像与对应的背景区域图像融合得到多帧合并图像以输出视频图像。07: Merge each frame of the predetermined three-dimensional image with the corresponding background area image to obtain a multi-frame merged image to output a video image.
再例如,计算机程序还可被处理器20执行以完成以下步骤所述的图像处理方法:For another example, the computer program can also be executed by the processor 20 to complete the image processing method described in the following steps:
051:识别每帧场景图像中的人脸区域;051: Identify the face area in each frame of the scene image;
052:从该帧场景图像或与该帧场景图像对应的深度图像中获取与人脸区域对应的深度信息;052: Obtain the depth information corresponding to the face area from the scene image of the frame or the depth image corresponding to the scene image of the frame;
053:根据人脸区域的深度信息确定人物区域的深度范围;053: Determine the depth range of the character area according to the depth information of the face area;
054:根据人物区域的深度范围确定与人脸区域连接且落入深度范围内的人物区域;054: Determine the character area connected to the face area and falling within the depth range according to the depth range of the character area;
055:根据人物区域及该帧场景图像确定背景区域图像。055: Determine the background area image according to the character area and the frame scene image.
在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外,在不相互矛盾的情况下,本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples described in this specification, and features of different embodiments or examples, provided they do not contradict one another.
流程图中或在此以其他方式描述的任何过程或方法描述可以被理解为,表示包括一个或更多个用于实现特定逻辑功能或过程的步骤的可执行指令的代码的模块、片段或部分,并且本发明的优选实施方式的范围包括另外的实现,其中可以不按所示出或讨论的顺序,包括根据所涉及的功能按基本同时的方式或按相反的顺序,来执行功能,这应被本发明的实施例所属技术领域的技术人员所理解。Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment or portion of code that comprises one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which the functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
在流程图中表示或在此以其他方式描述的逻辑和/或步骤,例如,可以被认为是用于实现逻辑功能的可执行指令的定序列表,可以具体实现在任何计算机可读介质中,以供指令执行系统、装置或设备(如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统)使用,或结合这些指令执行系统、装置或设备而使用。就本说明书而言,"计算机可读介质"可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。计算机可读介质的更具体的示例(非穷尽性列表)包括以下:具有一个或多个布线的电连接部(电子装置),便携式计算机盘盒(磁装置),随机存取存储器(RAM),只读存储器(ROM),可擦除可编辑只读存储器(EPROM或闪速存储器),光纤装置,以及便携式光盘只读存储器(CDROM)。另外,计算机可读介质甚至可以是可在其上打印所述程序的纸或其他合适的介质,因为可以例如通过对纸或其他介质进行光学扫描,接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序,然后将其存储在计算机存储器中。The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as a sequenced list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
应当理解,本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。例如,如果用硬件来实现,和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA),现场可编程门阵列(FPGA)等。It should be understood that various parts of the present invention can be realized by hardware, software, firmware or their combination. In the embodiments described above, various steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented by any one or combination of the following techniques known in the art: Discrete logic circuits, ASICs with suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), etc.
本技术领域的普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,该程序在执行时,包括方法实施例的步骤之一或其组合。Those of ordinary skill in the art can understand that all or part of the steps for implementing the methods of the above embodiments can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium; when executed, the program carries out one of the steps of the method embodiments or a combination thereof.
此外,在本发明各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If the integrated modules are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
上述提到的存储介质可以是只读存储器,磁盘或光盘等。尽管上面已经示出和描述了本发明的实施例,可以理解的是,上述实施例是示例性的,不能理解为对本发明的限制,本领域的普通技术人员在本发明的范围内可以对上述实施例进行变化、修改、替换和变型。The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.