KR102806920B1

KR102806920B1 - Creating method of 3d volumetric video and recording medium thereof

Info

Publication number: KR102806920B1
Application number: KR1020220154207A
Authority: KR
Inventors: 임완택
Original assignee: 임완택
Priority date: 2022-11-17
Filing date: 2022-11-17
Publication date: 2025-05-14
Anticipated expiration: 2042-11-17
Also published as: KR20240072528A

Abstract

According to one embodiment of the present invention, a method for generating a 3D volumetric video includes: extracting 3D skeleton information for an object using a 3D pose estimation method; generating a 3D video model by applying the 3D skeleton information to the object using a rigging method; and generating an animated 3D volumetric video using the 3D video model. The 3D skeleton information is extracted using a 3D point cloud of the object captured by a camera system, and the rigging method includes estimating temporal motion between the extracted 3D skeleton information to generate a motion vector, and analyzing the motion vector to generate the 3D video model.

Description

{CREATING METHOD OF 3D VOLUMETRIC VIDEO AND RECORDING MEDIUM THEREOF}

The present invention relates to a method for generating a 3D volumetric video and a recording medium thereof, and more particularly, to a method for generating a 3D volumetric video that produces a 360-degree 3D stereoscopic image of a person using a plurality of cameras, and a recording medium thereof.

Recently, the era of next-generation mixed reality (MR) has been approaching. MR technology can provide next-generation media services that maximize human imagination. According to online market research firms, the mixed reality market is expected to grow rapidly. MR technology can strengthen interaction with the real world by further extending augmented reality (AR) technology and overcoming the limitations of virtual reality (VR) technology. In addition, MR technology can be used in many fields, such as education, entertainment, business consulting, architecture, civil engineering, logistics, energy and environmental management, medicine, and the military.

Mixed reality can be defined as an approach that combines the advantages of augmented reality and virtual reality and further strengthens interaction with the user. To this end, the most essential element is a technology for producing dynamic 3D models of people that have a photorealistic appearance and can be observed from all directions over 360 degrees.

That is, existing live-action-based 3D content services for AR, VR, MR, and holograms are limited in that they can be provided only from a given viewpoint. Therefore, in an MR environment that requires both interaction and a 360-degree multi-view experience, a system and production technology that can serve real-world live-action data as 3D data from all directions are fundamentally required.

In particular, when processing a dynamic 3D model so that the photorealistic form of a person can be observed from all directions, it is important to extract and process accurate skeleton information.

The technical problem to be solved by the present invention is to provide a method for generating a 3D volumetric video, and a recording medium thereof, capable of generating a high-precision 3D video model and 3D volumetric video.

The 3D volumetric video may include a plurality of motion templates.

The 3D pose estimation method includes: generating four virtual projection images of the 3D point cloud of the object; extracting 2D skeleton information from the four virtual projection images using a pose extraction program; forming a 3D intersection space by connecting the 2D joint coordinates of the four pieces of 2D skeleton information with virtual projection lines; and extracting the 3D skeleton information by integrating the 2D joint coordinates of the virtual projection lines passing through the 3D intersection space to generate corrected 3D joint coordinates. In generating the corrected 3D joint coordinates, the 2D joint coordinates of an error virtual projection line that deviates from the 3D intersection space may be excluded.

Generating the virtual projection images may include: determining the front of the 3D point cloud; setting an axis-aligned bounding box (AABB) surrounding the 3D point cloud with reference to the front of the 3D point cloud; and projecting the 3D point cloud onto four virtual projection planes of the axis-aligned bounding box.

The method may further include correcting the extracted 2D skeleton information. Correcting the 2D skeleton information may include: identifying an error joint coordinate located outside the 3D point cloud; clustering a point cloud set adjacent to the error joint coordinate; and correcting the error joint coordinate with reference to the center of each cluster of the clustered point cloud set so that the joint coordinate is positioned inside the 3D point cloud.

In addition, the present invention relates to a computer-readable recording medium on which a program for performing the above method for generating a 3D volumetric video is recorded.

A method for generating a 3D volumetric video and a recording medium thereof according to one embodiment of the present invention can generate a high-precision 3D video model by extracting accurate 3D skeleton information for an object using a 3D pose estimation method and estimating temporal motion between the extracted 3D skeleton information using a rigging method.

FIG. 1 is a flowchart of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 2 is a flowchart of a 3D pose estimation method of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 3 is a drawing illustrating a step of generating a virtual projection image of a 3D pose estimation method of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 4 is a drawing explaining a step of forming a 3D intersection space with 2D skeleton information extracted from a 3D pose estimation method of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 5 is a diagram illustrating error joint coordinates located outside a 3D point cloud for an object and correction joint coordinates located inside a 3D point cloud for an object, in a method for generating a 3D volumetric video.
FIG. 6 is a drawing explaining a method for setting an error plane in a 3D pose estimation method of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 7 is a diagram illustrating a point cloud set before clustering and a point cloud set after clustering in a 3D pose estimation method of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 8 is a drawing illustrating a step of extracting 3D skeleton information by generating corrected 3D joint coordinates in a 3D pose estimation method of a method for generating a 3D volumetric video according to one embodiment of the present invention.
FIG. 9 is a drawing illustrating a state in which a 3D video model is created by a rigging method of a method for creating a 3D volumetric video according to one embodiment of the present invention.
FIG. 10 is a drawing showing a 3D volumetric video generated using a method for generating a 3D volumetric video according to one embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the attached drawings so that those skilled in the art can easily implement the present invention. The present invention may be implemented in various different forms and is not limited to the embodiments described herein.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a flowchart of a method for generating a 3D volumetric video according to one embodiment of the present invention, FIG. 2 is a flowchart of a 3D pose estimation method of the method for generating a 3D volumetric video, FIG. 3 is a drawing explaining a step of generating virtual projection images in the 3D pose estimation method, and FIG. 4 is a drawing explaining a step of forming 2D skeleton information and a 3D intersection space in the 3D pose estimation method.

As illustrated in FIGS. 1 to 4, the method for generating a 3D volumetric video according to one embodiment of the present invention first extracts 3D skeleton information for an object using a 3D pose estimation method (S10).

The 3D skeleton information can be extracted using a 3D point cloud of the object captured by a camera system.

This is described in detail below with reference to the drawings.

As illustrated in FIGS. 2 and 3, the 3D pose estimation method generates four virtual projection images (100) of a 3D point cloud (PC) of the object (S100).

The 3D point cloud (PC) of the object is captured using a multi-view RGB-D camera system. A multi-view RGB-D camera system can capture color images and depth images simultaneously from multiple viewpoints.

To generate the virtual projection images (100), the front of the 3D point cloud (PC) is determined first. The front is determined because 2D skeleton information extracted from a virtual projection image (100) projected onto the front of the 3D point cloud (PC) has the highest accuracy. To this end, the spatial distribution of the 3D coordinates of the 3D point cloud (PC) is analyzed to find its front, and the 3D point cloud (PC) is rotated so that its front direction becomes parallel to the Z-axis direction. The front direction of the 3D point cloud (PC) can be found using principal component analysis, which finds the principal components of distributed data.
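The front-alignment step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the person stands upright along the Y axis, that the front-back direction is the horizontal direction of least variance, and that rotation about the centroid is acceptable. The function name `align_front_to_z` is hypothetical. Principal component analysis alone cannot distinguish front from back, so a real system would need an additional cue to resolve the sign.

```python
import math

def align_front_to_z(points):
    """Rotate points about the vertical (Y) axis so that the horizontal
    direction of least variance (assumed front-back) lies along Z.

    points: list of (x, y, z) tuples. Returns (rotated_points, theta),
    where theta is the angle of the major horizontal axis from +X.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    mz = sum(p[2] for p in points) / n
    # 2x2 covariance of the horizontal (x, z) coordinates
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    czz = sum((p[2] - mz) ** 2 for p in points) / n
    cxz = sum((p[0] - mx) * (p[2] - mz) for p in points) / n
    # Closed-form angle of the major principal axis of a 2x2 covariance
    theta = 0.5 * math.atan2(2.0 * cxz, cxx - czz)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z in points:
        dx, dz = x - mx, z - mz
        # Major axis maps onto X, so the minor axis maps onto Z
        rotated.append((dx * c + dz * s + mx, y, -dx * s + dz * c + mz))
    return rotated, theta
```

After this rotation, an axis-aligned bounding box around the cloud has faces that are parallel or perpendicular to the assumed front direction.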

Then, an axis-aligned bounding box (AABB) surrounding the 3D point cloud (PC) is set with reference to the front of the 3D point cloud (PC). The axis-aligned bounding box (AABB) determines four virtual projection planes (IP1, IP2, IP3, IP4).

Then, the 3D point cloud (PC) is projected onto the four virtual projection planes (IP1, IP2, IP3, IP4) of the axis-aligned bounding box (AABB) to generate four virtual projection images (110, 120, 130, 140). That is, the virtual projection images (100) may include a first virtual projection image (110), a second virtual projection image (120), a third virtual projection image (130), and a fourth virtual projection image (140). Here, the 3D point cloud (PC) can be projected by converting coordinates in the world coordinate system into coordinates on the virtual projection planes (IP1, IP2, IP3, IP4) using a model view projection matrix.
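A simplified sketch of the bounding-box and projection steps is given below. It assumes the four virtual projection planes are the front, back, left, and right faces of the AABB and uses a plain orthographic projection instead of a full model view projection matrix; the names `aabb` and `project_to_four_planes` are hypothetical, and rasterizing the projected points into an image is omitted.

```python
def aabb(points):
    """Return (min_corner, max_corner) of the axis-aligned bounding box."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def project_to_four_planes(points):
    """Orthographically project 3D points onto four side faces of the AABB.

    Returns a dict mapping a view name to a list of 2D (u, v) coordinates
    measured from a corner of the corresponding face. The horizontal axis
    of the back and right views is mirrored so that each view is seen
    from outside the box.
    """
    (x0, y0, z0), (x1, y1, z1) = aabb(points)
    views = {"front": [], "back": [], "left": [], "right": []}
    for x, y, z in points:
        views["front"].append((x - x0, y - y0))
        views["back"].append((x1 - x, y - y0))
        views["left"].append((z - z0, y - y0))
        views["right"].append((z1 - z, y - y0))
    return views
```

Each list can then be rendered into an image and passed to a 2D pose extractor.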

Next, as illustrated in FIG. 4, 2D skeleton information (200) is extracted from the four virtual projection images (110, 120, 130, 140) using a pose extraction program (S200). The pose extraction program includes, but is not necessarily limited to, the OpenPose library. The OpenPose library is based on a deep-learning convolutional neural network (CNN) and can extract feature points (keypoints) of the bodies, hands, and faces of multiple people from an image in real time. The 2D skeleton information (200) may include first 2D skeleton information (210) extracted from the first virtual projection image (110), second 2D skeleton information (220) extracted from the second virtual projection image (120), third 2D skeleton information (230) extracted from the third virtual projection image (130), and fourth 2D skeleton information (240) extracted from the fourth virtual projection image (140).

Next, the extracted 2D skeleton information (200) is corrected (S300).

FIG. 5 is a diagram illustrating an error joint coordinate (OJ) located outside the 3D point cloud (PC) of the object and a corrected joint coordinate (CJ) located inside the 3D point cloud (PC) of the object.

When the 2D skeleton information (200) is extracted using a pose extraction program, the program, which includes a predetermined algorithm and a deep learning model, may produce 2D skeleton information (200) that contains errors. Errors in the 2D skeleton information can in turn cause errors in the 3D skeleton information. In this case, as illustrated in FIG. 5, an error joint coordinate (OJ) may be located outside the 3D point cloud (PC) of the object. Therefore, a correction is performed to move the error joint coordinate (OJ) inside the 3D point cloud (PC).

To this end, the error joint coordinate (OJ) located outside the 3D point cloud (PC) of the object is identified first.

Then, the point cloud set (S1) adjacent to the error joint coordinate (OJ) is clustered.

FIG. 6 is a drawing explaining a method for setting an error plane in a 3D pose estimation method according to one embodiment of the present invention.

As illustrated in FIG. 6, an error plane (OP) is first set using the directions of the bones (B1, B2) connected to the error joint coordinate (OJ). That is, the error plane (OP) is a plane passing through the bisector of the bones (B1, B2) connected to the error joint coordinate (OJ). Then, a point cloud set (S1) lying within a predetermined distance of the error plane (OP) is obtained, and clustering is performed to find, within the obtained point cloud set (S1), the point cloud set (S2) that is valid for the corresponding joint coordinate.

FIG. 7 is a diagram illustrating the point cloud set (S1) before clustering and the point cloud set (S2) after clustering in a 3D pose estimation method according to one embodiment of the present invention.

As illustrated in FIG. 7, after clustering, the point cloud set (S2) can be more densely concentrated around the error plane (OP). The point cloud set (S1) is clustered using the DBSCAN algorithm. DBSCAN is a density-based algorithm that is more accurate for data sets that follow no particular distribution. In addition, because DBSCAN can identify noise, it can remove noise from directly captured 3D data, which tends to be noisy.
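For illustration, a compact pure-Python version of DBSCAN is sketched below. It uses a brute-force neighbor search, which is adequate only for small point sets; the parameter names `eps` and `min_pts` follow common usage and are not taken from the patent, and a production system would more likely use an optimized library implementation.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or NOISE (-1).

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point of this cluster
                continue
            if labels[j] is not None:
                continue               # already assigned
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)   # expand only from core points
    return labels
```

Points labeled `NOISE` can be discarded before the cluster centers are computed, which matches the noise-removal role described above.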

Then, by correcting the error joint coordinate (OJ) with reference to the center of each cluster of the clustered point cloud set (S2), the corrected joint coordinate (CJ) is positioned inside the 3D point cloud (PC) of the object, as illustrated in FIG. 4.

Here, the center of each cluster of the clustered point cloud set (S2) can be calculated by circle fitting.

Circle fitting can be performed as follows.

First, consider the center coordinates of the circle to be found for one cluster. For the n points in the cluster, the circle with the smallest error can be expressed by Mathematical Formula 1 below.

[Mathematical Formula 1]

Mathematical Formula 1 can be rewritten as Mathematical Formula 2 below.

[Mathematical Formula 2]

Then, among the possible values of W, the one that minimizes the error can be obtained through Mathematical Formula 3 below.

[Mathematical Formula 3]

By calculating the center of each cluster of the clustered point cloud set (S2) with this circle fitting method and correcting the error joint coordinate (OJ) accordingly, the corrected joint coordinate (CJ) can be positioned inside the 3D point cloud (PC) of the object, as illustrated in FIG. 4.
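As a hedged illustration of circle fitting, the sketch below uses the common algebraic least-squares formulation, which may or may not match Mathematical Formulas 1 to 3. It assumes the cluster points have already been expressed as 2D coordinates in the error plane. The circle equation is rewritten as x² + y² = 2·xc·x + 2·yc·y + (r² − xc² − yc²), which is linear in the unknown vector W = (2·xc, 2·yc, r² − xc² − yc²) and is solved through the normal equations. The names `solve3` and `fit_circle` are hypothetical.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (collinear or too few points)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_circle(points):
    """Least-squares circle through 2D points; returns (xc, yc, r)."""
    ata = [[0.0] * 3 for _ in range(3)]   # A^T A, rows of A are (x, y, 1)
    atb = [0.0] * 3                       # A^T b, with b_i = x_i^2 + y_i^2
    for x, y in points:
        row = (x, y, 1.0)
        b = x * x + y * y
        for i in range(3):
            atb[i] += row[i] * b
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    w = solve3(ata, atb)                  # W = (A^T A)^-1 A^T b
    xc, yc = w[0] / 2.0, w[1] / 2.0
    r = math.sqrt(w[2] + xc * xc + yc * yc)
    return xc, yc, r
```

The fitted center (xc, yc), mapped back into 3D, would serve as the cluster center toward which the error joint coordinate is moved.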

In this way, by correcting the extracted 2D skeleton information (200) through clustering and correction of the error joint coordinate (OJ), more accurate 3D skeleton information can be extracted and an accurate 3D pose can be estimated.

Next, as illustrated in FIGS. 2 and 4, the respective 2D joint coordinates (11, 12, 13, 14) of the four pieces of 2D skeleton information (210, 220, 230, 240) are connected to each other by four virtual projection lines (21, 22, 23, 24) to form a 3D intersection space (IS) (S400). That is, the four virtual projection images (110, 120, 130, 140) generated on the four virtual projection planes (IP1, IP2, IP3, IP4) have four pieces of 2D skeleton information (210, 220, 230, 240), which in turn have four mutually matching 2D joint coordinates (11, 12, 13, 14). For convenience of explanation, FIG. 3 shows the 2D joint coordinates of the left knee, but the method is not limited thereto. When the four matching 2D joint coordinates (11, 12, 13, 14) are connected by the four virtual projection lines (21, 22, 23, 24), a 3D intersection space (IS) in which the four virtual projection lines (21, 22, 23, 24) intersect one another can be formed.

FIG. 8 is a drawing illustrating a step of extracting 3D skeleton information by generating corrected 3D joint coordinates in a 3D pose estimation method according to one embodiment of the present invention.

As illustrated in FIG. 8, the 2D joint coordinates (11, 12, 13, 14) connected to the virtual projection lines (21, 22, 23, 24) passing through the 3D intersection space (IS) are integrated to generate corrected 3D joint coordinates (300), from which the 3D skeleton information is extracted (S500).

When the corrected 3D joint coordinates (300) are generated, the 2D joint coordinates of an error virtual projection line (EL) that deviates from the 3D intersection space (IS) may be excluded.

When the 2D skeleton information (200) is extracted using a pose extraction program, errors may occur, and these errors may produce an error virtual projection line (EL) that deviates from the 3D intersection space (IS). The error virtual projection line (EL) deviating from the 3D intersection space (IS) can be seen in the front view (FV) and the side view (SV) illustrated in FIG. 8.

Then, the corrected 3D joint coordinates (300) can be generated by averaging the four 2D joint coordinates (11, 12, 13, 14) located in the 3D intersection space (IS). That is, the (x, z) coordinates are determined in the top view (TV), and the y coordinate is determined in the side view (SV). The calculated (x, y, z) coordinates must agree with the (x, y) coordinates in the front view (FV).
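The fusion of per-view joints into one 3D joint can be sketched as below. This is a simplification under several assumptions: each view reports the joint in a shared, axis-aligned world frame (front and back views supply x and y, side views supply z and y), and a median-based tolerance test stands in for the check of whether a virtual projection line passes through the 3D intersection space. The function name `fuse_joint` and the parameter `tol` are hypothetical.

```python
from statistics import median

def fuse_joint(observations, tol):
    """Fuse per-view joint observations into one (x, y, z) coordinate.

    observations: list of dicts holding the world-frame coordinates each
    view can observe, e.g. {"x": 1.0, "y": 2.0} for a front view or
    {"z": 3.0, "y": 2.0} for a side view.
    Values farther than tol from the per-axis median are treated as
    error projections and excluded before averaging.
    """
    fused = []
    for axis in ("x", "y", "z"):
        values = [o[axis] for o in observations if axis in o]
        if not values:
            raise ValueError("no view observes axis " + axis)
        m = median(values)
        kept = [v for v in values if abs(v - m) <= tol]
        if not kept:          # views disagree too much; fall back to all
            kept = values
        fused.append(sum(kept) / len(kept))
    return tuple(fused)
```

For example, a side view whose y estimate lies far from the others is dropped for the y axis while its z estimate can still contribute.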

In this way, the 3D pose estimation method of the method for generating a 3D volumetric video according to one embodiment of the present invention extracts 2D skeleton information from the 3D point cloud (PC) of the object, connects and integrates the 2D joint coordinates of the 2D skeleton information to generate corrected 3D joint coordinates, and extracts corrected 3D skeleton information, thereby enabling a more accurate 3D pose estimate.

Next, a 3D video model (VM) is generated by applying the 3D skeleton information (200) to the object (P) using a rigging method (S20). Rigging is a method of making the object (P) animatable.

The rigging method first estimates the temporal motion between the extracted 3D skeleton information to generate a motion vector (S21).

Next, the motion vector is analyzed to generate the 3D video model (S22).
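The motion-vector step (S21) can be illustrated with a minimal sketch that takes the per-joint displacement between two consecutive skeleton frames. The patent does not specify the estimator, so frame differencing is an assumption, as are the dictionary-based skeleton representation and the function name `motion_vectors`.

```python
def motion_vectors(prev_skeleton, curr_skeleton):
    """Per-joint displacement between two consecutive 3D skeletons.

    Each skeleton is a dict mapping a joint name to an (x, y, z) tuple.
    Joints missing from either frame are skipped.
    """
    vectors = {}
    for name, prev in prev_skeleton.items():
        if name in curr_skeleton:
            curr = curr_skeleton[name]
            vectors[name] = tuple(c - p for p, c in zip(prev, curr))
    return vectors
```

A sequence of such displacement sets, one per frame pair, is the kind of input a later analysis stage could use when building the animatable model.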

도 9는 본 발명의 일 실시예에 따른 3D 볼류메트릭 비디오의 생성 방법의 리깅 방법에 의해 3D 동영상 모델을 생성한 상태를 도시한 도면이다. FIG. 9 is a drawing illustrating a state in which a 3D video model is created by a rigging method of a method for creating a 3D volumetric video according to one embodiment of the present invention.

도 9에 도시된 바와 같이, 리깅 방법으로 객체(P)에 3D 뼈대 정보들(200)을 적용하여 객체가 움직일 수 있는 상태의 3D 동영상 모델(VM)을 만들 수 있다. 따라서, 이후 3D 동영상 모델(VM)을 이용하여 애니메이팅할 수 있다.As illustrated in Fig. 9, by applying 3D skeleton information (200) to an object (P) using a rigging method, a 3D video model (VM) in which the object can move can be created. Accordingly, animation can be performed thereafter using the 3D video model (VM).

다음으로, 3D 동영상 모델을 이용하여 애니메이팅된 3D 볼류메트릭 비디오(VV)를 생성한다(S30). 3D 볼류메트릭 비디오(3D Volumetric Video)란 고해상도 화질을 구현하는 다시점 RGB-D 카메라 시스템을 이용하여 객체의 움직임을 캡처하고, 객체의 움직임에 리깅 방법을 적용하여 생성된 360도 3D 입체 영상을 의미한다.Next, an animated 3D volumetric video (VV) is generated using a 3D video model (S30). A 3D volumetric video refers to a 360-degree 3D stereoscopic image generated by capturing the movement of an object using a multi-view RGB-D camera system that implements high-resolution image quality and applying a rigging method to the movement of the object.

도 10은 본 발명의 일 실시예에 따른 3D 볼류메트릭 비디오의 생성 방법을 이용하여 생성된 3D 볼류메트릭 비디오를 나타낸 도면이다. FIG. 10 is a drawing showing a 3D volumetric video generated using a method for generating a 3D volumetric video according to one embodiment of the present invention.

도 10에 도시된 바와 같이, 3D 동영상 모델(VM)을 이어 붙이면서 애니메이팅하여 3D 볼류메트릭 비디오(VV)를 생성할 수 있다. 도 10에는 3D 동영상 모델을 이용하여 걷는 동작을 애니메이팅하였으나, 반드시 이에 한정되는 것은 아니며, 춤추는 동작, 박수치는 동작, 부채질 동작, 긁는 동작, 경례 동작 등의 동작도 가능하다. As illustrated in Fig. 10, a 3D volumetric video (VV) can be created by animating 3D video models (VMs) by connecting them. In Fig. 10, a walking motion is animated using a 3D video model, but it is not limited thereto, and other motions such as dancing, clapping, fanning, scratching, and saluting are also possible.

이 때, 3D 볼류메트릭 비디오(VV)는 춤추는 동작, 박수치는 동작, 부채질 동작, 긁는 동작, 경례 동작 등의 복수의 동작 템플릿을 미리 생성함으로써, 사용자가 3D 볼류메트릭 비디오를 용이하게 생성할 수 있게 한다.At this time, 3D volumetric video (VV) enables users to easily create 3D volumetric videos by pre-generating multiple motion templates, such as dancing motions, clapping motions, fanning motions, scratching motions, and saluting motions.

이와 같이, 본 발명의 일 실시예에 따른 3D 볼류메트릭 비디오의 생성 방법 및 이의 기록 매체는 3D 자세 추정 방법을 이용하여 객체에 대한 정확한 3D 뼈대 정보를 추출하고, 리깅 방법을 이용하여 추출된 3D 뼈대 정보들 사이의 시간적인 움직임을 추정함으로써, 고정밀의 3D 동영상 모델을 생성할 수 있다. In this way, the method for generating a 3D volumetric video and the recording medium thereof according to one embodiment of the present invention can generate a high-precision 3D moving image model by extracting accurate 3D skeletal information for an object using a 3D pose estimation method and estimating temporal movement between the extracted 3D skeletal information using a rigging method.

본 발명의 일 실시예는 컴퓨터에 의해 실행되는 프로그램 모듈과 같은 컴퓨터에 의해 실행 가능한 명령어를 포함하는 기록 매체의 형태로도 구현될 수 있다. 컴퓨터 판독 가능 매체는 컴퓨터에 의해 액세스될 수 있는 임의의 가용 매체일 수 있고, 휘발성 및 비휘발성 매체, 분리형 및 비분리형 매체를 모두 포함한다. 또한, 컴퓨터 판독가능 매체는 컴퓨터 저장 매체를 모두 포함할 수 있다. 컴퓨터 저장 매체는 컴퓨터 판독가능 명령어, 데이터 구조, 프로그램 모듈 또는 기타 데이터와 같은 정보의 저장을 위한 임의의 방법 또는 기술로 구현된 휘발성 및 비휘발성, 분리형 및 비분리형 매체를 모두 포함할 수 있다.An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules, that are executed by a computer. The computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media, removable and non-removable media. In addition, the computer-readable medium may include all computer storage media. The computer storage media may include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.

Although preferred embodiments of the present invention have been described above, the present invention is not limited thereto. Various modifications may be made within the scope of the claims, the detailed description of the invention, and the attached drawings, and such modifications naturally also fall within the scope of the present invention.

VM: 3D video model
VV: 3D volumetric video
100: Virtual projection image
200: 2D skeleton information
11, 12, 13, 14: 2D joint coordinates
21, 22, 23, 24: Virtual projection lines
PC: 3D point cloud
AABB: Axis-aligned bounding box
IPI: Virtual projection image
IS: 3D intersection space
EL: Error virtual projection line
300: Corrected 3D joint coordinates

Claims

A method for generating a 3D volumetric video, comprising:
extracting 3D skeleton information for an object using a 3D pose estimation method;
generating a 3D video model by applying the 3D skeleton information to the object using a rigging method; and
generating an animated 3D volumetric video using the 3D video model,
wherein the 3D skeleton information is extracted using a 3D point cloud for the object captured using a camera system,
wherein the rigging method comprises:
generating a motion vector by estimating the temporal movement between the extracted 3D skeleton information; and
generating the 3D video model by analyzing the motion vector,
wherein the 3D pose estimation method comprises:
generating four virtual projection images of the 3D point cloud for the object;
extracting 2D skeleton information from the four virtual projection images using a pose extraction program;
forming a 3D intersection space by connecting the 2D joint coordinates of the four pieces of 2D skeleton information with virtual projection lines; and
extracting the 3D skeleton information by integrating the 2D joint coordinates of the virtual projection lines passing through the 3D intersection space to generate corrected 3D joint coordinates,
wherein, in generating the corrected 3D joint coordinates, the 2D joint coordinates of error virtual projection lines that leave the 3D intersection space are excluded,
wherein the method further comprises correcting the extracted 2D skeleton information,
wherein correcting the 2D skeleton information comprises:
identifying error joint coordinates located outside the 3D point cloud;
clustering a point cloud set (S1) adjacent to the error joint coordinates; and
positioning the joint coordinates inside the 3D point cloud by correcting the error joint coordinates based on the center of each cluster of the clustered point cloud set (S2), and
wherein positioning the joint coordinates inside the 3D point cloud comprises:
setting an error plane using the direction of the skeleton connected to the error joint coordinates;
obtaining the point cloud set (S1) adjacent to the error plane within a predetermined distance, and then performing clustering to find, among the obtained point cloud set, a valid point cloud set for the corresponding joint coordinates; and
positioning the corrected joint coordinates inside the 3D point cloud for the object by correcting the error joint coordinates based on the center of each cluster of the clustered point cloud set (S2).
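
For illustration only, the integration of 2D joint coordinates into corrected 3D joint coordinates might be sketched as follows. This is not the claimed procedure: it assumes orthographic views in one shared world frame (front and back images observe x and y, left and right images observe z and y), and it swaps in a simple height-consistency check for the test of whether a virtual projection line passes through the 3D intersection space. The names `integrate_joint`, `HORIZONTAL_AXIS`, and `tol` are hypothetical.

```python
from statistics import fmean, median
from typing import Dict, Tuple

# Hypothetical convention: "front"/"back" images observe (x, y) and
# "left"/"right" images observe (z, y), all in one shared world frame.
HORIZONTAL_AXIS = {"front": "x", "back": "x", "left": "z", "right": "z"}


def integrate_joint(
    observations: Dict[str, Tuple[float, float]], tol: float
) -> Tuple[float, float, float]:
    """Fuse one joint's 2D coordinates from four views into a 3D coordinate."""
    # Every view observes the height y, so lines that truly intersect agree on it.
    y_ref = median(v for _, v in observations.values())
    # Views whose height strays from the consensus are treated as error lines.
    kept = {
        view: (h, v)
        for view, (h, v) in observations.items()
        if abs(v - y_ref) <= tol
    }
    xs = [h for view, (h, _) in kept.items() if HORIZONTAL_AXIS[view] == "x"]
    zs = [h for view, (h, _) in kept.items() if HORIZONTAL_AXIS[view] == "z"]
    if not xs or not zs:
        raise ValueError("not enough consistent views to recover the joint")
    ys = [v for _, v in kept.values()]
    return fmean(xs), fmean(ys), fmean(zs)


observations = {
    "front": (0.2, 1.0),
    "back": (0.2, 1.0),
    "left": (0.5, 1.0),
    "right": (0.9, 1.6),  # inconsistent height: treated as an error line
}
joint = integrate_joint(observations, tol=0.1)
print(joint)
```

In this example the right view is dropped, so x comes from the front and back views, z from the left view alone, and y from the three remaining views.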

The method for generating a 3D volumetric video according to claim 1,
wherein the 3D volumetric video includes a plurality of motion templates.

(Deleted)

The method for generating a 3D volumetric video according to claim 1,
wherein generating the virtual projection images comprises:
determining the front side of the 3D point cloud;
setting an axis-aligned bounding box (AABB) surrounding the 3D point cloud based on the front side of the 3D point cloud; and
projecting the 3D point cloud onto four virtual projection planes of the axis-aligned bounding box.
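
As a rough sketch of the bounding box and projection steps, the following assumes the point cloud has already been rotated so that its front side faces +z with y pointing up, uses orthographic projection, and ignores occlusion. The horizontal axis is mirrored on the back and right faces so that each image is seen from outside the box; this convention and the function names are assumptions, not taken from the source.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def aabb(points: List[Point3]) -> Tuple[Point3, Point3]:
    """Return the (min corner, max corner) of the axis-aligned bounding box."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def project_four_views(points: List[Point3]) -> Dict[str, List[Point2]]:
    """Orthographically project the cloud onto the four side faces of its AABB."""
    (x0, y0, z0), (x1, _, z1) = aabb(points)
    return {
        # Front and back faces are perpendicular to z, so they keep (x, y).
        "front": [(x - x0, y - y0) for x, y, _ in points],
        "back": [(x1 - x, y - y0) for x, y, _ in points],
        # Left and right faces are perpendicular to x, so they keep (z, y).
        "left": [(z - z0, y - y0) for _, y, z in points],
        "right": [(z1 - z, y - y0) for _, y, z in points],
    }


points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (0.5, 1.0, 1.5)]
views = project_four_views(points)
print(views["front"])
print(views["back"])
```

Each projected coordinate is measured from a corner of the box, so all four images share the box height as their vertical extent.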

(Deleted)

A computer-readable recording medium having recorded thereon a program for performing the method for generating a 3D volumetric video according to claim 1, claim 2, or claim 4.