KR102827978B1

KR102827978B1 - 2D image and 3D point cloud registration method and system

Info

Publication number: KR102827978B1
Application number: KR1020220110454A
Authority: KR
Inventors: 전유림; 서승우
Original assignee: 서울대학교산학협력단
Priority date: 2022-09-01
Filing date: 2022-09-01
Publication date: 2025-06-30
Anticipated expiration: 2042-09-01
Also published as: KR20240031591A

Abstract

The present invention relates to a method and a system for registering a two-dimensional image and a three-dimensional point cloud. Specifically, the present invention relates to a method and a device for deriving a registration matrix by aligning the two-dimensional image with the horizontal direction of a reference coordinate system, aligning the three-dimensional point cloud with the vertical direction of the reference coordinate system, and then comparing the relative poses of the two-dimensional image and the three-dimensional point cloud.
According to the extrinsic calibration device and method of the present invention, a two-dimensional image and a three-dimensional point cloud can be registered from an arbitrary initial state, without special prior preparation, even while driving.

Description

2D image and 3D point cloud registration method and system

The present invention relates to a method and a system for registering a two-dimensional image and a three-dimensional point cloud. Specifically, the present invention relates to a method and a device for deriving a registration matrix by aligning the two-dimensional image with the horizontal direction of a reference coordinate system, aligning the three-dimensional point cloud with the vertical direction of the reference coordinate system, and then comparing the relative poses of the two-dimensional image and the three-dimensional point cloud.

Machines such as robots and autonomous vehicles perceive their surroundings through sensors, and several types of sensors are often used at the same time. For example, lidar sensors and camera sensors are the most widely used sensors and, because they are complementary, they are often used together.

A lidar sensor is a three-dimensional sensor that emits light (a laser) in a specific pattern, senses three-dimensional information about the surrounding environment from the reflected light, and outputs it as a point cloud, that is, measurement data expressed in a three-dimensional coordinate system. Other examples of three-dimensional sensors include radar sensors and ultrasonic sensors. A camera sensor, in contrast, is a two-dimensional sensor that receives external light and stores it as a two-dimensional image. Camera sensors can be classified into visible-light cameras, infrared cameras, and so on, depending on the type of light used to form the two-dimensional image.

To use image data obtained from a two-dimensional sensor and point cloud data obtained from a three-dimensional sensor at the same time, the two-dimensional image and the three-dimensional point cloud must be aligned to the same position and orientation so that they are registered with each other. This registration can be performed by a transformation between the two-dimensional image and the three-dimensional point cloud.

Registration of two-dimensional images and three-dimensional point clouds can be applied in various fields that use both kinds of data together, such as computer vision, robotics, and autonomous driving. For example, the position and orientation of a two-dimensional image sensor can be measured with respect to the reference coordinate system of a three-dimensional point cloud; this is called image-based localization. Another example is extrinsic calibration, which measures the relative position and orientation between a two-dimensional sensor and a three-dimensional sensor to establish the correspondence between their sensor data.

A common way to register a two-dimensional image and a three-dimensional point cloud is to use an algorithm that matches features of the two-dimensional image data with features of the three-dimensional point cloud data. However, registration by a matching algorithm is difficult when the initial states (that is, the relative positions and orientations) of the two-dimensional image and the three-dimensional point cloud differ greatly. In other words, registration by a matching algorithm has been applicable only when the difference between the initial states of the two-dimensional image and the three-dimensional point cloud is limited to a certain level or less. In particular, in rough environments such as off-road terrain, the relative position and orientation of the two-dimensional sensor and the three-dimensional sensor may differ greatly, and in such situations it is not easy to register the two-dimensional image and the three-dimensional point cloud with a common matching algorithm.

Moreover, in rough environments such as off-road terrain, the relative position and orientation of the two-dimensional sensor and the three-dimensional sensor may change significantly while driving, and common matching algorithms have had difficulty registering, while driving, a two-dimensional image and a three-dimensional point cloud whose initial states differ greatly.

That is, there has been a demand for a method and a system capable of registering a two-dimensional image and a three-dimensional point cloud even when their initial states (that is, their relative positions and orientations) differ greatly, but conventional techniques could not provide this. The present invention is intended to solve this problem.

T. Sattler, B. Leibe, and L. Kobbelt, "Efficient & effective prioritized matching for large-scale image-based localization," IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 9, pp. 1744-1756, 2016.

The present invention is intended to provide a method and a system capable of registering a two-dimensional image and a three-dimensional point cloud even when their initial states differ greatly.

In addition, the present invention is intended to provide a method and a system capable of registering a two-dimensional image and a three-dimensional point cloud online, while driving, without prior preparation.

According to one embodiment of the present invention for achieving the above technical objects, a device for registering a two-dimensional image and a three-dimensional point cloud comprises: a vertical coordinate alignment unit that, based on a neural network, estimates a ground normal vector from the three-dimensional point cloud and calculates a transformation matrix T_E that aligns the ground normal vector with e3 = [0,0,1] of a virtual reference coordinate system; a horizontal coordinate alignment unit that, based on a neural network, estimates a horizon vector from the two-dimensional image and calculates a transformation matrix T_H that aligns the horizon vector with e2 = [0,1,0] of the virtual reference coordinate system; a forward axis alignment unit that, based on a neural network, derives a range image from the point cloud transformed by the matrix T_E, estimates a forward axis vector of the point cloud transformed by the matrix T_E based on the offset (w) at which the correlation between the range image and the two-dimensional image transformed by the matrix T_H is maximized, and calculates a transformation matrix T_F that aligns the forward axis vector with e1 = [1,0,0] of the reference coordinate system; and an origin alignment unit that, based on a neural network, converts the point cloud transformed by the matrix T_F·T_E into a depth image, converts the two-dimensional image transformed by the matrix T_H into a pseudo-depth image, and then calculates, by comparing the features of the two, a matrix T_G that transforms the point cloud so that the origins of the point cloud and the two-dimensional image coincide.

According to one embodiment of the present invention, T_G·T_F·T_E can be calculated as the transformation matrix that registers the two-dimensional image and the three-dimensional point cloud.
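The composition T_G·T_F·T_E can be illustrated with a minimal NumPy sketch. It assumes that each matrix is a 4x4 homogeneous transform acting on column vectors, so that T_E is applied to the points first; the helper names `rot_z`, `translation`, `compose_registration`, and `apply_transform` are hypothetical and do not come from the patent.

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous rotation by theta (radians) about the e3 axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def translation(t):
    """4x4 homogeneous translation by the 3-vector t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

def compose_registration(T_G, T_F, T_E):
    """Return T_G @ T_F @ T_E (assumed column-vector convention: T_E acts first)."""
    return T_G @ T_F @ T_E

def apply_transform(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ T.T)[:, :3]

# Illustrative values only: identity T_E, a 90-degree yaw for T_F, a shift for T_G.
T = compose_registration(translation([1.0, 0.0, 0.0]), rot_z(np.pi / 2), np.eye(4))
registered = apply_transform(T, np.array([[1.0, 0.0, 0.0]]))
```

Under these assumptions the order of the factors matters: swapping T_G and T_F would translate before rotating and generally gives a different result.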

According to one embodiment of the present invention, the two-dimensional image may be the output of a camera sensor, and the three-dimensional point cloud may be the output of a lidar sensor.

According to one embodiment of the present invention, the vertical coordinate alignment unit can be implemented by a neural network including a DownBCL block.

According to one embodiment of the present invention, the vertical coordinate alignment unit can predict the absolute value and the sign of the ground normal vector separately.

According to one embodiment of the present invention, the forward axis alignment unit converts the point cloud transformed by the matrix T_F·T_E into a range image according to the following formula,

In the above formula, (x, y, z) are the coordinates of a point in the point cloud, the two angle parameters are the upper and lower limits of the vertical field of view of the point cloud, H and W are the height and width of the range map, respectively, and λ is the ratio of the horizontal fields of view of the image and the point cloud.
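The formula itself is not reproduced in this text. As an illustration only, the sketch below uses a spherical projection that is common for lidar range images; it is an assumption, not the patent's formula, and it does not use λ. The names `point_cloud_to_range_image`, `fov_up`, and `fov_down` are hypothetical; the two angles are signed and given in radians.

```python
import numpy as np

def point_cloud_to_range_image(points, fov_up, fov_down, H, W):
    """Project an (N, 3) point cloud to an (H, W) range image.

    Assumed convention: azimuth 0 (the +x direction) maps to the centre column,
    elevation fov_up maps to the top row, and fov_down to the bottom row.
    """
    points = np.asarray(points, dtype=float)
    r = np.linalg.norm(points, axis=1)
    keep = r > 0                      # drop points at the origin
    points, r = points[keep], r[keep]
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / r)

    col = np.floor(0.5 * (1.0 - yaw / np.pi) * W).astype(int)
    col = np.clip(col, 0, W - 1)
    row = np.floor((fov_up - pitch) / (fov_up - fov_down) * H).astype(int)

    inside = (row >= 0) & (row < H)   # discard points outside the vertical FOV
    row, col, r = row[inside], col[inside], r[inside]

    # Write far points first so that the nearest point in each pixel is kept.
    order = np.argsort(-r)
    image = np.zeros((H, W))
    image[row[order], col[order]] = r[order]
    return image

demo = point_cloud_to_range_image(
    np.array([[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]), np.pi / 4, -np.pi / 4, 4, 8
)
```

Pixels that receive no point stay at 0 here; a real pipeline might instead mark them as invalid.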

According to one embodiment of the present invention, the correlation between the range image and the two-dimensional image transformed by the matrix T_H is calculated by the following formula,

In the above formula, the two feature maps are the feature map of the range image and the feature map of the two-dimensional image transformed by the matrix T_H, the two size terms are the height and width of the two-dimensional image, the remaining size term is the dimension size of the feature map, λ is the ratio of the horizontal fields of view of the image and the point cloud, and the last term is the offset value (w).
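The correlation formula is not reproduced in this text, so the following is only a sketch of the general idea of searching for the offset with the highest correlation. It assumes a plain, unnormalised inner product between the image feature map and a window of the panoramic range-image feature map that wraps around horizontally. It also assumes that the image feature map has already been resized to the fraction of the panorama given by λ. The name `best_offset` and the array shapes are hypothetical.

```python
import numpy as np

def best_offset(range_feat, image_feat):
    """Find the horizontal offset that maximises the feature correlation.

    range_feat: (H, W_r, C) feature map of the 360-degree range image.
    image_feat: (H, W_i, C) feature map of the camera image, with W_i <= W_r.
    Returns (offset, scores), where scores[w] is the mean element-wise product
    between image_feat and the window of range_feat starting at column w.
    """
    H, W_r, C = range_feat.shape
    W_i = image_feat.shape[1]
    scores = np.empty(W_r)
    for w in range(W_r):
        cols = (np.arange(W_i) + w) % W_r   # wrap around the panorama
        window = range_feat[:, cols, :]
        scores[w] = np.sum(window * image_feat) / (H * W_i * C)
    return int(np.argmax(scores)), scores

# Toy example: a two-column pattern that straddles the panorama seam.
pano = np.zeros((2, 8, 1))
pano[:, 7, 0] = 1.0
pano[:, 0, 0] = 2.0
img = np.zeros((2, 3, 1))
img[:, 0, 0] = 1.0
img[:, 1, 0] = 2.0
offset, scores = best_offset(pano, img)
```

A learned or normalised similarity could replace the inner product without changing the search loop.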

According to one embodiment of the present invention, the origin alignment unit converts the measurement data of the three-dimensional sensor, corrected by the calibration matrix T_F·T_E, into a depth image by the following formula,

In the above formula, K_init is the initial calibration matrix, and (u, v, w) are the coordinate values obtained by converting a point (x, y, z) of the three-dimensional point cloud, expressed in the lidar coordinate system, into camera coordinates.
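The depth-image formula is not reproduced in this text. The sketch below illustrates one plausible reading under explicit assumptions: K_init is treated as a 3x4 pinhole-style projection matrix applied to homogeneous points, the pixel position is (u/w, v/w), and the stored depth is w. The name `point_cloud_to_depth_image` and the example matrix values are hypothetical.

```python
import numpy as np

def point_cloud_to_depth_image(points, K_init, H, W):
    """Project (N, 3) points to an (H, W) depth image with a 3x4 matrix K_init."""
    points = np.asarray(points, dtype=float)
    homo = np.hstack([points, np.ones((len(points), 1))])
    uvw = homo @ np.asarray(K_init, dtype=float).T   # rows are (u, v, w)
    front = uvw[:, 2] > 0                            # keep points in front of the camera
    uvw = uvw[front]
    w = uvw[:, 2]
    col = np.floor(uvw[:, 0] / w).astype(int)
    row = np.floor(uvw[:, 1] / w).astype(int)
    inside = (col >= 0) & (col < W) & (row >= 0) & (row < H)
    row, col, w = row[inside], col[inside], w[inside]

    # Write far points first so that each pixel keeps its nearest depth.
    order = np.argsort(-w)
    depth = np.zeros((H, W))
    depth[row[order], col[order]] = w[order]
    return depth

# Illustrative matrix only: focal length 10 px, principal point (5, 4).
K_demo = np.array([[10.0, 0.0, 5.0, 0.0],
                   [0.0, 10.0, 4.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
pts = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0], [0.0, 0.0, -1.0]])
depth_demo = point_cloud_to_depth_image(pts, K_demo, 8, 10)
```

The third example point lies behind the assumed camera and is discarded.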

According to one embodiment of the present invention for achieving the above technical objects, a method for registering a two-dimensional image and a three-dimensional point cloud comprises: a vertical coordinate alignment step of, based on a neural network, estimating a ground normal vector from the three-dimensional point cloud and calculating a transformation matrix T_E that aligns the ground normal vector with e3 = [0,0,1] of a virtual reference coordinate system; a horizontal coordinate alignment step of, based on a neural network, estimating a horizon vector from the two-dimensional image and calculating a transformation matrix T_H that aligns the horizon vector with e2 = [0,1,0] of the virtual reference coordinate system; a forward axis alignment step of, based on a neural network, deriving a range image from the point cloud transformed by the matrix T_E, estimating a forward axis vector of the point cloud transformed by the matrix T_E based on the offset (w) at which the correlation between the range image and the two-dimensional image transformed by the matrix T_H is maximized, and calculating a transformation matrix T_F that aligns the forward axis vector with e1 = [1,0,0] of the reference coordinate system; and an origin alignment step of, based on a neural network, converting the point cloud transformed by the matrix T_F·T_E into a depth image, converting the two-dimensional image transformed by the matrix T_H into a pseudo-depth image, and then calculating, by comparing the features of the two, a matrix T_G that transforms the point cloud so that the origins of the point cloud and the two-dimensional image coincide.

According to one embodiment of the present invention, the method may further include a step of calculating T_G·T_F·T_E as the transformation matrix that registers the two-dimensional image and the three-dimensional point cloud.

According to one embodiment of the present invention, the two-dimensional image may be the output of a camera sensor, and the three-dimensional point cloud may be the output of a lidar sensor.

According to one embodiment of the present invention, the vertical coordinate alignment step can be executed by a neural network including a DownBCL block.

According to one embodiment of the present invention, the vertical coordinate alignment step can predict the absolute value and the sign of the ground normal vector separately.

According to one embodiment of the present invention, in the forward axis alignment step, the point cloud transformed by the matrix T_F·T_E is converted into a range image according to the following formula,

According to one embodiment of the present invention, in the forward axis alignment step, the correlation between the range image and the two-dimensional image transformed by the matrix T_H is calculated by the following formula,

According to one embodiment of the present invention, in the origin alignment step, the measurement data of the three-dimensional sensor, corrected by the calibration matrix T_F·T_E, are converted into a depth image by the following formula,

(Here, K_init is the initial calibration matrix.)

According to one embodiment of the present invention, a two-dimensional image and a three-dimensional point cloud can be registered from an arbitrary initial state.

In addition, according to one embodiment of the present invention, a two-dimensional image and a three-dimensional point cloud can be registered online, while driving, without prior preparation.

FIG. 1 illustrates an autonomous vehicle (100) as an example of a device in which the present invention can be implemented.
FIG. 2 is a conceptual diagram illustrating a method for registering a two-dimensional image and a three-dimensional point cloud according to one embodiment of the present invention.
FIG. 3 is a block diagram illustrating a device for registering a two-dimensional image and a three-dimensional point cloud according to one embodiment of the present invention.
FIG. 4 is a conceptual diagram illustrating the process of aligning a three-dimensional point cloud to a virtual reference coordinate system in the virtual alignment step (S100) according to one embodiment of the present invention.
FIG. 5 is a conceptual diagram illustrating the process of aligning a two-dimensional image to the virtual reference coordinate system in the virtual alignment step (S100) according to one embodiment of the present invention.
FIG. 6 is a conceptual diagram illustrating the process of aligning the forward axes of a two-dimensional image and a three-dimensional point cloud in the compare-and-match step (S200) according to one embodiment of the present invention.
FIG. 7 is a conceptual diagram illustrating the process of finally aligning a two-dimensional image and a three-dimensional point cloud in the compare-and-match step (S200) according to one embodiment of the present invention.
FIG. 8 is a diagram quantitatively comparing the registration results for a two-dimensional image and a three-dimensional point cloud according to the present invention.

The embodiments of the present invention are described below with reference to the drawings.

The terms and words used in this specification and the claims should not be interpreted as limited to their general or dictionary meanings. In accordance with the principle that an inventor may define the concept of a term or word in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. In addition, the embodiments described in this specification and the configurations illustrated in the drawings are only one embodiment in which the present invention is realized and do not represent the entire technical idea of the present invention, so it should be understood that various equivalents, modifications, and applications that can replace them may exist at the time of filing this application.

Terms such as first, second, A, and B used in this specification and the claims may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

The terms used in this specification and the claims are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" should be understood as not excluding in advance the presence or possible addition of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification.

When this specification and the claims describe one component as being "connected" to another component, this should be understood to include both the case where they are directly connected and the case where they are connected through another component in between. Only when they are described as "directly connected" or "immediately connected" should it be understood that one component is connected to the other with no other component in between. Other expressions describing the relationship between components should be understood in the same way.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In addition, each configuration, process, procedure, or method included in each embodiment of the present invention may be shared among embodiments to the extent that they do not technically contradict each other.

FIG. 1 illustrates an autonomous vehicle (100) as an example of a device in which the present invention can be implemented.

The autonomous vehicle (100) of FIG. 1 may include, as three-dimensional sensors, a lidar (Light Detection and Ranging) sensor, a radar (Radio Detection and Ranging) sensor, a sonar (Sound Navigation and Ranging) sensor, and the like, and may include a camera or the like as a two-dimensional sensor. It may also include a GPS sensor and an inertial measurement unit (IMU).

In the present invention, a three-dimensional sensor means a sensor that senses the surrounding environment and produces three-dimensional data expressed in a three-dimensional coordinate system, and a two-dimensional sensor means a sensor that senses the surrounding environment and produces two-dimensional data expressed in a two-dimensional coordinate system. For example, a two-dimensional sensor can store its sensing result as a two-dimensional image of the surrounding environment.

The three-dimensional sensors, namely the lidar, radar, and/or sonar sensors, may be installed on top of the vehicle as a rotating type (110), or may be attached to each side of the vehicle as fixed types (111, 112, 113, 114, 115) that each sense a fixed direction. The camera sensor (120), which is a two-dimensional sensor, may be installed at the front and/or rear of the vehicle, or on each side of the vehicle.

FIG. 2 is a conceptual diagram illustrating a method for registering a two-dimensional image and a three-dimensional point cloud according to one embodiment of the present invention.

Registration of a two-dimensional image and a three-dimensional point cloud according to one embodiment of the present invention can be defined as the process of taking the two-dimensional image and the three-dimensional point cloud as input and obtaining a transformation matrix T. Here, H and W are the numbers of pixels along the height and width of the two-dimensional image, respectively, and N is the number of points included in the three-dimensional point cloud.

A method for registering a two-dimensional image and a three-dimensional point cloud according to one embodiment of the present invention may include a virtual alignment phase (S100) and a compare-and-match phase (S200). The first phase, the virtual alignment step (S100), aligns each of the two data sets to be registered on a virtual reference coordinate system. That is, the two-dimensional image data and the three-dimensional point cloud data are each aligned on the virtual reference coordinate system. The second phase, the compare-and-match step (S200), compares the two data sets on the virtual reference coordinate system and estimates their relative position and orientation.

According to one embodiment of the present invention, the virtual alignment step (S100) uses the characteristics unique to the two-dimensional image data and to the three-dimensional point cloud data, so that the error in the relative position and orientation (pose) between the two data sets can be reduced without comparing them directly.

According to one embodiment of the present invention, the virtual alignment step (S100) may include a vertical coordinate alignment step (S110) and a horizontal coordinate alignment step (S120). First, in the vertical coordinate alignment step (S110) for the three-dimensional point cloud, a ground normal vector is estimated from the point cloud and aligned with the standard basis vector e3 = [0,0,1], which is the vertical direction of the virtual reference coordinate system. If the transformation matrix that aligns the ground normal vector of the point cloud with the e3 vector is denoted T_E, this ground-normal alignment is expressed by applying T_E to the point cloud.
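Once a ground normal has been estimated (by a neural network in the patent), a rotation that takes it to e3 can be built in closed form. The sketch below uses Rodrigues' rotation formula as one standard way to do this; it is an illustration under that assumption, not the patent's stated construction, and the names `rotation_aligning` and `ground_alignment_matrix` are hypothetical.

```python
import numpy as np

def rotation_aligning(a, b):
    """Return a 3x3 rotation R such that R @ (a/|a|) equals b/|b|."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)            # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))       # cos(angle)
    s = np.linalg.norm(v)         # sin(angle)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)      # already aligned
        # Opposite directions: rotate by 180 degrees about any axis normal to a.
        helper = np.array([0.0, 1.0, 0.0]) if abs(a[0]) > 0.9 else np.array([1.0, 0.0, 0.0])
        axis = np.cross(a, helper)
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + (K @ K) * ((1.0 - c) / s**2)

def ground_alignment_matrix(normal):
    """4x4 homogeneous T_E that rotates the given ground normal onto e3 = [0, 0, 1]."""
    T_E = np.eye(4)
    T_E[:3, :3] = rotation_aligning(normal, [0.0, 0.0, 1.0])
    return T_E

T_E_demo = ground_alignment_matrix([0.0, 1.0, 1.0])
```

The rotation that aligns two vectors is not unique (any extra rotation about e3 also works); this construction picks the one with the smallest rotation angle and leaves the remaining yaw to the later forward axis alignment.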

In addition, in the horizontal coordinate alignment step (S120) for the two-dimensional image, the horizon is estimated from the image and aligned to the standard basis vector e2 = [0,1,0], which is the horizontal direction of the virtual reference coordinate system. If the transformation matrix that aligns the horizon of the image to e2 is denoted T_H, the horizon alignment is obtained by applying T_H to the image.

According to one embodiment of the present invention, the compare-and-match phase (S200) performs the final alignment of the relative position and pose of the two-dimensional image data and the three-dimensional point cloud data that were each aligned in the virtual alignment phase (S100). Because both data sets have been aligned to the virtual reference coordinate system in the virtual alignment phase (S100), the fields of view (FOV) of the two sensors can be assumed to overlap. That is, the two data sets contain portions that observe the same part of the scene, and the compare-and-match phase (S200) completes the alignment by comparing those portions.

According to one embodiment of the present invention, the compare-and-match phase (S200) may include a forward axis alignment step (S210) and an origin alignment step (S220).

According to one embodiment of the present invention, in the forward axis alignment step (S210), the 3D point cloud is converted into a range map, and the features of the range map and of the 2D image are compared in order to align the point cloud with the forward axis of the reference coordinate system. Here, the range map is the 3D point cloud converted into a two-dimensional 360-degree panoramic image, and the 3D point cloud can be rotated so that the forward axis lies at the position where the correlation between the features of the 2D image and the features of the range map is highest. If the transformation matrix that rotates the forward axis of the point cloud to coincide with the forward axis of the image is denoted T_F, the alignment of the point cloud to the forward axis of the image is obtained by applying T_F to the point cloud.

The origin alignment step (S220) according to one embodiment of the present invention may include converting the 3D point cloud, already aligned with the forward axis of the 2D image, into a depth image, and moving the origin of the point cloud by comparing the depth image with the 2D image. Here, the depth image is an image obtained by converting the 3D point cloud into a 2D image that contains, for each pixel, a distance in addition to the RGB value. If the transformation matrix that moves the origin of the point cloud is denoted T_G, the alignment by translation is obtained by applying T_G to the point cloud.

According to one embodiment of the present invention, the two-dimensional image and the three-dimensional point cloud can be aligned through the virtual alignment phase (S100) and the compare-and-match phase (S200). Here, the transformation matrix T representing the alignment of the 2D image and the 3D point cloud can be expressed as T = T_G·T_F·T_E.
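As a non-limiting illustration, the composition T = T_G·T_F·T_E can be sketched with 4x4 homogeneous matrices. The helper names (`rotation_h`, `translation_h`, `apply_h`), the choice of rotation axes, and all numeric values below are assumptions made for this sketch and are not taken from the disclosure.

```python
import numpy as np

def rotation_h(R):
    """Embed a 3x3 rotation in a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    return T

def translation_h(t):
    """Build a 4x4 homogeneous matrix that translates by t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

def apply_h(T, points):
    """Apply a 4x4 transform to an (N, 3) array of points."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    return (pts_h @ T.T)[:, :3]

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Placeholder values standing in for the outputs of the three stages.
T_E = rotation_h(rot_x(np.deg2rad(3.0)))    # ground-normal alignment (assumed)
T_F = rotation_h(rot_z(np.deg2rad(40.0)))   # forward-axis alignment (assumed)
T_G = translation_h([0.5, -0.1, 0.2])       # origin shift (assumed)

# T_E acts on the point cloud first and T_G last.
T = T_G @ T_F @ T_E
```

Because matrix products act right to left, applying T once gives the same result as applying T_E, then T_F, then T_G in sequence.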

FIG. 3 is a configuration diagram illustrating a device for registering a two-dimensional image and a three-dimensional point cloud according to one embodiment of the present invention.

Referring to FIG. 3, the image and point cloud registration device receives a two-dimensional image and a three-dimensional point cloud as input and outputs a transformation matrix for registering the 2D image and the 3D point cloud.

According to one embodiment of the present invention, the image and point cloud registration device may include a horizontal coordinate alignment unit (200), a vertical coordinate alignment unit (300), a forward axis alignment unit (400), and an origin alignment unit (500).

The horizontal coordinate alignment unit (200) according to one embodiment of the present invention may include a horizon network, which is a deep neural network. Based on the horizon network, the horizontal coordinate alignment unit (200) can estimate a horizon vector (horizontal vector) from the two-dimensional image. Here, the horizon vector can be defined as a vector parallel to the horizon in the 2D image; for example, it can be defined from the right-end pixel and the left-end pixel of the horizon. The transformation matrix T_H, which is the output of the horizon network, aligns the horizon vector to e2 = [0,1,0].

The vertical coordinate alignment unit (300) according to one embodiment of the present invention may include an E3 network, which is a deep neural network. Based on the E3 network, the vertical coordinate alignment unit (300) can estimate the ground normal vector from the input 3D point cloud and compute a transformation matrix T_E that rotates it to coincide with e3 = [0,0,1], a standard basis vector of the virtual reference coordinate system.

The forward axis alignment unit (400) according to one embodiment of the present invention may include a forward-axis network, which is a deep neural network. Based on the forward-axis network, the forward axis alignment unit (400) can align the forward axes of the two-dimensional image and the three-dimensional point cloud data. For example, the vector directed toward the image plane is taken as e₁ = [1,0,0], the forward axis of the image coordinate system; a vector indicating the forward-axis direction of the 3D point cloud is estimated; and a transformation matrix T_F that aligns this vector to e₁ is obtained. That is, T_F aligns the forward axis of the point cloud to the forward axis e₁ of the image.

The origin alignment unit (500) according to one embodiment of the present invention may include a gather network, which is a deep neural network. The gather network has two inputs: (i) the image aligned to the horizon in the previous step, and (ii) a depth image converted from the point cloud aligned to the forward axis in the previous step. From these inputs, the origin alignment unit (500) estimates a transformation vector indicating the position to which the origin (0,0,0) of the point cloud coordinate system should be moved in order to match the image coordinate system, and generates a transformation matrix T_G that moves the origin (0,0,0) of the point cloud coordinate system to that position.

[Virtual alignment phase - vertical coordinate alignment step]

FIG. 3 is a conceptual diagram illustrating the process of aligning the 3D point cloud to the virtual reference coordinate system in the virtual alignment phase (S100) according to one embodiment of the present invention.

According to one embodiment of the present invention, the ground normal vector is estimated from the 3D point cloud by the E3 network, which is a neural network, and a transformation matrix T_E is computed that rotates it to coincide with e3 = [0,0,1], a standard basis vector of the virtual reference coordinate system. The transformation matrix T_E thus aligns the ground normal vector of the 3D point cloud to the vertical direction (the e3 direction) of the virtual reference coordinate system. Referring to FIG. 3, applying T_E to the point cloud yields a point cloud whose ground normal vector is aligned to e3.
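For illustration only, one common way to build a rotation that maps an estimated ground normal onto e3 is the Rodrigues formula. The disclosure does not state how T_E is constructed from the estimated normal, so the function `rotation_aligning` and the sample normal below are assumptions.

```python
import numpy as np

def rotation_aligning(a, b):
    """Return a 3x3 rotation R such that R @ a_hat equals b_hat (Rodrigues formula)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)            # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))       # cos(angle)
    if np.isclose(c, -1.0):
        # Opposite vectors: rotate by pi about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

e3 = np.array([0.0, 0.0, 1.0])
normal = np.array([0.05, -0.10, 0.99])   # hypothetical estimated ground normal
R_E = rotation_aligning(normal, e3)      # rotation part of T_E in this sketch
```

The rotation found this way is the smallest rotation taking the normal to e3; it leaves the heading about the vertical axis undetermined, which is consistent with the forward axis being aligned in a later step.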

According to one embodiment of the present invention, the E3 network may include a DownBCL block that extracts features from the 3D point cloud. The DownBCL block of the E3 network learns from the information of a 3D point cloud distributed over a wide area and extracts the features of the point cloud. For example, the DownBCL block disclosed in the prior art document [X. Gu, Y. Wang, C. Wu, Y. J. Lee, and P. Wang, "Hplflownet: Hierarchical permutohedral lattice flownet for scene flow estimation on large-scale point clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3254-3263] may be used, and all disclosures of that document are incorporated herein by reference. However, the configuration of the DownBCL block of the present invention is not limited to the configuration disclosed in that document.

According to one embodiment of the present invention, a rotation vector P_r can be estimated using the features of the 3D point cloud extracted by the DownBCL block of the E3 network. For example, the rotation vector P_r can be estimated using the spherical regression framework disclosed in the prior art document [S. Liao, E. Gavves, and C. G. Snoek, "Spherical regression: Learning viewpoints, surface normals and 3d rotations on n-spheres," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9759-9767.], and all disclosures of that document are incorporated herein by reference. However, the process of estimating the rotation vector P_r in the present invention is not limited to the configuration disclosed in that document.

According to one embodiment of the present invention, a rotation head, which is a neural network that estimates rotation based on the spherical regression framework, can predict the absolute values and the signs of the rotation vector P_r, where the +/- signs of P_r are encoded as a one-hot vector.

In the E3 network according to one embodiment of the present invention, the rotation head, which is a sub-network of the E3 network, can estimate the ground normal vector of the point cloud.
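The split of a unit vector into absolute values and a one-hot sign code can be sketched as follows. The text states only that the signs are one-hot encoded; treating the code as one class per sign combination (8 classes for a 3D vector) is an assumption of this sketch, as are the names `SIGN_PATTERNS`, `encode`, and `decode`.

```python
import numpy as np
from itertools import product

# All 2^3 sign combinations of a 3D vector, one row per class (assumed layout).
SIGN_PATTERNS = np.array(list(product([1.0, -1.0], repeat=3)))

def encode(v):
    """Split a vector into unit-norm absolute values and a one-hot sign class."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    abs_part = np.abs(v)
    signs = np.where(v >= 0, 1.0, -1.0)
    index = int(np.argmax((SIGN_PATTERNS == signs).all(axis=1)))
    one_hot = np.zeros(len(SIGN_PATTERNS))
    one_hot[index] = 1.0
    return abs_part, one_hot

def decode(abs_part, sign_scores):
    """Recombine absolute values with the most likely sign combination."""
    return np.asarray(abs_part) * SIGN_PATTERNS[int(np.argmax(sign_scores))]
```

At inference time, `decode` would be applied to the two outputs of the head: the predicted absolute values and the predicted sign-class scores.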

According to one embodiment of the present invention, the loss function of the rotation head, which is a sub-network of the E3 network, can consist of two parts, one for the absolute values and one for the signs, as in equation (1) below.

(1)

Here, P denotes the prediction and Y denotes the ground truth. In equation (1), the loss for the absolute-value part can use a cosine proximity loss function, and the loss for the sign part can use a cross-entropy loss function. The cross-entropy loss function is denoted CE in the equation.
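A minimal sketch of a two-part loss of this kind is given below, assuming the two terms are simply summed with equal weight and that the sign part is scored over class logits. Neither the weighting nor the exact form of equation (1) can be recovered from this text, so the function names and signatures are illustrative.

```python
import numpy as np

def cosine_proximity_loss(p_abs, y_abs):
    """Negative cosine similarity: -1 when the two vectors are parallel."""
    p = np.asarray(p_abs, dtype=float)
    y = np.asarray(y_abs, dtype=float)
    return -float(np.dot(p, y) / (np.linalg.norm(p) * np.linalg.norm(y)))

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for a single sample with an integer class label."""
    z = np.asarray(logits, dtype=float)
    z = z - np.max(z)                       # subtract the max for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))
    return -float(log_probs[target_index])

def rotation_head_loss(p_abs, p_sign_logits, y_abs, y_sign_index):
    """Sum of the absolute-value term and the sign term (equal weights assumed)."""
    return cosine_proximity_loss(p_abs, y_abs) + cross_entropy(p_sign_logits, y_sign_index)
```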

[Virtual alignment phase - horizontal coordinate alignment step]

FIG. 4 is a conceptual diagram illustrating the process of aligning the two-dimensional image to the virtual reference coordinate system in the virtual alignment phase (S100) according to one embodiment of the present invention.

According to one embodiment of the present invention, a horizon vector (horizontal vector) can be estimated from the two-dimensional image by the horizon network, which is a neural network. Here, the horizon vector can be defined as a vector parallel to the horizon in the 2D image; for example, it can be defined from the right-end pixel and the left-end pixel of the horizon. The transformation matrix T_H, which is the output of the horizon network, aligns the horizon vector to e2 = [0,1,0].
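As a hedged illustration of the geometry, the sketch below forms a horizon vector as the normalized difference between the right-end and left-end pixels and builds the in-plane rotation that makes it horizontal. The exact definition of the horizon vector is not recoverable from this text, and the mapping of the image's horizontal axis onto e2 depends on axis conventions not given here; both are assumptions, as are the function names and pixel values.

```python
import numpy as np

def horizon_vector(p_left, p_right):
    """Unit vector from the left-end pixel to the right-end pixel of the horizon."""
    d = np.asarray(p_right, dtype=float) - np.asarray(p_left, dtype=float)
    return d / np.linalg.norm(d)

def roll_correction(h):
    """2x2 rotation that maps the unit vector h onto the image's horizontal axis [1, 0]."""
    c, s = h[0], h[1]
    return np.array([[c, s],
                     [-s, c]])

# Hypothetical horizon end points in (u, v) pixel coordinates.
h = horizon_vector((0.0, 240.0), (640.0, 260.0))
R_roll = roll_correction(h)
roll_deg = np.degrees(np.arctan2(h[1], h[0]))   # tilt of the horizon in the image
```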

According to one embodiment of the present invention, the horizon network may include a VGG network that extracts features from the two-dimensional image. For example, the VGG network disclosed in the prior art document [K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.] may be used, and all disclosures of that document are incorporated herein by reference. However, the configuration of the VGG network of the present invention is not limited to the configuration disclosed in that document.

According to one embodiment of the present invention, the horizon vector can be estimated using the features of the two-dimensional image extracted by the VGG network of the horizon network. For example, the horizon vector can be estimated using the spherical regression framework disclosed in the prior art document [S. Liao, E. Gavves, and C. G. Snoek, "Spherical regression: Learning viewpoints, surface normals and 3d rotations on n-spheres," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9759-9767.], and all disclosures of that document are incorporated herein by reference. However, the estimation process of the present invention is not limited to the configuration disclosed in that document.

[Compare-and-match phase - forward axis alignment step]

FIG. 5 is a conceptual diagram illustrating the process of aligning the forward axes of the two-dimensional image and the three-dimensional point cloud in the compare-and-match phase (S200) according to one embodiment of the present invention.

According to one embodiment of the present invention, in the compare-and-match phase (S200), the forward axes of the two-dimensional image and the three-dimensional point cloud data can be aligned by the forward-axis network, which is a neural network. For example, the vector directed toward the image plane is taken as e₁ = [1,0,0], the forward axis of the image coordinate system; a vector indicating the forward-axis direction of the 3D point cloud is estimated; and a transformation matrix T_F that aligns this vector to e₁ is obtained. That is, T_F aligns the forward axis of the point cloud to the forward axis e₁ of the image.

According to one embodiment of the present invention, the two inputs of the forward-axis network are (i) the image aligned to the horizon in the previous step, and (ii) a range map converted from the point cloud aligned to the ground normal vector. Here, the range map represents the distance from a specific position to each point as a 2D image, and can be expressed by equation (2) below.

(2)

Here, the two angles in equation (2) are the upper and lower limits of the vertical field of view of the point cloud, and H and W are the height and width of the range map, respectively. In addition, λ denotes the ratio between the horizontal fields of view of the image and the point cloud, and can be expressed as in equation (3) below.

(3)

Here, the first pair of angles in equation (3) are the upper and lower limits of the horizontal field of view of the point cloud, and the second pair are the upper and lower limits of the horizontal field of view of the image. Since the horizontal field of view of the point cloud is 360 degrees (= 2π), which is wider than that of the image, λ ≥ 1.
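For illustration, a conventional spherical projection of a point cloud into an H x W range map, together with the field-of-view ratio λ, can be sketched as follows. Equations (2) and (3) are not reproduced in this text, so the column and row mapping used here and the way λ is computed (point cloud span divided by image span, consistent with λ ≥ 1) are assumptions, as are the function names.

```python
import numpy as np

def fov_ratio(pc_fov_max, pc_fov_min, img_fov_max, img_fov_min):
    """Assumed form of lambda: point cloud horizontal FOV over image horizontal FOV."""
    return (pc_fov_max - pc_fov_min) / (img_fov_max - img_fov_min)

def range_map(points, H, W, fov_up, fov_down):
    """Project (N, 3) points to an H x W panorama storing the nearest range per pixel."""
    points = np.asarray(points, dtype=float)
    r = np.linalg.norm(points, axis=1)
    points, r = points[r > 0], r[r > 0]
    yaw = np.arctan2(points[:, 1], points[:, 0])     # azimuth in [-pi, pi]
    pitch = np.arcsin(points[:, 2] / r)              # elevation
    cols = np.floor(0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W
    rows = np.floor((fov_up - pitch) / (fov_up - fov_down) * H).astype(int)
    valid = (rows >= 0) & (rows < H)                 # drop points outside the vertical FOV
    out = np.full((H, W), np.inf)
    np.minimum.at(out, (rows[valid], cols[valid]), r[valid])
    out[np.isinf(out)] = 0.0                         # pixels hit by no point
    return out
```

In this sketch, the direction yaw = 0 lands in the middle column of the panorama, and a pixel hit by several points keeps the smallest range.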

According to one embodiment of the present invention, the image and the range map can be processed by CNN_I and CNN_R, respectively, which are independent convolutional neural networks (CNN, 320). According to one embodiment of the present invention, the CNNs used in the present invention may be the VGG network disclosed in the prior art document [K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.], and all disclosures of that document are incorporated herein by reference. However, the CNN configuration of the present invention is not limited to the configuration disclosed in that document.

According to one embodiment of the present invention, the VGG networks disclosed in the above prior art document can be used, with simple convolutional layers added before and after each VGG network to reshape its features.

According to one embodiment of the present invention, CNN_I and CNN_R can generate feature maps for the image and the range map, respectively.

According to one embodiment of the present invention, the two feature maps can be used to determine the correlation between the image and the range map. According to one embodiment of the present invention, a correlation score map can be computed along the width of the image, as in equation (4) below.

(4)

Here, % denotes the modulo operation.
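One way to read a correlation computed along the width with a modulo is as a circular sliding inner product between the image feature map and the panoramic range-map feature map. The sketch below follows that reading; the tensor layout (channels, height, width), the unnormalized inner product, and the name `correlation_scores` are assumptions, since equation (4) is not reproduced in this text.

```python
import numpy as np

def correlation_scores(feat_img, feat_rng):
    """Circular correlation of a (C, H, Wi) map against a (C, H, Wr) panoramic map.

    scores[w] is the inner product between the image features and the window of
    the range-map features that starts at column w and wraps around modulo Wr.
    """
    _, _, wi = feat_img.shape
    wr = feat_rng.shape[2]
    offsets = np.arange(wi)
    scores = np.empty(wr)
    for w in range(wr):
        window = feat_rng[:, :, (w + offsets) % wr]   # wrap past the last column
        scores[w] = np.sum(feat_img * window)
    return scores
```

The modulo matters because the range map is a 360-degree panorama: a window that starts near the right edge continues at the left edge.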

According to one embodiment of the present invention, the value of w at which the correlation is highest is obtained from the correlation score map, and, as in equation (5) below, the rotation angle of the point cloud corresponding to that value of w and the vector corresponding to the forward axis of the point cloud are calculated.

(5)

The vector obtained from equation (5) represents the direction of the image's forward axis within the point cloud. According to one embodiment of the present invention, a transformation matrix T_F can be calculated that rotates the predicted forward axis of the point cloud to coincide with e₁ = [1,0,0], the x-axis unit vector of the virtual reference coordinate system.
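As a sketch under stated assumptions, the best-matching column can be converted into a yaw angle and then into a rotation about the vertical axis that brings the predicted forward axis onto e₁. The linear mapping from column index to angle used here (2π·w/W) is an assumption, since equation (5) is not reproduced in this text; the function names are illustrative.

```python
import numpy as np

def column_to_angle(w, W):
    """Assumed linear mapping from a panorama column to a yaw angle in [0, 2*pi)."""
    return 2.0 * np.pi * w / W

def forward_axis(theta):
    """Unit vector in the horizontal plane at yaw angle theta."""
    return np.array([np.cos(theta), np.sin(theta), 0.0])

def rotation_to_e1(theta):
    """Rotation about the vertical axis by -theta, mapping forward_axis(theta) to e1."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0.0],
                     [-s, c, 0.0],
                     [0.0, 0.0, 1.0]])

theta = column_to_angle(90, 360)   # hypothetical best column out of 360
R_F = rotation_to_e1(theta)        # rotation part of T_F in this sketch
```

Because the rotation is about the vertical axis only, it leaves the ground-normal alignment obtained earlier unchanged.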

According to one embodiment of the present invention, the ground truth used for training can be set so that the pixel corresponding to the actual y_rad value and the n pixels around it are 1 and all other pixels are 0. According to one embodiment of the present invention, a binary cross-entropy loss function and a hard-negative mining technique can be used to train the neural network.
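The label construction can be sketched as below. Whether "n pixels around" means n on each side or n in total is not stated, and neither is wrap-around at the panorama edge; this sketch assumes n on each side with wrap-around. Hard-negative mining is not shown, and the function names are illustrative.

```python
import numpy as np

def make_target(true_col, W, n):
    """Binary label of length W: 1 at true_col and n columns on each side, with wrap-around."""
    target = np.zeros(W)
    target[(true_col + np.arange(-n, n + 1)) % W] = 1.0
    return target

def binary_cross_entropy(scores, target, eps=1e-7):
    """Mean binary cross-entropy between scores in (0, 1) and a 0/1 target."""
    p = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    t = np.asarray(target, dtype=float)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))
```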

[Compare-and-match phase - origin alignment step]

FIG. 6 is a conceptual diagram illustrating the process of finally aligning the two-dimensional image and the three-dimensional point cloud in the compare-and-match phase (S200) according to one embodiment of the present invention.

According to one embodiment of the present invention, in the compare-and-match phase (S200), the gather network, which is a neural network, collects the alignment results obtained in the preceding steps and estimates the displacement between the image and the point cloud. Preferably, the two inputs of the gather network are (i) the image aligned to the horizon in the previous step, and (ii) a depth image converted from the point cloud aligned to the forward axis in the previous step. Here, the depth image can be calculated by equation (6) below.

(6)

Here, K_init is the initial calibration matrix, and (u, v, w) are the coordinates obtained by converting a point (x, y, z) of the 3D point cloud, expressed in the LiDAR coordinate system, into camera coordinates. For each pixel, the depth map stores the four-dimensional information [x, y, z, w], consisting of the (x, y, z) value in the LiDAR coordinate system that corresponds to that pixel and the depth value w. That is, each pixel of the depth map stores the [x, y, z, w] associated with the (u, v, w) that projects onto it.
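A hedged sketch of such a depth map follows. It assumes that K_init is a 3x4 matrix taking homogeneous LiDAR points to (u, v, w), that the pixel location is (u/w, v/w), and that the nearest point is kept when several points fall on the same pixel; none of these details can be confirmed from this text, and the function name `depth_image` is illustrative.

```python
import numpy as np

def depth_image(points, K_init, height, width):
    """Build a (height, width, 4) map storing [x, y, z, w] at each projected pixel.

    points: (N, 3) LiDAR coordinates. K_init: assumed 3x4 projection matrix.
    """
    points = np.asarray(points, dtype=float)
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    uvw = pts_h @ K_init.T                       # (N, 3) rows of (u, v, w)
    w = uvw[:, 2]
    front = w > 0                                # keep points in front of the camera
    cols = np.floor(uvw[front, 0] / w[front]).astype(int)
    rows = np.floor(uvw[front, 1] / w[front]).astype(int)
    values = np.hstack([points[front], w[front, None]])
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    rows, cols, values = rows[inside], cols[inside], values[inside]
    image = np.zeros((height, width, 4))
    # Write far points first so that the nearest point on a pixel is the one kept.
    for i in np.argsort(-values[:, 3]):
        image[rows[i], cols[i]] = values[i]
    return image
```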

According to one embodiment of the present invention, a transformation vector is estimated that indicates the position to which the origin (0,0,0) of the point cloud coordinate system should be moved in order to match the image coordinate system, and a transformation matrix T_G is generated that moves the origin (0,0,0) of the point cloud coordinate system to that position.

Referring to FIG. 6, the image is input to an encoder-decoder network (CNN1) to estimate a pseudo-depth image and a depth mask. Here, the pseudo-depth image can be defined as an image holding a depth value for each pixel of the input image. Its ground truth is the projection of the ground-truth point cloud onto the image plane. The depth mask is a binary image that, for each pixel of the input image, has the value '1' if a point of the point cloud was projected onto that pixel and '0' otherwise.

Referring to FIG. 6, the feature map computed by the encoder-decoder network (CNN1) and the depth image are input to a ResNet (CNN2), and the features extracted by CNN2 are input to a neural network composed of a multi-layer perceptron (MLP) to estimate a three-dimensional transformation vector corresponding to the distance offset between the feature map and the depth image. According to one embodiment of the present invention, the ResNet used in the present invention may be the neural network disclosed in the prior art document [K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.], and all disclosures of that document are incorporated herein by reference. However, the ResNet configuration of the present invention is not limited to the configuration disclosed in that document.

According to one embodiment of the present invention, the loss function for the pseudo-depth image is the mean squared error, the loss function for the depth mask is the cross-entropy error, and an L1 loss function can be used for the transformation vector.

[Performance Test]

To verify the performance of the present invention, experiments were conducted on two applications of 2D image and 3D point cloud registration: image-based localization, which measures the position and pose of a 2D image sensor relative to the reference coordinate system of a 3D point cloud, and extrinsic calibration, which establishes the correspondence between a 2D sensor and a 3D sensor.

Specifically, to test the performance of the present invention, the experimental conditions shown in Table 1 below were assumed, and an experiment was conducted under each condition.

<Table 1: Composition of the test set>

Here, Test1 is for image-based localization, and Test2 is for extrinsic calibration between the camera and the LiDAR sensor. In Table 1 above, Test1-A, B, E, F and Test2-A, B, C, D all concern registration between a single image and a single point cloud, while Test1-C, D, G, H concern registration between a single image and an accumulated point cloud. The accumulated point cloud is generated by accumulating five frames before and after the selected point cloud. In addition, α, β, and γ in Table 1 denote noise: α is the rotation angle of the image, β is the rotation angle of the point cloud, and γ is the translation of the point cloud.
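As a rough illustration of how such an accumulated point cloud could be assembled, the sketch below concatenates the frames around a selected index. It reads "five frames before and after" as five frames on each side, and it assumes every frame has already been transformed into a common coordinate system (for example with known ego-motion); neither point is stated in the text. The function name and the `window` parameter are hypothetical.

```python
def accumulate_frames(frames, index, window=5):
    """Merge frames[index - window .. index + window] into one point list.

    frames: list of point clouds, each a list of (x, y, z) tuples that are
            ASSUMED to be expressed in one shared coordinate system.
    index:  position of the selected point cloud.
    window: number of frames taken on each side (5 under this reading).
    """
    start = max(0, index - window)                # clip at the first frame
    stop = min(len(frames), index + window + 1)   # clip at the last frame
    merged = []
    for frame in frames[start:stop]:
        merged.extend(frame)
    return merged
```

Near the start or end of a sequence fewer than eleven frames are available, so the window is simply clipped rather than padded.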

First, for image-based localization, the following datasets were used under the conditions of Test1-A to H: the Rellis-3D dataset disclosed in the prior literature [P. Jiang, P. Osteen, M. Wigness, and S. Saripalli, "Rellis-3d dataset: Data, benchmarks and analysis," 2020.], the KITTI odometry dataset disclosed in the prior literature [A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? the kitti vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.], and the nuScenes dataset disclosed in the prior literature [H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, "nuscenes: A multimodal dataset for autonomous driving," in CVPR, 2020.]. All disclosures of these prior literatures are incorporated herein by reference. The evaluation metrics used for performance comparison were the mean relative rotation error (RRE) and the mean relative translational error (RTE).
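The text does not spell out how RRE and RTE are computed. One common convention, assumed here purely for illustration, takes the rotation error as the angle of the residual rotation between the estimated and ground-truth rotation matrices, and the translation error as the Euclidean distance between the two translation vectors. The function names are hypothetical.

```python
import math


def relative_rotation_error_deg(R_est, R_gt):
    """Angle (degrees) of the residual rotation R_gt^T * R_est.

    R_est, R_gt: 3x3 rotation matrices as nested lists.
    trace(R_gt^T R_est) equals the element-wise product sum of the two.
    """
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard rounding
    return math.degrees(math.acos(cos_angle))


def relative_translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))


def mean(values):
    """Average of per-sample errors, giving the mean RRE or mean RTE."""
    return sum(values) / len(values)
```

Other works define the rotation error differently, for example as a sum of Euler-angle differences, so numbers computed this way are not necessarily comparable with the tables in this document.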

Using the Rellis-3D dataset, the initial state was compared with the case where only T_E and T_H are applied (EH), the case where T_E, T_F, and T_H are applied (EFH), and the case where T_E, T_F, T_G, and T_H are applied (EFGH). As shown in Table 2 below, EFGH has the smallest error under all Test1 conditions.

<Table 2: Image-based localization, comparison on the Rellis-3D dataset>

In addition, using the KITTI Odometry and nuScenes datasets, the performance of EFGHNet according to the present invention was compared with the results of BANet+ICP, an existing method disclosed in the prior literature [S. Aich, J. M. U. Vianney, M. A. Islam, and M. K. B. Liu, "Bidirectional attention network for monocular depth estimation," in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 11746-11752.], and DeepI2P, an existing method disclosed in the prior literature [J. Li and G. H. Lee, "Deepi2p: Image-to-point cloud registration via deep classification," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15960-15969.]. As shown in Table 3 below, EFGHNet according to the present invention has the smaller error under all Test1 conditions.

<Table 3: Image-based localization, comparison on KITTI Odometry and nuScenes>

Fig. 8 shows quantitative results for image-based localization on the KITTI Odometry dataset. The results shown in Fig. 8 also confirm that the registration result produced by EFGHNet according to the present invention agrees most closely with the ground truth.

Next, for extrinsic calibration of the two-dimensional and three-dimensional sensors, the KITTI raw dataset disclosed in the prior literature [A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The kitti dataset," International Journal of Robotics Research (IJRR), 2013.] was used under the conditions of Test2-A to D, and all disclosures of that prior literature are incorporated herein by reference. The evaluation metrics used for performance comparison were the quaternion distance (QD) and the mean absolute error (MAE). Here, the quaternion distance and the mean absolute error may be defined as in equations (8) and (9) below, respectively.

(8)

(9)
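The equation bodies are not reproduced in this text. As a hedged illustration only, the sketch below uses one widely used form of each metric: the angular distance between two unit quaternions for QD, and the mean of the per-axis absolute differences of the translation for MAE. These may differ from the definitions intended in the original equations; the function names and the (w, x, y, z) quaternion ordering are assumptions.

```python
import math


def quaternion_distance(q_est, q_gt):
    """Angle in radians between two rotations given as (w, x, y, z).

    ASSUMED definition: 2 * acos(|<q_est, q_gt>|) on normalized quaternions.
    """
    dot = sum(a * b for a, b in zip(q_est, q_gt))
    norm = math.sqrt(sum(a * a for a in q_est)) * math.sqrt(sum(b * b for b in q_gt))
    # abs() treats q and -q as the same rotation; min() guards acos rounding.
    cos_half = min(1.0, abs(dot) / norm)
    return 2.0 * math.acos(cos_half)


def mean_absolute_error(t_est, t_gt):
    """Mean of absolute per-component differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(t_est, t_gt)) / len(t_est)
```

For example, comparing the identity rotation with a 90 degree rotation about the z axis gives a distance of π/2 radians under this definition.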

The performance of EFGHNet according to the present invention was compared with the results of three existing methods: CalibNet, disclosed in the prior literature [G. Iyer, R. K. Ram, J. K. Murthy, and K. M. Krishna, "Calibnet: Geometrically supervised extrinsic calibration using 3d spatial transformer networks," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 1110-1117.]; RGGNet, disclosed in the prior literature [K. Yuan, Z. Guo, and Z. J. Wang, "Rggnet: Tolerance aware lidar-camera online calibration with geometric deep learning and generative model," IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6956-6963, 2020.]; and LCCNet, disclosed in the prior literature [X. Lv, B. Wang, Z. Dou, D. Ye, and S. Wang, "Lccnet: Lidar and camera self-calibration using cost volume network," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2894-2901.]. As shown in Table 4 below, EFGHNet according to the present invention has the smaller error under all Test2 conditions except for the MAE of Test2-A.

<Table 4: Extrinsic calibration, comparison on KITTI Raw>

Although the method and system for registering a two-dimensional image and a three-dimensional point cloud according to the present invention have been described above with reference to the drawings, the present invention is not limited to the configurations and methods illustrated and described herein. The present invention has been described in terms of registering a two-dimensional image and a three-dimensional point cloud, but the registration method according to the present invention may also be applied between one two-dimensional image and another two-dimensional image, or between one three-dimensional point cloud and another three-dimensional point cloud. In addition, various hardware and/or software other than those disclosed herein may be used to implement the present invention, and its scope of rights is not limited to the configurations and methods disclosed herein. Those skilled in the art will understand that various modifications and variations are possible within the scope of the purpose and effect pursued by the present invention.

100: Autonomous driving device
110: LiDAR sensor
120: Camera sensor
200: Horizon network
300: E3 network
400: Front axis network
500: Collection network
510: CNN1
520: CNN2

Claims

A device for registering a two-dimensional image and a three-dimensional point cloud, comprising:
A vertical coordinate alignment unit for estimating, based on a neural network, a ground normal vector from the three-dimensional point cloud and generating a transformation matrix T_E that aligns the ground normal vector with e3 = [0,0,1] of a virtual reference coordinate system;
A horizontal coordinate alignment unit for estimating, based on a neural network, a horizon vector from the two-dimensional image and generating a transformation matrix T_H that aligns the horizon vector with e2 = [0,1,0] of the virtual reference coordinate system;
A front axis alignment unit for deriving, based on a neural network, a range image from the point cloud transformed by the matrix T_E, estimating a front axis vector of the point cloud transformed by the matrix T_E based on an offset (w) at which a correlation between the range image and the two-dimensional image transformed by the matrix T_H is maximized, and generating a transformation matrix T_F that aligns the front axis vector with e1 = [1,0,0] of the reference coordinate system; and
An origin alignment unit for converting, based on a neural network, the point cloud transformed by the matrix T_F·T_E into a depth image, converting the two-dimensional image transformed by the matrix T_H into a pseudo-depth image, and generating a matrix T_G that transforms the origins of the point cloud and the two-dimensional image so that they coincide, by comparing features of the two.

The device of claim 1,
wherein T_G·T_F·T_E is calculated as the transformation matrix that registers the two-dimensional image and the three-dimensional point cloud.

The device of claim 1,
wherein the two-dimensional image is the output of a camera sensor, and the three-dimensional point cloud is the output of a LiDAR sensor.

The device of claim 1,
wherein the vertical coordinate alignment unit is implemented by a neural network including a DownBCL block.

The device of claim 1,
wherein the vertical coordinate alignment unit predicts the absolute value and the sign of the ground normal vector separately.

The device of claim 1,
wherein the front axis alignment unit converts the point cloud transformed by the matrix T_F·T_E into a range image according to the formula below,

where (x, y, z) are the coordinates of a point in the point cloud, the two angle parameters correspond to the upper and lower limits of the vertical field-of-view of the point cloud, respectively, H and W represent the height and width of the range image, respectively, and λ represents the ratio of the horizontal field-of-view of the image and the point cloud.

The device of claim 1,
wherein the correlation between the range image and the two-dimensional image transformed by the matrix T_H is calculated using the following formula,

where the terms of the formula denote the feature map of the range image, the feature map of the two-dimensional image transformed by the matrix T_H, the height and width of the two-dimensional image, and the dimension size of the feature map; λ represents the ratio of the horizontal field-of-view of the image and the point cloud; and the remaining term represents the offset value.

The device of claim 1,
wherein the origin alignment unit converts the measurement data of the three-dimensional sensor, corrected by the correction matrix T_F·T_E, into a depth image by the following formula,

where K_init is the initial calibration matrix, and (u, v, w) represents the coordinate values obtained by converting a point (x, y, z) of the three-dimensional point cloud, expressed in the LiDAR coordinate system, into camera coordinates.

A method for registering a two-dimensional image and a three-dimensional point cloud, comprising:
A vertical coordinate alignment step of estimating, based on a neural network, a ground normal vector from the three-dimensional point cloud and generating a transformation matrix T_E that aligns the ground normal vector with e3 = [0,0,1] of a virtual reference coordinate system;
A horizontal coordinate alignment step of estimating, based on a neural network, a horizon vector from the two-dimensional image and generating a transformation matrix T_H that aligns the horizon vector with e2 = [0,1,0] of the virtual reference coordinate system;
A front axis alignment step of deriving, based on a neural network, a range image from the point cloud transformed by the matrix T_E, estimating a front axis vector of the point cloud transformed by the matrix T_E based on an offset (w) at which a correlation between the range image and the two-dimensional image transformed by the matrix T_H is maximized, and generating a transformation matrix T_F that aligns the front axis vector with e1 = [1,0,0] of the reference coordinate system; and
An origin alignment step of converting, based on a neural network, the point cloud transformed by the matrix T_F·T_E into a depth image, converting the two-dimensional image transformed by the matrix T_H into a pseudo-depth image, and then calculating a matrix T_G that transforms the origins of the point cloud and the two-dimensional image so that they coincide, by comparing features of the two.

The method of claim 9,
further comprising a step of calculating T_G·T_F·T_E as the transformation matrix that registers the two-dimensional image and the three-dimensional point cloud.

The method of claim 9,
wherein the two-dimensional image is the output of a camera sensor, and the three-dimensional point cloud is the output of a LiDAR sensor.

The method of claim 9,
wherein the vertical coordinate alignment step is executed by a neural network including a DownBCL block.

The method of claim 9,
wherein the vertical coordinate alignment step predicts the absolute value and the sign of the ground normal vector separately.

The method of claim 9,
wherein, in the front axis alignment step, the point cloud transformed by the matrix T_F·T_E is converted into a range image according to the formula below,

where (x, y, z) are the coordinates of a point in the point cloud, the two angle parameters correspond to the upper and lower limits of the vertical field-of-view of the point cloud, respectively, H and W represent the height and width of the range image, respectively, and λ represents the ratio of the horizontal field-of-view of the image and the point cloud.

The method of claim 9,
wherein, in the front axis alignment step, the correlation between the range image and the two-dimensional image transformed by the matrix T_H is calculated by the following formula,

where the terms of the formula denote the feature map of the range image, the feature map of the two-dimensional image transformed by the matrix T_H, the height and width of the two-dimensional image, and the dimension size of the feature map; λ represents the ratio of the horizontal field-of-view of the image and the point cloud; and the remaining term represents the offset value.

The method of claim 9,
wherein, in the origin alignment step, the measurement data of the three-dimensional sensor, corrected by the correction matrix T_F·T_E, is converted into a depth image by the following formula,

where K_init is the initial calibration matrix, and (u, v, w) represents the coordinate values obtained by converting a point (x, y, z) of the three-dimensional point cloud, expressed in the LiDAR coordinate system, into camera coordinates.