KR102085070B1

KR102085070B1 - Apparatus and method for image registration based on deep learning

Info

Publication number: KR102085070B1
Application number: KR1020190108662A
Authority: KR
Inventors: 박재규; 한창민; 김경수
Original assignee: 한국씨텍(주)
Priority date: 2019-06-10
Filing date: 2019-09-03
Publication date: 2020-03-05
Anticipated expiration: 2039-09-03

Abstract

The present invention relates to an image registration apparatus and method. An image registration apparatus according to an embodiment of the present invention includes: a visible light image input unit that receives a visible light (RGB) image; an infrared image input unit that receives an infrared (IR) image; a feature extraction unit that uses deep learning to extract an object from the visible light image and a background image from the infrared image; and a registration (matching) unit that registers the extracted object with the background. According to the present invention, object recognition and identification can be improved even for images captured at night.

Description

Deep learning based image registration device and method {Apparatus and method for image registration based on deep learning}

The present invention relates to a deep learning based image registration apparatus and method, and more particularly to a deep learning based image registration apparatus and method that can detect objects more accurately by registering a visible light image with an infrared image in conditions, such as nighttime, where objects such as pedestrians or vehicles are difficult to detect.

In footage captured by CCTV at night, image quality is poor and outlines are faint, making it difficult to confirm a person's identity. In other words, objects are hard to recognize in images captured with visible light at night. Infrared images, unlike visible light images, clearly show the outlines of objects such as pedestrians or vehicles, so the object and the background are clearly separated. In a visible light image it is difficult to pick out objects in dark areas accurately, whereas in an infrared image objects such as pedestrians or vehicles appear relatively distinct even in dark areas and are easy to pick out. This difference is most pronounced in poorly lit environments such as night scenes. However, infrared images are monochrome, which makes the characteristics of an object difficult to make out.

CCTV installed for facility security and safety requires a high object identification capability. In particular, there is a need for a way to clearly distinguish objects such as pedestrians or vehicles in images captured at night and to clearly identify their characteristics.

The present invention proposes an image registration apparatus and method that can clearly detect objects such as people or vehicles even in images captured at night.

An image registration apparatus according to an embodiment of the present invention includes: a visible light image input unit that receives a visible light (RGB) image; an infrared image input unit that receives an infrared (IR) image; a feature extraction unit that uses deep learning to extract an object from the visible light image and a background image from the infrared image; and a registration unit that registers the extracted object with the background.

The feature extraction unit includes: first and second Bbox-pred (bounding box prediction) units that calculate, for the visible light image and the infrared image respectively, the coordinate values of a Box Offset Regressor (x, y, w, h) bounding box used to separate an object within the background image; and first and second Cls (Classification Layers)-score units that determine, for the visible light image and the infrared image respectively, whether an object is present (true, false) using a Softmax function and thereby separate the object region from the background image.

The image registration apparatus further includes a morpheme analysis filter that acquires color features from a visible light (RGB) image frame.

The image registration apparatus further includes a morpheme analyzer that separates the morpheme vector matrix of each visible light (RGB) image frame and each infrared (IR) image frame.

The image registration apparatus further includes: a space vector generator that generates an empty vector; and a frame tracker cross generator that generates an empty track to be crossed and an image frame track to hold the registered object.

The image registration apparatus further includes: a morpheme analysis pipe generator that outputs image information; and an object frame vector separator that separates the visible light image and the infrared image into pipes to generate a space allocation.

The image registration apparatus further includes a frame registration generator that allocates frames received from the object frame vector separator to a vector space.

The present invention relates to an apparatus and method that acquire color information from a visible light image and contour information from an infrared image and register the two. In the infrared image, the background image is used and the object's form is discarded, so that only the contours (edges) are kept. Conversely, in the visible light image, the background image is discarded and only the morphemes, that is, objects such as pedestrians or vehicles, are extracted and registered onto the infrared image. The invention can therefore provide an image processing apparatus and method that register high-resolution visible light and infrared images in real time using deep learning image analysis.

The visible light image and the infrared image each pass through four convolutional layers in a Deep CNN (Convolutional Neural Network) 4 unit, which computes the features of the object.

The Region CNN Feature unit produces feature maps of objects that reflect the features of the visible light image and the infrared image after each has passed through the fourth convolutional layer. To do so, it combines the object features formed through the RPN (Region Proposal Network) unit, which connects the widget (weight) data set analysis tools required for CNN training, and the NIN (Network in Network) Converter unit, which adjusts the dimension of the feature map through a 1×1 convolution. A fifth convolutional layer then yields a single feature map containing detailed visual information.

The Bbox-pred (bounding box prediction) unit calculates the coordinate values of the Box Offset Regressor (x, y, w, h) bounding box used to separate an object within the background image, and the Cls (Classification Layers)-score unit determines whether an object is present (true, false) using a Softmax function and separates the object region from the background image.

The registration unit registers the object separated from the background by the Cls (Classification Layers)-score unit of the visible light image with the background image of the infrared image, from which the object has been excluded using the Box Offset Regressor (x, y, w, h) of the Bbox-pred (bounding box prediction) unit.

The present invention accurately recognizes objects such as pedestrians or vehicles in images captured at night under low illumination, and acquires and reads their color information. In reading and in feature maps, two factors matter for finding characteristics: distinguishing objects by their morphemes and recognizing their color. A pedestrian's characteristics can be subdivided by the form of clothing worn, such as a short-sleeved top or long trousers, and by color, such as a red short-sleeved top or black long trousers, which raises the accuracy of object recognition. The more finely an object's characteristics are subdivided, the more accurately the same object can be tracked and found by comparing weights and similarities across a large storage space. In addition, when objects such as pedestrians and vehicles are displayed in color, their characteristics are easier to distinguish and remember with the naked eye.

According to the present invention, objects such as people or vehicles can be accurately detected even in images captured at night.

By capturing a visible light image and an infrared image simultaneously in a low-visibility nighttime environment and registering them, objects such as pedestrians or vehicles can be read accurately. Because foreground and background can be clearly separated during object detection with a deep learning image analysis algorithm, the detection rate can be increased.

In addition, most nighttime footage is captured in infrared mode or as thermal images, so the color of an object cannot be made out, but registering the visible light image with the infrared image makes it possible to obtain color information even at night.

Furthermore, when a densely wooded forest is filmed in the daytime, objects such as people or vehicles inside the forest cannot be identified, but they become easy to recognize once the visible light and infrared images are registered. Objects inside a vehicle with heavily tinted windows, which are invisible to the naked eye, can also be recognized easily.

FIG. 1 shows an image registration apparatus according to an embodiment of the present invention.
FIG. 2 shows an embodiment of the registration unit 108 shown in FIG. 1.
FIG. 3 shows the memory space allocation and registration recording process for vector frames in an image registration apparatus according to an embodiment of the present invention.

The terminology used herein is for describing embodiments and is not intended to limit the present invention. In this specification, the singular includes the plural unless the context specifically indicates otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components other than those mentioned. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the mentioned components and every combination of one or more of them. Although "first", "second", and so on are used to describe various components, these components are not limited by those terms. The terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may be a second component within the technical idea of the present invention. In addition, terms such as "... unit" and "module" used in the specification mean a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, the present invention is described in more detail with reference to the drawings. The embodiments introduced below are provided as examples so that the spirit of the present invention can be fully conveyed to those of ordinary skill in the art to which the invention pertains.

FIG. 1 shows an image registration apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the image registration apparatus 100 according to the present invention includes a visible light (RGB) image input unit 101a, an infrared (IR) image input unit 101b, an image feature extraction unit 110, and an image registration unit 108. The visible light (RGB) image input unit 101a and the infrared (IR) image input unit 101b each receive an image acquired by an image sensor. That is, the visible light (RGB) image input unit 101a receives a visible light (RGB) image, and the infrared (IR) image input unit 101b receives an infrared image.

The image feature extraction unit 110 uses deep learning to extract the required features from the visible light (RGB) image and from the infrared (IR) image. Specifically, the image feature extraction unit 110 may extract an object from the visible light (RGB) image and a background from the infrared (IR) image.

The image feature extraction unit 110 includes Deep CNN (Convolutional Neural Network) 4 units 102a and 102b, RPN (Region Proposal Network) units 103a and 103b, Region CNN Feature units 104a and 104b, an NIN (Network in Network) Converter unit 105, Bbox-pred (bounding box prediction) units 106a and 106b, Cls (Classification Layers)-score units 107a and 107b, the image registration unit 108, and an image output unit 109.

The Deep CNN 4 units 102a and 102b apply a CNN (Convolutional Neural Network), a deep learning image analysis algorithm, to the images received from the visible light (RGB) image input unit 101a and the infrared (IR) image input unit 101b, respectively. The two CNNs have the same structure and differ only in their inputs. Each Deep CNN 4 unit consists of several convolutional layers; the two feature maps that have passed through the fourth convolutional layer are concatenated along the channel dimension, and the features of each are computed.
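The channel-wise concatenation described above can be illustrated with a minimal sketch. It assumes feature maps stored as nested lists in `[channel][row][column]` order; the function name `concat_channels` and the sample values are hypothetical and do not come from the patent.

```python
def concat_channels(rgb_feat, ir_feat):
    """Concatenate two feature maps of shape [C][H][W] along the channel axis.

    Assumption: both maps come from the same layer depth, so their
    spatial sizes (H, W) must agree; only the channel counts may differ.
    """
    def spatial_shape(feat):
        return len(feat[0]), len(feat[0][0])

    if spatial_shape(rgb_feat) != spatial_shape(ir_feat):
        raise ValueError("spatial sizes of the two feature maps must match")
    # Stacking the channel lists is all that channel-wise concatenation does.
    return rgb_feat + ir_feat


# Hypothetical 2x2 feature maps: one RGB-branch channel, two IR-branch channels.
rgb_feat = [[[1, 2], [3, 4]]]
ir_feat = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
fused = concat_channels(rgb_feat, ir_feat)
```

The fused map keeps every spatial position intact and simply carries more channels, which is why a later layer is needed to bring the channel count back to what a pre-trained model expects.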

A CNN (Convolutional Neural Network) is a type of deep learning algorithm specialized for processing data arranged in a grid, and it is effective at identifying patterns in such data. A CNN is essentially a neural network built from layers that apply convolution operations to an image. Using multiple filters whose parameters are shared, a CNN preserves the spatial information of a two-dimensional image and learns by effectively extracting features from neighboring regions. A CNN has the advantage of enabling simpler training with a minimal number of parameters and little preprocessing.

Each of the RPN (Region Proposal Network) units 103a and 103b connects the widget data set analysis tools required for CNN training so that the two feature maps that have passed through the fourth convolutional layer become feature maps reflecting the features of the visible light (RGB) image and the infrared (IR) image. In general, an RPN plays a key role in object detection and uses three elements: base anchor boxes, deltas, and probabilities. An anchor box is a fixed region, set in advance, in which the network checks whether an object is found; anchor boxes are small rectangular regions partitioned inside the input image. A delta is a set of values for adjusting the size and position of a base anchor, learned by the model, with one delta corresponding to each anchor. A probability is the likelihood that an object exists inside a given anchor. The output of the RPN is therefore a set of bounding boxes that are likely to contain an object and from which duplicates have been removed. These are called ROIs (Regions of Interest).
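How anchors, deltas, and probabilities combine into ROIs can be sketched as follows. The patent does not give the delta parameterization or the duplicate-removal rule, so this sketch assumes the center-offset and log-scale encoding commonly used in region proposal networks and greedy non-maximum suppression; every name and threshold here is hypothetical.

```python
import math


def apply_delta(anchor, delta):
    """Adjust an anchor (cx, cy, w, h) by a delta (dx, dy, dw, dh).

    Assumed encoding: dx, dy shift the center in units of the anchor size,
    dw, dh scale the size exponentially. Returns corners (x1, y1, x2, y2).
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = delta
    ncx, ncy = cx + dx * w, cy + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return (ncx - nw / 2, ncy - nh / 2, ncx + nw / 2, ncy + nh / 2)


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def propose_rois(anchors, deltas, probs, min_prob=0.5, iou_thresh=0.7):
    """Keep likely boxes, then drop duplicates that overlap a better box."""
    boxes = [(p, apply_delta(a, d))
             for a, d, p in zip(anchors, deltas, probs) if p >= min_prob]
    boxes.sort(key=lambda pb: pb[0], reverse=True)
    kept = []
    for p, box in boxes:
        if all(iou(box, k) <= iou_thresh for _, k in kept):
            kept.append((p, box))
    return kept


# Two nearly identical anchors and one low-probability anchor.
anchors = [(10, 10, 8, 8), (10.5, 10, 8, 8), (30, 30, 8, 8)]
deltas = [(0, 0, 0, 0)] * 3
probs = [0.9, 0.8, 0.3]
rois = propose_rois(anchors, deltas, probs)
```

In this example the third anchor is dropped for low probability and the second is dropped as a duplicate of the first, leaving a single ROI.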

The Region CNN Feature units 104a and 104b combine the object features formed in the fourth convolutional layer and in the RPN (Region Proposal Network) units 103a and 103b, and a fifth convolutional layer produces a single feature map containing both semantic information and detailed visual information.

The NIN (Network in Network) Converter unit 105 adjusts the dimension of the feature map through a 1×1 convolution. This is necessary because the dimensions must match in order to use the weights of a pre-trained model.
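A 1×1 convolution is a per-pixel linear mix of channels, which is what lets it change the channel count without touching the spatial layout. The sketch below shows this under the assumption of `[channel][row][column]` nested lists; `conv1x1` and the sample weights are hypothetical, not the patent's trained values.

```python
def conv1x1(feat, weights, bias=None):
    """Apply a 1x1 convolution.

    feat:    input feature map, shape [C_in][H][W]
    weights: one row of C_in coefficients per output channel, [C_out][C_in]
    bias:    optional list of C_out offsets (defaults to zeros)
    Returns a feature map of shape [C_out][H][W].
    """
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    if any(len(row) != c_in for row in weights):
        raise ValueError("each weight row needs one coefficient per input channel")
    if bias is None:
        bias = [0.0] * len(weights)
    return [
        [
            [bias[o] + sum(weights[o][c] * feat[c][y][x] for c in range(c_in))
             for x in range(w)]
            for y in range(h)
        ]
        for o in range(len(weights))
    ]


# Reduce a 3-channel 1x2 map to 2 channels.
feat = [[[1, 2]], [[3, 4]], [[5, 6]]]
weights = [[1, 0, 0],   # output 0 copies input channel 0
           [0, 1, 1]]   # output 1 sums input channels 1 and 2
out = conv1x1(feat, weights)
```

Here the three input channels are reduced to two while the 1×2 spatial grid is preserved, mirroring the dimension matching that the converter performs.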

The Bbox-pred (bounding box prediction) units 106a and 106b calculate the coordinate values of the Box Offset Regressor (x, y, w, h) bounding box used to separate an object within the background image.

The Cls (Classification Layers)-score units 107a and 107b determine whether an object is present (true, false) using a Softmax function and separate the object region from the background image.
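The true/false decision made with a Softmax function can be sketched as below. The two-score layout (background score first, object score second) and the 0.5 threshold are assumptions for illustration; the patent does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_object(logits, threshold=0.5):
    """Return True when the object probability exceeds the threshold.

    Assumed layout: logits = (background_score, object_score).
    """
    return softmax(logits)[1] > threshold


likely_object = is_object((0.2, 2.0))       # object score dominates
likely_background = is_object((3.0, -1.0))  # background score dominates
```

Regions flagged true would be treated as object regions, and the rest as background.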

The image registration unit 108 registers the object separated from the background by the Cls (Classification Layers)-score unit 107a of the visible light (RGB) image with the background of the infrared (IR) image, from which the object has been excluded using the Box Offset Regressor (x, y, w, h) of the Bbox-pred (bounding box prediction) unit 106b.
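As a rough illustration of the idea, the sketch below pastes color pixels from the visible light image into an infrared background wherever a bounding box marks an object. It is a simplification under stated assumptions: the two images are already pixel-aligned and the same size, (x, y) is the top-left corner of the box, and the whole box is treated as the object region. The function `composite` is hypothetical and is not the patent's registration pipeline.

```python
def composite(rgb, ir, boxes):
    """Overlay RGB object regions onto an IR background.

    rgb:   2D list of (r, g, b) tuples
    ir:    2D list of gray intensities, same size as rgb (assumed aligned)
    boxes: list of (x, y, w, h), with (x, y) assumed to be the top-left corner
    Returns a 2D list of (r, g, b) tuples.
    """
    height, width = len(ir), len(ir[0])
    # Start from the IR background, expanded to three equal channels.
    out = [[(g, g, g) for g in row] for row in ir]
    for x, y, bw, bh in boxes:
        # Clip each box to the image bounds before copying color pixels.
        for yy in range(max(0, y), min(height, y + bh)):
            for xx in range(max(0, x), min(width, x + bw)):
                out[yy][xx] = rgb[yy][xx]
    return out


red = (255, 0, 0)
rgb = [[red, red, red], [red, red, red]]
ir = [[10, 20, 30], [40, 50, 60]]
merged = composite(rgb, ir, [(1, 0, 1, 2)])
```

In the result, the middle column carries color from the visible light image while the remaining pixels keep the infrared intensities.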

FIG. 2 shows an embodiment of the image registration unit 108 shown in FIG. 1.

Referring to FIGS. 1 and 2, in the image registration unit 200 according to an embodiment of the present invention, the visible light (RGB) image frame unit 201a receives the object (or foreground) frame required for image frame registration from the Cls (Classification Layers)-score unit 107a shown in FIG. 1. The infrared (IR) image frame unit 201b receives the background image through the Box Offset Regressor (x, y, w, h) of the Bbox-pred (bounding box prediction) unit 106b shown in FIG. 1, in the form of a background space vector frame from which the outline of the object (or foreground) has been excluded (step 201b).

The morpheme filter 202 performs a filtering operation that removes noise from the color image and acquires color features from the visible light (RGB) image frame unit 201a.

The morpheme analyzer 203 analyzes each feature using the background space vector frame received from the infrared (IR) image frame unit 201b and the color object morphemes received via the visible light (RGB) image frame unit 201a and the morpheme filter 202. After the morpheme analyzer 203 separates the morpheme vector matrix of each frame, the space vector generator 206-1 generates the vector elements.

The space vector generator 206-1 generates an empty vector, and generates the empty track to be crossed with the track of the frame tracker cross generator 206-3 and the image frame track that will hold the registered object.

The object frame vector separator 204 analyzes only the quantified vector information of the object analyzed by the morpheme analyzer 203. It separates the image information arriving through the morpheme frame pipe generator 206-2 into a visible light (RGB) image pipe and an infrared (IR) image pipe and generates a space allocation.

The morpheme frame type determiner 204-1 provides the image frame feature type required for morpheme analysis when the object frame vector separator 204 operates. Here, the type refers to the widget data set (Widget Dataset) required for feature analysis. To provide the image frame feature type required for morpheme analysis, pre-training is performed with this widget data set.

The frame registration generator 205 allocates received frames to the vector space. It also handles functions required for frame registration, such as the cross frame tracker.

The image registration initializer 206 activates the image registration apparatus by connecting the output of the frame registration generator 205 with the space vector frame processing devices, namely the space vector generator 206-1, the morpheme frame pipe generator 206-2, the frame tracker cross generator 206-3, and the image generator 206-5. It takes the color (RGB) object image and generates an object vector block based on the color feature (RGB Feature) object extracted from the morphemes.

The registered image generation unit 207 uses the image registration initializer 206 to register the object (or foreground) image, in real time, onto the vector space that forms the background.

FIG. 3 shows the memory space allocation and registration recording process for vector frames in an image registration apparatus according to an embodiment of the present invention.

Referring to FIGS. 2 and 3, the space vector generator 206-1 creates an image container space vector (step 301). The morpheme frame pipe generator 206-2 then generates a cross vector (step 302), after which the tracker cross generator 206-3 is created (step 304). The feature track receives, at the morpheme frame pipe generator 206-2 and the tracker cross generator 206-3, the object (or foreground) of the visible light (RGB) image and the background image of the infrared (IR) image excluding the object (or foreground), and is filled with the registered image stream data of the cross vector 302 and the tracker cross generator 304. Once the feature data has been recorded (step 303), the image registration initializer 206 is created (step 305).

The image registration apparatus and method according to an embodiment of the present invention is an image registration technique based on a CNN (Convolutional Neural Network), an artificial neural network suited to image analysis and processing. It can detect objects such as pedestrians or vehicles in high-resolution images in real time and can acquire color information, which improves reading performance in identification and recognition. With this technique, deep learning image analysis can clearly separate and classify objects (or foreground) such as pedestrians and vehicles from the background, which raises the object recognition rate.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. A software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

101a: visible light image input unit 101b: infrared image input unit
102a, 102b: Deep CNN 4 unit 103a, 103b: RPN
104a, 104b: Region CNN Feature 105: NIN Converter
106a, 106b: Box Offset Regressor 107a, 107b: Softmax
108: registration unit 109: image output unit

Claims

delete

An image registration apparatus comprising:
a visible light image input unit configured to receive a visible light (RGB) image;
an infrared image input unit configured to receive an infrared (IR) image;
a feature extraction unit configured to extract an object from the visible light image using a deep learning technique and to extract a background image from the infrared image; and
a registration unit configured to register the extracted object with the background,
wherein the feature extraction unit comprises:
first and second Bbox-pred (bounding box prediction) units configured to calculate, for the visible light image and the infrared image respectively, coordinate values of a Box Offset Regressor (x, y, w, h) bounding box for separating an object within the background image; and
first and second Cls (Classification Layers)-score units configured to determine, for the visible light image and the infrared image respectively, whether an object is present (true, false) using a Softmax function and to separate the object region from the background image, and
wherein the registration unit registers the object separated through the first Cls (Classification Layers)-score unit of the visible light (RGB) image with the background image, excluding the object, obtained through the Box Offset Regressor (x, y, w, h) of the second Bbox-pred (bounding box prediction) unit of the infrared (IR) image.

The apparatus of claim 3, further comprising
a morpheme analysis filter configured to acquire color features from a visible light (RGB) image frame.

The apparatus of claim 3, further comprising
a morpheme analyzer configured to separate the morpheme vector matrix of each visible light (RGB) image frame and each infrared (IR) image frame.

The apparatus of claim 5, further comprising:
a space vector generator configured to generate an empty vector; and
a frame tracker cross generator configured to generate an empty track to be crossed and an image frame track to hold the registered object.

The apparatus of claim 6, further comprising:
a morpheme analysis pipe generator configured to output image information; and
an object frame vector separator configured to separate the visible light image and the infrared image into pipes to generate a space allocation.

The apparatus of claim 7, further comprising
a frame registration generator configured to allocate frames received from the object frame vector separator to a vector space.