WO2012133962A1

WO2012133962A1 - Apparatus and method for recognizing 3d movement using stereo camera

Info

Publication number: WO2012133962A1
Application number: PCT/KR2011/002145
Authority: WO
Inventors: 강인배
Original assignee: ITXSECURITY CO Ltd
Current assignee: ITXSECURITY CO Ltd
Priority date: 2011-03-25
Filing date: 2011-03-29
Publication date: 2012-10-04
Anticipated expiration: 2013-09-25
Also published as: KR20120108728A; KR101226668B1

Abstract

An apparatus and a method for recognizing 3D movement using a stereo camera are disclosed. The apparatus for recognizing 3D movement, according to the present invention, calculates information on the movement of a pointer by means of calculating the actual position, the direction of movement, the distance of movement, and the speed of the pointer, which corresponds to a specific area of the human body to be tracked, by using the stereo camera. The information which is extracted and calculated by the present invention enables the gesture of a person in a 3D space to be turned into a pattern and information.

Description

3D motion recognition device and recognition method using stereo camera

The present invention relates to a three-dimensional (3D) motion recognition apparatus and method that use a stereo camera to recognize a user's 3D gestures and convert them into information.

If machines, including computers, could recognize the patterns of human movements rather than merely record them, this would be a new innovation in information technology. One small example is that some recent game consoles sense the user's motion and run the game in response to it.

The range of applications of technology for recognizing a user's 3D gestures is hard to bound: it extends beyond user interfaces for general information devices to robots of various kinds.

Research has already been done in this field, but most solutions so far require the user to use a separate wearable interface, and no solution has yet been presented that interprets the user's gestures directly with a camera. Nevertheless, it is clear that the day is not far off when things seen only in science-fiction films, such as virtually operating a device shown as a 3D stereoscopic image on a monitor or as a hologram, will actually be realized.

An object of the present invention is to provide a 3D motion recognition apparatus and method that use a stereo camera to recognize a user's 3D gestures and convert them into information.

To achieve the above object, a 3D motion recognition method using a stereo camera comprises: generating a stereo image by photographing a moving object with a stereo camera; calculating depth map data for each pixel of the stereo image; extracting the moving object from the stereo image; recognizing, through image processing of the image, a pointer to be the target of motion recognition within the object region of the image; tracking the change in the pointer's position in 3D space by performing the steps from calculating the depth map data through recognizing the pointer on each image frame successively generated in the generating step; and calculating and outputting information on the pointer's direction of movement in 3D using the changed 3D position information of the tracked pointer.

Here, the step of recognizing the pointer preferably extracts the central axis of the object and then recognizes an end of the central axis as the pointer.

Also, in the step of tracking the change in position, the position of the pointer in 3D space preferably consists of the coordinates of the pointer in each image frame and the depth at those coordinates, extracted from the depth map data.

According to an embodiment, the outputting step also calculates the distance the pointer actually moved in real space by calculating the distance per unit pixel of the object region in the image and multiplying it by the number of pixels the pointer moved, where the distance per unit pixel can be obtained by the following equation.

Here, L_p(do) is the unit length of a pixel included in the object region located at depth do, and Qxy is the number of pixels along the horizontal or vertical axis of the entire frame.

A motion recognition apparatus according to another embodiment of the present invention comprises: a stereo camera unit that generates a stereo image by photographing a moving object with a stereo camera; a distance information calculation unit that calculates depth map data for each pixel in each frame of the stereo image successively input from the stereo camera unit; an object extraction unit that extracts the moving object from each of the image frames; a motion tracking unit that tracks the change in the pointer's position in 3D space by recognizing, in each of the image frames, a pointer to be the target of motion recognition within the object region extracted by the object extraction unit; and a motion information output unit that calculates and outputs information on the pointer's direction of movement in 3D using the changed 3D position information of the tracked pointer.

The motion recognition apparatus of the present invention can recognize human 3D gestures that occur arbitrarily in 3D space and generate information on the direction and speed at which a specific body part moves.

FIG. 1 is a block diagram of a motion recognition system according to an embodiment of the present invention;

FIG. 2 is a flowchart provided to explain the 3D motion recognition method of the present invention;

FIG. 3 is a diagram provided to explain the extraction of the central axis of an object;

FIG. 4 is a diagram showing, by way of example, the recognition system of the present invention in actual operation;

FIG. 5 is a diagram showing, by way of example, the result of image processing on the images captured in FIG. 4; and

FIG. 6 is a diagram provided to explain the calculation of the area and representative length of an object.

Hereinafter, the present invention is described in more detail with reference to the drawings.

Referring to FIG. 1, the motion recognition apparatus 100 includes a stereo camera unit 110 and an image processing unit 130, and recognizes the user's 3D motion in 3D space.

The motion recognition apparatus 100 of the present invention photographs the user with a stereo camera as shown in FIG. 4 and recognizes the user's 3D gesture (motion) through image processing of the captured images. Here, recognizing a 3D gesture means generating information (hereinafter, 'motion information') on the direction of movement, the distance moved, and the speed of the part of the user's body that is the tracking target (hereinafter, the 'pointer'). The pointer is a body part such as a hand, arm, foot, leg, head, or finger, and may consist of several parts, such as 'both hands'. The direction of movement covers any direction in the 3D virtual space defined relative to the camera view, including up, down, left, right, forward, and backward, as well as rotation.

In FIG. 4, the user is moving an arm so that the hand moves up and down. If the user's hand is set as the pointer in the motion recognition apparatus 100, the apparatus recognizes how far and how fast the hand has moved in the Y direction and generates that information.

For this operation, the motion recognition apparatus 100 includes a stereo camera, generates stereo images of the user, and performs image processing on them.

Referring to FIG. 1, the motion recognition apparatus 100 includes the stereo camera unit 110 and the image processing unit 130, and recognizes the 3D motion of the pointer on the user's body in 3D space.

The stereo camera unit 110 includes a first camera 111, a second camera 113, and an image receiving unit 115.

The first camera 111 and the second camera 113 are a pair of cameras installed apart from each other so as to photograph the same area, and together form what is called a stereo camera. The first camera 111 and the second camera 113 output analog video signals of the photographed area to the image receiving unit 115. With such a stereo camera, the actual distance to a subject can be extracted.

The image receiving unit 115 converts the video signals (or images) of successive frames input from the first camera 111 and the second camera 113 into digital images, synchronizes their frames, and provides them to the image processing unit 130.

Depending on the embodiment, the first camera 111 and the second camera 113 of the stereo camera unit 110 may be cameras that generate digital rather than analog video signals. In this case, the image receiving unit 115 performs no conversion; it provides the interface to the image processing unit 130 and synchronizes the frames of the image pair.

The image processing unit 130 uses the digital stereo images successively output from the stereo camera unit 110 to recognize the user's pointer, and recognizes the user's 3D motion by tracking the movement of that pointer.

For this processing, the image processing unit 130 includes a distance information calculation unit 131, an object extraction unit 133, an object recognition unit 135, a motion tracking unit 137, and a motion information output unit 139. The operation of these units is described below with reference to FIG. 2.

First, the first camera 111 and the second camera 113 are arranged to photograph a specific space. When the first camera 111 and the second camera 113 generate analog video signals, the image receiving unit 115 converts them into digital video signals, synchronizes the frames, and provides them to the image processing unit 130 (S201).

<Depth map data generation: step S203>

Motion recognition proceeds in the following order: (1) extract the moving object; (2) determine whether the object is a human body; (3) if it is, track the preset pointer part; and (4) generate motion information for the pointer. Steps (2) to (4) require the measured distance from the first and second cameras 111 and 113 to the object.

To this end, the distance information calculation unit 131 uses the pair of digital images received in real time from the image receiving unit 115 to obtain, pixel by pixel, the distance to the subject captured at each pixel, and thereby calculates 3D depth map data. The depth map data therefore contains distance information for every pixel.

Here, the distance information of each pixel is binocular disparity information obtained by a known stereo matching method. It can be calculated using, for example, the "Image matching method using multiple image lines" of Korean Patent No. 0517876 or the graph cut algorithm presented in the "Binocular disparity estimation method for 3D object recognition" of Korean Patent No. 0601958. The depth map data calculated by the distance information calculation unit 131 thus contains, for each pixel, information on the actual distance to the subject captured at that pixel.
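The text leaves the stereo matching itself to known methods. Once a disparity is available for each pixel, the distance is commonly obtained by triangulation. The following is a minimal sketch of that conversion only, assuming rectified cameras; the function name `disparity_to_depth` and the parameters `focal_px` (focal length in pixels) and `baseline_m` (camera spacing in metres) are illustrative and do not come from the patent.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a 2D list of per-pixel disparities (pixels) to distances (metres).

    Uses the standard rectified-stereo relation Z = f * B / d.
    Pixels with non-positive disparity (no match found) are mapped to None.
    """
    depth_map = []
    for row in disparity:
        depth_map.append([
            focal_px * baseline_m / d if d > 0 else None
            for d in row
        ])
    return depth_map


# Hypothetical 2x2 disparity map; 0.0 marks a pixel without a match.
depth = disparity_to_depth(
    [[10.0, 20.0], [0.0, 40.0]], focal_px=800.0, baseline_m=0.125
)
```

Larger disparities map to smaller distances, so nearby subjects are resolved more finely than distant ones.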

<Extraction of the moving object: step S205>

The object extraction unit 133 performs image processing on one (or both) of the pair of digital images input through the image receiving unit 115 and extracts the region of the moving object.

The applicant has already filed patent applications No. 10-2010-0039302 and No. 10-2010-0039366 on methods of recognizing moving objects, in particular people, using a stereo camera.

According to those applications, a moving object is extracted by obtaining a difference image, that is, the newly input image minus a background image. In the present invention, the background image may be a fixed, preset image, but an image in which no pointer is recognized may also serve as the background image here, even if it contains a region determined to be a moving object. However, the basic background image used to obtain the depth map data, or to obtain the area or representative length of the object from the depth map data as described below, must be a preset image.
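The subtraction described above can be sketched as follows. This is an illustrative example on small grayscale images stored as nested lists; the function name `moving_object_mask` and the `threshold` value are assumptions, since the text does not state how the difference image is binarized.

```python
def moving_object_mask(frame, background, threshold=30):
    """Return a binary mask (1 = moving object) from two grayscale images.

    frame and background are equally sized 2D lists of intensities (0-255).
    A pixel is marked as foreground when it differs from the background
    by more than `threshold` (an assumed, tunable value).
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 10], [10, 180, 9]]
mask = moving_object_mask(frame, background)
```

Small intensity changes such as sensor noise stay below the threshold, so only the middle column is marked as moving.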

<Determining whether the extracted object is a human body: step S207>

Once an object has been extracted, the object extraction unit 133 and the object recognition unit 135 first determine whether it is a human body, for example by deciding whether the object is a person, an animal, or an inanimate thing.

For object recognition, the object extraction unit 133 detects the outline of the object from the difference image, and the object recognition unit 135 obtains the area or the representative length of the object using that outline and the depth map data calculated by the distance information calculation unit 131.

The object recognition unit 135 can determine whether the extracted object is a human body by checking whether the calculated area or representative length falls within a preset range of areas or lengths for a human body.

Regarding the detection and calculation of an object's outline, area, and representative length, the applicant's aforementioned patent applications No. 10-2010-0039302 and No. 10-2010-0039366 present outline detection and a method of recognizing objects, in particular people, using a stereo camera. That method is described again below.

<Pointer recognition using the center line of the object: step S209>

The motion tracking unit 137 applies a skeletonization or thinning algorithm to the object extracted by the object extraction unit 133 and extracts the medial axis of the object, one pixel wide. Various known skeletonization methods can be applied, such as the medial axis transform (MAT) algorithm, which uses the outline, or the Zhang-Suen algorithm.

For example, under the medial axis transform, the medial axis (a) of the object is, as in FIG. 3, the set of those points (or pixels) inside the object (R) that have more than one boundary point. Here, a boundary point of an interior point is a point on the outline (B) that is closest to that interior point; the outline points b1 and b2 are the boundary points of the point P1 inside the object (R). The medial axis algorithm is therefore a process of extracting the points that have more than one boundary point, and can be expressed as Equation 1 below.

Equation 1

$$P_{ma} = \{\, x \in R \mid b_{min}(x) > 1 \,\}$$

Here, P_ma is the medial axis, expressed as a set of points x; x is a point inside the object (R); and b_min(x) is the number of boundary points of x. The medial axis is thus the set of points x whose number of boundary points is greater than 1. The structure of the skeleton may vary somewhat depending on how the distance from an interior point x to a pixel on the outline is measured when computing the boundary points (for example, 4-distance, 8-distance, or Euclidean distance).
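The definition above can be applied literally to a small binary mask. The sketch below is a brute-force illustration, not the applicant's implementation: it assumes the outline consists of object pixels with a 4-neighbour outside the object, uses squared Euclidean distance so that ties are exact, and the name `medial_axis_points` is hypothetical.

```python
def medial_axis_points(mask):
    """Return the set of interior (row, col) points with more than one
    nearest outline point, i.e. points x with b_min(x) > 1.

    mask is a 2D list where 1 marks object pixels.
    """
    rows, cols = len(mask), len(mask[0])

    def is_object(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    object_pts = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 1]
    # Assumed outline definition: object pixels touching a non-object 4-neighbour.
    outline = [
        (r, c) for r, c in object_pts
        if not all(is_object(r + dr, c + dc)
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    ]
    outline_set = set(outline)

    axis = set()
    for r, c in object_pts:
        if (r, c) in outline_set:
            continue  # only interior points are candidates
        # Squared Euclidean distance to every outline point (integers, exact ties).
        dists = [(r - br) ** 2 + (c - bc) ** 2 for br, bc in outline]
        if dists.count(min(dists)) > 1:
            axis.add((r, c))
    return axis


# A filled 5x7 rectangle.
rect = [[1] * 7 for _ in range(5)]
axis = medial_axis_points(rect)
```

On a pixel grid the result is coarser than the continuous medial axis, which is one reason practical systems often use thinning algorithms instead; the comparison against every outline point also makes this version quadratic in the object size.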

In addition, when the object has a relatively simple shape, the center line can be extracted by finding the peaks of a Gaussian value computed over the object; with such an algorithm, the outline detection of step S207 may be omitted.

Once the medial axis of the object has been extracted, the motion tracking unit 137 uses it to recognize the pointer. Because the pointer is preferably located at an end of the center line of the human body, like the head, hands, and feet, pointer recognition by the motion tracking unit 137 can amount to recognizing the ends of the center line of the extracted object.
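One common way to find the ends of a one-pixel-wide center line is to look for skeleton pixels with exactly one neighbour. The sketch below illustrates that idea under the assumption of 8-connectivity; the text does not specify how the ends are detected, and the name `skeleton_endpoints` is hypothetical.

```python
def skeleton_endpoints(skeleton):
    """Return the sorted end points of a one-pixel-wide skeleton.

    skeleton is a set of (row, col) pixels. A pixel counts as an end point
    when exactly one of its 8 neighbours also belongs to the skeleton.
    """
    neighbours = [
        (dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
    ]
    ends = []
    for r, c in skeleton:
        count = sum((r + dr, c + dc) in skeleton for dr, dc in neighbours)
        if count == 1:
            ends.append((r, c))
    return sorted(ends)


# A "T"-shaped skeleton: horizontal arm on row 0, vertical stem in column 2.
skeleton = {(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 2), (3, 2)}
ends = skeleton_endpoints(skeleton)
```

Each end point is a pointer candidate; deciding which one is a hand, the head, or a foot requires further rules that this sketch does not cover.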

<Tracking the movement of the pointer: steps S211 and S213>

Because motion recognition presupposes movement, no motion information is produced if the pointer recognized by the motion tracking unit 137 does not move. Therefore, from the time the pointer is first recognized until tracking ends, the motion tracking unit 137 repeats the image processing of steps S203 to S209 on every image frame generated in step S201 and input successively, and determines whether the pointer is moving (S211).

As mentioned above, when the pointer is assigned to at least one of the head, a hand, or a foot, pointer tracking by the motion tracking unit 137 amounts to tracking the movement of the ends of the center line across the successively input and processed frames, as shown in FIG. 5.

FIG. 5 shows, by way of example, the result of image processing on the images of the user of FIG. 4. It shows the successively processed images (M1, M2, M3), each with its extracted center line (also labeled M1, M2, M3), for the case in which the user's two hands are set as pointers (m11, m12, m21, m22, m31, m32). The motion tracking unit 137 therefore tracks the movement of the two ends of the center line (m11, m12, m21, m22, m31, m32) in each image (M1, M2, M3).

In FIG. 5, only the end corresponding to the left hand (m11, m21, m31) moves in the center lines (M1, M2, M3), so motion information will ultimately be generated only for that pointer (m11, m21, m31).

The motion tracking unit 137 extracts the position information of each pointer (m11, m12, m21, m22, m31, m32) from each successively input and processed frame and provides it to the motion information output unit 139. The position information consists of the coordinates of the pointer (or of its pixel) in the image and the depth at that pixel. The depth is extracted from the depth map data calculated by the distance information calculation unit 131 for that image frame. The motion tracking unit 137 continues to track the pointers (m11, m12, m21, m22, m31, m32) for as long as they are moving.

<Generating the motion information of the pointer: step S215>

Based on the position information of each pointer (m11, m12, m21, m22, m31, m32) in each frame, provided by the motion tracking unit 137, the motion information output unit 139 calculates the motion information of the pointer, including its direction of movement and the actual distance moved. The motion information output unit 139 also calculates the speed of movement based on the image frame period.

The direction of movement follows directly from drawing the pointer's movement vector in the 3D virtual space, using the pointer's coordinates and the depth at those coordinates.

The actual distance moved by the pointer can be calculated from the horizontal and vertical distance per unit pixel at the pointer's coordinates, together with the depth map data. The distance per unit pixel can be obtained using, for example, Equation 6 below. Equation 6 gives the vertical length of each pixel, but the horizontal length can be obtained in the same way.
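The direction, distance, and speed calculations can be combined as in the following sketch. It assumes that the per-pixel lengths at the pointer's depth (`unit_x_m`, `unit_y_m`) have already been obtained from the per-pixel distance formula and that depth is given in metres; the function name `motion_info` and all parameter names are illustrative, not taken from the patent.

```python
import math


def motion_info(prev, curr, unit_x_m, unit_y_m, frame_period_s):
    """Compute motion information between two consecutive frames.

    prev, curr: (x_px, y_px, depth_m) of the pointer in each frame.
    unit_x_m, unit_y_m: assumed real-world length of one pixel at the
    pointer's depth, horizontally and vertically.
    Returns (unit direction vector, distance in metres, speed in m/s).
    """
    dx = (curr[0] - prev[0]) * unit_x_m  # pixels moved times metres per pixel
    dy = (curr[1] - prev[1]) * unit_y_m
    dz = curr[2] - prev[2]               # change in depth, already in metres
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance == 0.0:
        return (0.0, 0.0, 0.0), 0.0, 0.0  # pointer did not move
    direction = (dx / distance, dy / distance, dz / distance)
    return direction, distance, distance / frame_period_s


# Hand moves 60 pixels toward the top of the image at constant depth.
# Image y grows downward, so upward motion gives a negative y component.
direction, distance, speed = motion_info(
    prev=(100, 200, 2.0), curr=(100, 140, 2.0),
    unit_x_m=0.005, unit_y_m=0.005, frame_period_s=0.5,
)
```

Using a single per-pixel length for the whole move is a simplification: if the depth changes noticeably between frames, the unit length changes as well.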

As described above, only the left-hand pointer (m11, m21, m31) moves, so motion information will be generated only for the left-hand pointer (m11, m21, m31).

The motion recognition method using a stereo camera of the 3D motion recognition apparatus 100 of the present invention is carried out in the manner described above.

The outline detection of step S207 and the calculation of the area and representative length of an object are briefly described below, based on the earlier patent applications No. 10-2010-0039302 and No. 10-2010-0039366.

First, to detect the outline of the extracted object, the object extraction unit 133 performs outline detection on the image resulting from the subtraction in step S205 and detects the outline of the moving object. Outline detection uses various types of edge, chosen according to the width and shape of the object's boundary.

For outline detection, the object extraction unit 133 can apply morphology operations to the subtraction image to remove noise and to simplify outlines and skeleton lines. The basic morphology operations are erosion, which removes noise, and dilation, which fills small holes in an object.
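The two morphology operations can be illustrated on a binary mask as follows. This is a minimal sketch assuming a 3x3 structuring element and ignoring pixels outside the image; the text does not specify either choice, and the names `erode` and `dilate` are illustrative.

```python
def _apply(mask, keep):
    """Apply `keep` (all or any) to the in-bounds 3x3 window of every pixel."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                mask[rr][cc]
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            out[r][c] = 1 if keep(window) else 0
    return out


def erode(mask):
    """Keep a pixel only if every pixel in its 3x3 window is set."""
    return _apply(mask, all)


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is set."""
    return _apply(mask, any)


# A 3x3 object plus one isolated noise pixel in the top-left corner.
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
opened = dilate(erode(noisy))  # erosion followed by dilation
```

Erosion followed by dilation removes the isolated pixel while restoring the object to its original size; the reverse order would instead fill small holes.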

The area of an object is calculated by obtaining the actual area per pixel (hereinafter, the 'unit area' of a pixel) at the distance (do) at which the object is located, as extracted in step S203, and multiplying it by the number of pixels contained inside the object's outline.

FIG. 6 shows the actual area (Nmax) corresponding to the entire frame at the maximum depth (D), based on the basic background image, and the actual area N(do) corresponding to the entire frame at the position (do) of the extracted object. First, the actual area N(do) corresponding to the entire frame at the distance (do) at which the object is located can be obtained as in Equation 2 below.

Equation 2

Here, Nmax is the actual area corresponding to the entire frame (for example, 720×640 pixels) at the maximum distance (D), based on the basic background image.

Next, the unit area N_p(do) of a pixel in the object region is obtained, as in Equation 3 below, by dividing the actual area N(do) corresponding to the entire frame at the object's distance (do) by the number of pixels in the entire frame (Q; for example, 460,800 = 720×640).

Equation 3

$$N_p(do) = \frac{N(do)}{Q}$$

Here, Q is the total number of pixels. Equation 3 shows that N_p(do) depends on the distance (do) to the object, which is read from the distance information in the 3D depth map data.

Finally, as described above, the area of the object is obtained, as in Equation 4 below, by multiplying the unit area N_p(do) of a pixel by the number of pixels (qc) contained inside the outline.

Equation 4

$$\text{object area} = N_p(do) \times qc$$

Here, qc is the number of pixels contained in the object.
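The last two steps of the area calculation, together with the range check used to decide whether an object is a human body, can be sketched as follows. The frame area at the object's distance is taken as an input here, and the function names, the sample values, and the human area range are hypothetical illustrations rather than values from the patent.

```python
def object_area(frame_area_at_do, total_pixels, object_pixels):
    """Estimate the real area of an object from its pixel count.

    frame_area_at_do: real area N(do) covered by the whole frame at the
    object's distance (assumed to be known already).
    total_pixels: Q, the number of pixels in the frame.
    object_pixels: qc, the number of pixels inside the object's outline.
    """
    unit_area = frame_area_at_do / total_pixels  # unit area N_p(do) of one pixel
    return unit_area * object_pixels             # unit area times pixel count


def within_range(value, low, high):
    """Return True when value lies in the preset range [low, high]."""
    return low <= value <= high


# Hypothetical values: the 720x640 frame covers 11.52 m^2 at the object's distance,
# and 24,000 pixels lie inside the outline.
area_m2 = object_area(frame_area_at_do=11.52, total_pixels=720 * 640, object_pixels=24000)
is_human_sized = within_range(area_m2, low=0.3, high=1.2)  # assumed human range
```

Because the unit area grows with distance, the same pixel count corresponds to a larger real area for a distant object than for a near one.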

이하에서는 객체의 대표 길이를 계산하는 과정에 대하여 간단히 설명한다. Hereinafter, the process of calculating the representative length of the object will be briefly described.

Once the center line has been extracted as described with reference to Equation 1, the representative length of the object is obtained using the depth map data. The representative length is an actual length of the object, calculated from the image, that is designated to represent the object; it may be the actual length of the central axis, the actual width of the object, or the actual height of the object. Note that the representative length is affected by the position of the camera, the shooting angle, and the characteristics of the shooting area.

The actual length of the object is calculated by first obtaining the actual length per pixel at the distance (do) where the object is located (hereinafter, the 'unit length' of a pixel) and then multiplying it by the number of pixels representing the object. The number of pixels representing the object may be the number of pixels forming the central axis, or the number of pixels spanning the width or the height of the object.

As a number of pixels representing the object, the width or height of the object can be obtained from the range of x-axis coordinates or the range of y-axis coordinates of the object region, and the length of the central axis can be obtained, for example, by counting all the pixels included in the central axis.
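
The pixel counts described above can be sketched as follows. This is an illustration under stated assumptions: the object region and the central axis are assumed to be available as collections of (x, y) pixel coordinates, the coordinate range is counted inclusively (hence the `+ 1`), and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]  # (x, y) image coordinates


def extent_in_pixels(region: Iterable[Pixel]) -> Tuple[int, int]:
    """Return (width, height) of the region from its x and y coordinate ranges."""
    points = list(region)
    if not points:
        raise ValueError("region must contain at least one pixel")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Inclusive range: a region spanning x = 10..12 is 3 pixels wide.
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def axis_pixel_count(axis: Iterable[Pixel]) -> int:
    """Count the distinct pixels that make up the central axis."""
    return len(set(axis))


region = [(10, 5), (11, 5), (12, 5), (11, 6), (11, 7)]   # small T-shaped region
central_axis = [(11, 5), (11, 6), (11, 7)]

width_px, height_px = extent_in_pixels(region)
axis_px = axis_pixel_count(central_axis)
```

Any of `width_px`, `height_px`, or `axis_px` can then serve as the number of pixels representing the object.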

The unit length of a given pixel differs from pixel to pixel (more precisely, it depends on the depth of the pixel) and can be obtained as follows with reference to FIG. 6. For convenience of explanation, the image frame is assumed to be 720 × 640 pixels.

FIG. 6 shows the actual length Lmax corresponding to the vertical axis (or horizontal axis) of the entire frame at the maximum depth (D), taken from the basic background image, and the actual length L(do) corresponding to the vertical axis (or horizontal axis) of the entire frame at the position (do) of the extracted object. First, the actual length L(do) corresponding to the vertical axis (or horizontal axis) of the entire frame at the depth (do) where the object is located can be obtained as in Equation 5 below.

Equation 5

Here, L(do) is the actual length corresponding to the vertical axis (or horizontal axis) of the entire frame at the depth do, and Lmax is the actual length corresponding to the vertical axis (or horizontal axis) of the entire frame at the maximum depth (D), based on the existing background image.

Next, the actual length L(do) corresponding to the vertical axis (or horizontal axis) of the entire frame at the distance (do) where the object is located is divided by the number of pixels along the vertical axis (or horizontal axis) of the entire frame (Qy or Qx; in this example Qx = 720 and Qy = 640), which gives the unit length L_p(do) of a pixel included in the object region, as in Equation 6 below.

Equation 6

L_p(do) = L(do) / Qy

Here, L_p(do) is the unit length of a pixel included in the object region located at the depth do, and Qy is the number of pixels along the vertical axis of the entire frame. Equation 6 shows that L_p(do) depends on the depth (do) to the object, which is determined from the distance information of the 3D depth map data, and on the maximum depth in the map data.

The vertical-axis length of a unit pixel, used to calculate the movement distance of the pointer in step S215 described earlier, can be obtained directly from Equation 6, and the horizontal-axis length can be obtained by substituting the number of pixels along the horizontal axis of the entire frame (Qx) for Qy in Equation 6.
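
One way the unit pixel lengths could be applied to the movement distance of the pointer is sketched below. The text states only that unit pixel lengths are used for this calculation, so several things here are assumptions made for the illustration: the pointer position is given as (x, y, depth), the depth is expressed in the same real-world unit as the unit lengths, a single pair of unit lengths is used for both frames, and the three components are combined as a Euclidean distance. The function name `pointer_travel` is hypothetical.

```python
import math
from typing import Tuple

Position = Tuple[int, int, float]  # (x pixel, y pixel, depth in metres)


def pointer_travel(prev: Position, curr: Position,
                   unit_len_x: float, unit_len_y: float) -> float:
    """Approximate real-world distance moved by the pointer between two frames.

    unit_len_x and unit_len_y are the horizontal and vertical unit pixel
    lengths (metres per pixel) at the pointer's depth.
    """
    dx = (curr[0] - prev[0]) * unit_len_x   # horizontal pixels -> metres
    dy = (curr[1] - prev[1]) * unit_len_y   # vertical pixels -> metres
    dz = curr[2] - prev[2]                  # depth change, assumed in metres
    return math.sqrt(dx * dx + dy * dy + dz * dz)


# Hypothetical pointer positions in two consecutive frames.
previous = (100, 200, 1.50)
current = (130, 160, 1.50)

distance = pointer_travel(previous, current, unit_len_x=0.002, unit_len_y=0.002)
```

With these sample values the pointer moves 30 pixels horizontally and 40 pixels vertically at constant depth, so the result is about 0.1 m.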

Once the unit length of a pixel has been obtained, the object recognition unit 135 calculates the representative length of the object. The representative length can be obtained by multiplying the unit length L_p(do) of a pixel by the number of pixels (qo) representing the object, as in Equation 7 below.

Equation 7

(representative length of the object) = L_p(do) × qo

Here, qo is the number of pixels representing the object.
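
The length calculation of Equations 6 and 7 can be sketched in the same way. This is a minimal illustration, not the implementation of the object recognition unit 135: the function names are hypothetical, L(do) is taken as an input rather than derived from Equation 5, and the sample values for L(do) and qo are invented for the example.

```python
def pixel_unit_length(frame_length_at_depth: float, axis_pixels: int) -> float:
    """Equation 6: real length covered by one pixel at the object's depth.

    frame_length_at_depth is L(do); axis_pixels is Qy (or Qx for the
    horizontal axis).
    """
    if axis_pixels <= 0:
        raise ValueError("axis_pixels must be positive")
    return frame_length_at_depth / axis_pixels


def representative_length(frame_length_at_depth: float, axis_pixels: int,
                          representative_pixels: int) -> float:
    """Equation 7: unit length times the number of representative pixels qo."""
    if representative_pixels < 0:
        raise ValueError("representative_pixels must not be negative")
    unit = pixel_unit_length(frame_length_at_depth, axis_pixels)
    return unit * representative_pixels


QY = 640        # vertical pixel count of the 720 x 640 frame used in the text
L_DO = 1.28     # hypothetical L(do), in metres
QO = 150        # hypothetical number of pixels along the central axis

length = representative_length(L_DO, QY, QO)   # about 0.3 metres
```

Passing Qx = 720 together with the horizontal frame length gives the corresponding horizontal unit length.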

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention belongs without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

Claims

A 3D motion recognition method using a stereo camera, the method comprising:

generating a stereo image by photographing a moving object with a stereo camera;

calculating depth map data for each pixel in the stereo image;

extracting the moving object from the stereo image;

recognizing, through image processing, a pointer to be a motion recognition target from an object region of the image;

tracking a change in the position of the pointer in three-dimensional space by performing the steps from calculating the depth map data through recognizing the pointer for each image frame continuously generated in the generating step; and

calculating and outputting information on the three-dimensional moving direction of the pointer using the changed three-dimensional position information of the tracked pointer.

The method of claim 1,

wherein recognizing the pointer comprises extracting a central axis of the object and recognizing an end of the central axis as the pointer.

The method of claim 1,

wherein, in tracking the position change, the position of the pointer in three-dimensional space is given by the coordinates of the pointer in each image frame and the depth at those coordinates extracted from the depth map data.

The method of claim 1,

wherein the outputting step calculates the actual spatial movement distance of the pointer by calculating a distance per unit pixel of the object region in the image and multiplying it by the number of pixels the pointer has moved.

The 3D motion recognition method using a stereo camera, wherein the distance per unit pixel is obtained by the following equation,

where L_p(do) is the unit length of a pixel included in the object region located at the depth do, and Qxy is the number of pixels along the horizontal or vertical axis of the entire frame.

A 3D motion recognition apparatus using a stereo camera, the apparatus comprising:

a stereo camera unit that generates a stereo image by photographing a moving object with a stereo camera;

a distance information calculator that calculates depth map data for each pixel in each frame of the stereo image continuously input from the stereo camera unit;

an object extracting unit that extracts the moving object from each of the image frames;

a motion tracking unit that tracks a change in the position of a pointer in three-dimensional space by recognizing, for each of the image frames, the pointer that is a motion recognition target in the object region extracted by the object extracting unit; and

a motion information output unit that calculates and outputs information on the three-dimensional moving direction of the pointer using the changed three-dimensional position information of the tracked pointer.