KR101942808B1

KR101942808B1 - Apparatus for CCTV Video Analytics Based on Object-Image Recognition DCNN

Info

Publication number: KR101942808B1
Application number: KR1020180139043A
Authority: KR
Inventors: 장정훈; 전창호
Original assignee: 주식회사 인텔리빅스; 쿠도커뮤니케이션 주식회사
Priority date: 2018-11-13
Filing date: 2018-11-13
Publication date: 2019-01-29
Anticipated expiration: 2038-11-13

Abstract

The present invention relates to a CCTV image analysis apparatus based on an object image recognition DCNN. The apparatus according to an embodiment of the present invention may include: an image conversion unit that receives a plurality of video frame images provided by a video providing apparatus and converts them into a format that reduces the image-processing load; and a motion-based image analysis unit that extracts, from the converted video frame images, object images of a plurality of tracked objects for motion tracking, applies the extracted object images to a DCNN that uses an object image recognition method, classifies the tracked objects based on the result, removes from the set of tracked objects any object of no interest to the user that is found to fall outside a specified criterion, and detects events based on the tracking information of the remaining objects of interest to the user and on specified rules.

Description

[0001] The present invention relates to a CCTV image analysis apparatus based on an object image recognition DCNN. More particularly, it relates to a CCTV image analysis apparatus that can detect and track objects of interest in CCTV video even on, for example, a central processing unit (CPU), that senses specified events on that basis and raises alarms, and that performs robust and efficient image analysis using an object image recognition DCNN (Deep Convolutional Neural Network).

The number of CCTV cameras installed by government offices and companies for security and safety has recently been growing explosively. The number of personnel monitoring the camera footage, however, falls far short of the number of cameras installed. Intelligent CCTV video surveillance systems are being actively introduced to address this problem.

The CCTV image analysis apparatus, the core of an intelligent CCTV video surveillance system, receives video from CCTV cameras, detects and tracks moving objects, and on that basis automatically senses abnormal situations such as "intrusion into a prohibited area" and raises an alarm. Monitoring personnel need not watch many (uneventful) CCTV feeds at all times; by checking only the feeds for which an alarm has been raised, they can monitor many CCTV cameras effectively.

However, because most existing CCTV image analysis apparatuses use motion-based object detection algorithms, they frequently produce false detections from various causes (e.g., branches swaying in the wind, rippling water, moving shadows, sudden lighting changes, flashing lights, snow or rain) in addition to detecting the actual objects of interest (typically people and vehicles). This in turn produces frequent false alarms and makes efficient monitoring impossible.

In addition to motion-based object detection, computer vision researchers have been developing object detection based on learned object shape since the mid-2000s. These techniques extract and learn shape features from many training images of a specific type of object (for example, pedestrians) and detect objects by searching the image for regions whose shape features resemble the learned ones. Representative examples include Viola-Jones, HOG, ICF, ACF, and DPM. Because of their limited detection performance and their processing load, however, these techniques have been difficult to apply to commercial CCTV image analysis apparatuses.

Meanwhile, in 2012 the team of Professor G. Hinton at the University of Toronto, Canada, used a DCNN called AlexNet to win the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) by an overwhelming margin over existing image recognition algorithms. Deep learning then began to attract attention in computer vision, and attempts to solve various computer vision problems with deep learning have followed.

DCNN-based object detection techniques began to be published in 2014. They provide detection performance far beyond that of earlier object detection techniques. Representative examples include Fast/Faster R-CNN, R-FCN, SSD, and YOLO.

However, several constraints still hinder the application of DCNN-based object detection to commercial CCTV image analysis apparatuses. The main one is the very high hardware cost of processing video in real time with a DCNN-based object detector. A typical DCNN-based object detector requires a large amount of computation to detect objects in even a single video frame, so it is very difficult to process video in real time on a general-purpose CPU (real-time processing typically requires object detection on at least 7 frames per second). Processing video in real time with a DCNN-based object detector therefore requires a GPU capable of massively parallel computation. Even with a GPU, it is difficult for a single image analysis apparatus to process several video streams simultaneously in real time unless an expensive high-performance GPU is used.

Korean Registered Patent Publication No. 10-1040049 (2011-06-02)
Korean Registered Patent Publication No. 10-1173853 (2012-08-08)
Korean Registered Patent Publication No. 10-1178539 (2012-08-24)
Korean Registered Patent Publication No. 10-1748121 (2017-06-12)
Korean Registered Patent Publication No. 10-1789690 (2017-10-18)
Korean Registered Patent Publication No. 10-1808587 (2017-12-07)
Korean Laid-Open Patent Publication No. 10-2018-0072561 (2018-06-29)
Korean Registered Patent Publication No. 10-1850286 (2018-04-13)
Korean Laid-Open Patent Publication No. 10-2018-0107930 (2018-10-04)

"Pedestrian Detection: An Evaluation of the State of the Art", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, Issue 4, April 2012
"Efficient Processing of Deep Neural Networks: A Tutorial and Survey", Proceedings of the IEEE, Vol. 105, Issue 12, Dec. 2017
"Object Detection with Deep Learning: A Review", arXiv.org, arXiv:1807.05511 [cs.CV], Jul. 2018

An object of embodiments of the present invention is to provide a CCTV image analysis apparatus based on an object image recognition DCNN that can detect and track objects of interest in CCTV video even on, for example, a central processing unit (CPU), that senses specified events on that basis and raises alarms, and that performs robust and efficient image analysis using the object image recognition DCNN.

A CCTV image analysis apparatus based on an object image recognition DCNN according to an embodiment of the present invention performs image analysis using a DCNN (Deep Convolutional Neural Network) that provides recognition results for objects in an image, and includes: an image conversion unit that receives, from a video providing apparatus, first image data composed of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate, and converts the received first image data into second image data having a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format of a different type from the first pixel format; and a motion-based image analysis unit that detects and tracks a plurality of moving objects on a motion basis in the video frames of the converted second image data, extracts an image of each of the objects being tracked, classifies the tracked objects using the recognition results obtained by inputting the extracted object images to the DCNN, identifies among the classified tracked objects any object of no interest to the user that falls outside a specified criterion, removes the identified object from the set of tracked objects, and detects events based on the tracking information of the resulting objects of interest to the user and on specified rules. The image conversion unit includes: a video frame acquisition unit that receives the plurality of video frames input at the first frame rate; a video frame subsampling unit that generates a plurality of second video frames by sampling fewer video frames than the input plurality of first video frames; a video frame scaling unit that converts the generated second video frames to the second resolution; and a pixel format conversion unit that converts the second video frames, now at the second resolution, from the first pixel format to the second pixel format and provides the resulting second image data to the motion-based image analysis unit.

The motion-based image analysis unit may include: a motion region detection unit that generates a difference image using the second image data and a learned background image and removes noise from the generated difference image to detect motion regions for the plurality of moving objects; an object tracking unit that detects and tracks the plurality of moving objects using the detected motion regions and the second image data; a tracked object classification unit that classifies the plurality of tracked objects based on a first recognition result, obtained by applying to the DCNN the object images of the tracked objects extracted from one of the second video frames, and a second recognition result, obtained by applying to the DCNN the object images of the tracked objects extracted from at least one other of the second video frames input a certain time interval apart from that frame; a non-interest tracked object removal unit that identifies among the classified tracked objects any object of no interest to the user that falls outside a specified criterion and removes it from the tracked objects of the object tracking unit; and an interest-tracked-object-based event detection unit that detects events in which the tracking information of the objects of interest to the user obtained through the removal satisfies a specified rule.

The tracked object classification unit may compute the first recognition result and the second recognition result as scores, accumulate them, and determine the class with the highest accumulated score as the class of the tracked object.
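The score accumulation performed by the tracked object classification unit can be illustrated with a minimal sketch. It assumes that each recognition result is a mapping from class name to a score, and the class names (person, vehicle, unidentified) follow the classes named for FIG. 5; the function names and the sample scores are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def accumulate_scores(totals, frame_scores):
    """Add one frame's per-class recognition scores to the running totals."""
    for cls, score in frame_scores.items():
        totals[cls] += score
    return totals


def decide_class(totals):
    """Return the class with the highest accumulated score."""
    return max(totals, key=totals.get)


# Hypothetical recognition results for one tracked object,
# taken from three frames sampled some time apart.
results = [
    {"person": 0.6, "vehicle": 0.1, "unidentified": 0.3},
    {"person": 0.2, "vehicle": 0.1, "unidentified": 0.7},
    {"person": 0.8, "vehicle": 0.1, "unidentified": 0.1},
]

totals = defaultdict(float)
for frame_scores in results:
    accumulate_scores(totals, frame_scores)

track_class = decide_class(totals)
```

In this sketch a single frame in which the object looks "unidentified" does not change the final decision, because the class is determined from the accumulated scores rather than from any one frame.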

The non-interest tracked object removal unit may identify, among the objects of the determined classes, tracked objects of no interest to the user that fall outside the specified criterion and remove them from the plurality of tracked objects.
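As a rough illustration of this removal step, the sketch below assumes that the "specified criterion" is simply membership in a set of classes of interest. The patent does not define the criterion at this point, so this choice, the track dictionaries, and the function name are all hypothetical.

```python
def remove_non_interest(tracks, classes_of_interest):
    """Keep only tracks whose determined class is in the classes of interest."""
    return [track for track in tracks if track["class"] in classes_of_interest]


# Hypothetical tracked objects with classes already determined.
tracks = [
    {"id": 1, "class": "person"},
    {"id": 2, "class": "unidentified"},
    {"id": 3, "class": "vehicle"},
]

remaining = remove_non_interest(tracks, {"person", "vehicle"})
```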

The image conversion unit may convert the first pixel format, a pixel format composed of a luminance signal (Y), a red-signal difference (U), and a difference between the luminance signal and the blue component (V), into the second pixel format, an RGB (Red-Green-Blue) or grayscale pixel format.

In addition, a method of driving a CCTV image analysis apparatus based on an object image recognition DCNN according to an embodiment of the present invention, the apparatus performing image analysis using a DCNN (Deep Convolutional Neural Network) that provides recognition results for objects in an image, includes: a step of receiving, from a video providing apparatus, first image data composed of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate, and converting the received first image data into second image data having a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format of a different type from the first pixel format; and a step of detecting and tracking a plurality of moving objects on a motion basis in the video frames of the converted second image data, extracting an image of each of the objects being tracked, classifying the tracked objects using the recognition results obtained by inputting the extracted object images to the DCNN, identifying among the classified tracked objects any object of no interest to the user that falls outside a specified criterion, removing the identified object from the set of tracked objects, and detecting events based on the tracking information of the resulting objects of interest to the user and on specified rules.

The converting step includes: receiving the plurality of video frames input at the first frame rate; generating a plurality of second video frames by sampling fewer video frames than the input plurality of first video frames; converting the generated second video frames to the second resolution; and converting the second video frames, now at the second resolution, from the first pixel format to the second pixel format and providing the resulting second image data to the motion-based image analysis unit.

According to embodiments of the present invention, the problems of CCTV image analysis apparatuses that use existing motion-based object detection algorithms are mitigated by DCNN-based image recognition, and because DCNN-based image analysis becomes possible even on a device with an ordinary CPU, the cost previously spent on expensive equipment can be saved.

In addition, using DCNN-based image recognition with deep learning increases the accuracy of object detection, so alarms raised upon event detection become more accurate.

FIG. 1 is a configuration diagram of a CCTV image analysis apparatus based on an object image recognition DCNN according to an embodiment of the present invention;
FIG. 2 is an exemplary diagram showing the processing steps and results of the motion region detection unit according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram showing the processing steps and results of the object tracking unit according to an embodiment of the present invention;
FIG. 4 is a diagram explaining the processing of the tracked object classification unit according to an embodiment of the present invention;
FIG. 5 is an example of sample images for training an object image recognition DCNN that classifies a given image as one of the person, vehicle, and unidentified classes;
FIG. 6 is an example of tracked objects classified by the tracked object classification unit according to an embodiment of the present invention;
FIG. 7 is an exemplary diagram showing the processing result of the non-interest tracked object removal unit according to an embodiment of the present invention;
FIG. 8 is an exemplary diagram showing the processing result of the interest-tracked-object-based event detection unit according to an embodiment of the present invention;
FIG. 9 is a configuration diagram of a CCTV image analysis apparatus based on an object image recognition DCNN for analyzing video from N CCTV cameras simultaneously in real time, according to an embodiment of the present invention;
FIG. 10 is a configuration diagram of a CCTV image analysis system based on an object image recognition DCNN for performing large-scale image analysis at a CCTV control center or the like, according to an embodiment of the present invention;
FIG. 11 is a block diagram showing the structure of an image analysis apparatus according to another embodiment of the present invention; and
FIG. 12 is a flowchart showing the driving process of an image analysis apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a CCTV image analysis apparatus based on an object image recognition DCNN according to an embodiment of the present invention.

As shown in FIG. 1, the object image recognition DCNN-based CCTV image analysis apparatus (or image analysis apparatus) 100 according to an embodiment of the present invention includes some or all of an image conversion unit (or image acquisition unit) 121, a motion-based image analysis unit 122, and an object image recognition DCNN (unit) 108.

Here, "includes some or all of" means that the CCTV image analysis apparatus 100 may be configured with some components, such as the object image recognition DCNN 108, omitted, or with some components, such as the motion-based image analysis unit 122, integrated into other components such as the image conversion unit 121. To aid a full understanding of the invention, the apparatus is described as including all of them.

The image conversion unit 121 acquires the image data to be used by the motion-based image analysis unit 122. For example, it can acquire image data provided by a CCTV camera, or by a server of a government office or the like, via a carrier's communication network. The motion-based image analysis unit 122 receives the image data provided by the image conversion unit 121 and performs motion-based and DCNN-based image analysis. Here, "image data" may mean an image signal, but may also mean video data. An image signal typically includes a video signal and an audio signal and may further include additional information (e.g., encoding information, caption information); the video signal in this case represents the video data. Because image processing deals mainly with the video signal, however, those skilled in the art tend to use "image data" and "video data" interchangeably. The embodiments of the present invention therefore do not strictly limit these terms; in other words, image data and video data may be used as nearly the same concept.

Image data comprises a series of unit frame images; in other words, it is composed of a series of video frames. A frame rate of 60 FPS (frames per second), for example, means that 60 still images, that is, unit frame images, are transmitted or received per second. The terms video frame, frame video, and unit frame are thus used as nearly the same concept, although the term chosen may differ slightly depending on whether it refers to the frame or to the data. Since unit frame images form serial data during communication, image data may also be regarded as the serialized pixel values of the pixels that make up the unit frame images. Because those skilled in the art use these terms interchangeably, the embodiments of the present invention do not strictly limit them; where a distinction is needed, it is made explicitly.

The image conversion unit 121 may include some or all of a video frame acquisition unit 101, a video frame subsampling unit 102, a video frame scaling unit 103, and a pixel format conversion unit 104. The motion-based image analysis unit 122 may include some or all of a motion region detection unit 105, an object tracking unit 106, a tracked object classification unit 107, a non-interest tracked object removal unit 109, and an interest-tracked-object-based event detection unit 110. Here, "include some or all of" has the same meaning as above.

The video frame acquisition unit 101 continuously acquires video frame data from a video providing apparatus, such as a CCTV camera or a video streaming server, connected to the image analysis apparatus 100. For example, when the connected video providing apparatus is an IP camera, the video frame acquisition unit 101 receives and decodes the encoded video stream data from the IP camera and continuously acquires video frame data (or frame video data) in a YUV pixel format (e.g., the YV12 pixel format). Here, YUV (e.g., 16 bits) is a format that represents color with three pieces of information: a luminance signal (Y), a red-signal difference (U), and a difference between the luminance signal and the blue component (V). Because the Y component is sensitive to error, it is coded with more bits than the color components U and V; the Y:U:V ratio is generally 4:2:2, so the format is also called YUV422. Since it uses red and blue components, it is also called YCrCb, an abbreviation formed from chroma (color): Chroma Red and Chroma Blue.

The video frame subsampling unit 102 subsamples, from the (video) frames acquired by the video frame acquisition unit 101, the frames to be used for image analysis. For example, if the frame rate of the video input to the image analysis apparatus 100 is 30 FPS and the image analysis frame rate is set to 6 FPS, the video frame subsampling unit 102 takes one video frame out of every five frames of the input video.
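The 30 FPS to 6 FPS example can be sketched as follows. This is a minimal illustration that assumes the input frame rate is an integer multiple of the analysis frame rate; the function name `subsample` and the use of integers as stand-ins for frames are hypothetical.

```python
def subsample(frames, input_fps, analysis_fps):
    """Yield one frame out of every (input_fps // analysis_fps) frames."""
    if analysis_fps <= 0 or analysis_fps > input_fps:
        raise ValueError("analysis_fps must be in the range (0, input_fps]")
    step = input_fps // analysis_fps  # 30 // 6 == 5 in the example above
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame


# One second of 30 FPS input, with frame indices standing in for frames.
one_second = list(range(30))
kept = list(subsample(one_second, input_fps=30, analysis_fps=6))
```

Here `kept` holds six frames per second of input, one from each group of five consecutive frames.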

Subsampled frames (the plurality of second video frames), rather than every frame of the input video, are used for image analysis because using every frame greatly increases the processing load on the image analysis apparatus while the benefit (e.g., improved object tracking performance) is typically small. About 6 to 10 FPS is sufficient for tracking pedestrians or vehicles in typical CCTV video.

The video frame scaling unit 103 scales the size (e.g., resolution) of the video frames obtained by the video frame subsampling unit 102 to a specified size. It typically reduces a high-resolution input image to a lower resolution. For example, if the video frames acquired by the video frame acquisition unit 101 have a high resolution of 1920x1080 and the resolution to be used by the motion-based image analysis unit 122 is set to 640x480, the video frame scaling unit 103 generates a 640x480 image from the 1920x1080 input image through pixel subsampling, linear interpolation, and the like.
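Downscaling by linear interpolation can be illustrated with a small pure-Python sketch on a single-channel image stored as a list of rows. The patent only names "pixel subsampling and linear interpolation"; the corner-aligned sampling grid used here, the function name `resize_bilinear`, and the tiny sample image are assumptions made for illustration. A production system would normally rely on an optimized image library instead.

```python
def resize_bilinear(image, new_width, new_height):
    """Resize a 2-D list of grayscale values using bilinear interpolation."""
    old_height, old_width = len(image), len(image[0])
    # Map output coordinates onto the input grid so that corners align.
    x_ratio = (old_width - 1) / (new_width - 1) if new_width > 1 else 0.0
    y_ratio = (old_height - 1) / (new_height - 1) if new_height > 1 else 0.0
    output = []
    for row in range(new_height):
        y = row * y_ratio
        y0 = int(y)
        y1 = min(y0 + 1, old_height - 1)
        wy = y - y0  # vertical interpolation weight
        out_row = []
        for col in range(new_width):
            x = col * x_ratio
            x0 = int(x)
            x1 = min(x0 + 1, old_width - 1)
            wx = x - x0  # horizontal interpolation weight
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            out_row.append(top * (1 - wy) + bottom * wy)
        output.append(out_row)
    return output


# A 4x4 stand-in for a high-resolution frame, reduced to 2x2.
source = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
small = resize_bilinear(source, 2, 2)
```

The same function would map a 1920x1080 frame to 640x480 by calling `resize_bilinear(frame, 640, 480)`, although pure Python is far too slow for real-time use at that size.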

The reason for analyzing a reduced, low-resolution image rather than the original high-resolution image is that using the high-resolution image as it is greatly increases the processing load of the image analysis apparatus, while the benefit (e.g., improved object detection performance) is usually small. In a long-range surveillance environment, detecting small distant objects may require processing the high-resolution input image without reduction. In a typical short-range surveillance environment, however, reducing the high-resolution input image has little effect on the object detection rate, while the processing load of the image analysis apparatus is greatly reduced.

The pixel format conversion unit 104 converts the YUV-pixel-format image scaled by the video frame scaling unit 103 into a pixel format that the image analysis unit 122 can use, namely the RGB pixel format or the grayscale pixel format.
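The specification does not say which YUV variant or conversion matrix is used. Purely as an illustration, the sketch below assumes 8-bit full-range YUV with the BT.601 coefficients used by JFIF; the function names are hypothetical. Under that assumption the grayscale value is simply the Y (luma) component.

```python
def clamp8(value):
    """Round to the nearest integer and clamp into the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one full-range 8-bit YUV pixel to RGB (assumed BT.601/JFIF)."""
    d, e = u - 128, v - 128  # chroma components are centred on 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


def yuv_to_gray(y, u, v):
    """Grayscale conversion keeps only the luma component."""
    return y


mid_gray = yuv_to_rgb(128, 128, 128)
```

Because grayscale conversion discards chroma entirely, it is the cheaper of the two target formats.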

The motion region detection unit 105 obtains (or recognizes) a basic motion region from the difference (or difference image) between a periodically learned background image and the input image, and then detects the final motion region by applying various noise-motion-pixel removal methods and morphology filtering. FIG. 2 illustrates the processing and results of the motion region detection unit 105: an input image 201, a learned background image 202, a basic motion region detection result 203 obtained from the difference between the input image and the background image, and a final motion region detection result 204 obtained by noise foreground pixel removal and morphology filtering.
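The two stages described above can be sketched as follows, under stated assumptions: a fixed difference threshold of 25 gray levels, a 3x3 structuring element, and morphological opening (erosion followed by dilation) as the filtering step. None of these choices is given in the specification, and all function names are hypothetical.

```python
def difference_mask(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def _neighborhood(mask, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    """A pixel stays set only if every pixel in its window is set."""
    return [[1 if all(_neighborhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def dilate(mask):
    """A pixel becomes set if any pixel in its window is set."""
    return [[1 if any(_neighborhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation removes isolated noise pixels."""
    return dilate(erode(mask))


background = [[0] * 6 for _ in range(6)]
frame = [row[:] for row in background]
for yy in range(1, 4):
    for xx in range(1, 4):
        frame[yy][xx] = 200   # a 3x3 moving object
frame[5][5] = 200             # a single noise pixel

cleaned = opening(difference_mask(frame, background))
```

In this toy input the 3x3 object survives the opening while the isolated noise pixel is removed.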

The object tracking unit 106 performs multiple-object tracking using the input image 201 and the final motion region detection result 204 of FIG. 2. Specifically, the object tracking unit 106 performs new object detection, inter-frame object tracking by matching, updating of each tracked object's template image and bounding box coordinates, management of the tracked object list, management of each tracked object's trajectory, and so on. FIG. 3 illustrates the processing and results of the object tracking unit 106 of FIG. 1: template images 301 of several tracked objects and an object detection/tracking result 302. In the object detection/tracking result 302 of FIG. 3, it can be seen that, in addition to the person object of interest, a number of meaningless objects (e.g., objects detected from the motion of branches swaying in the wind) have been detected.

The tracked object classification unit 107 classifies the objects being tracked using the object image recognition DCNN 108. The tracked object classification unit 107 is described in more detail below with reference to FIGS. 4 to 6.

In addition, the non-interest tracked object removal unit 109 removes tracked objects classified into a non-interest class from the tracked object list of the object tracking unit 106.
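A minimal sketch of this removal step is given below. The `TrackedObject` record and the choice of “unknown” as the only non-interest class are assumptions for illustration; which classes count as non-interest is a configuration choice.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    track_id: int
    label: str  # "person", "vehicle", or "unknown"


def remove_non_interest(tracks, non_interest=frozenset({"unknown"})):
    """Return the tracked object list without objects in a non-interest class."""
    return [t for t in tracks if t.label not in non_interest]


tracks = [
    TrackedObject(1, "person"),
    TrackedObject(2, "unknown"),   # e.g. a swaying branch
    TrackedObject(3, "vehicle"),
    TrackedObject(4, "unknown"),
]
remaining = remove_non_interest(tracks)
```

Only the objects that remain in the list are considered by the subsequent event detection.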

The interest tracked object-based event detection unit 110 detects events that satisfy specified rules, using the tracking information of objects in the classes of interest.

FIG. 4 is a diagram for explaining the processing performed by the tracked object classification unit of FIG. 1.

For convenience of explanation, referring to FIG. 4 together with FIG. 1, the object image recognition DCNN 108 of FIG. 1 is, as in the example of FIG. 4, a deep neural network that takes a size-normalized object image as input and recognizes it as one of the classes “person”, “vehicle”, and “unknown”. For example, when an object image is input, the object image recognition DCNN 108 may compare the input object image with previously stored images (e.g., sample images or image data) and provide a recognition result.

Accordingly, for each object being tracked, the tracked object classification unit 107 of FIG. 1 obtains from the input image the image patch (or object image) corresponding to the object's bounding box region, normalizes the size of the image patch, and then attempts recognition through the object image recognition DCNN 108.

To classify objects more accurately, the tracked object classification unit 107 attempts DCNN recognition not just once but periodically, several times. This is because, as in example 401 of FIG. 4, a DCNN recognition result obtained at a particular moment may be unstable, depending on the detection state of the object bounding box.

Specifically, as shown in FIG. 4, for a given tracked object the tracked object classification unit 107 obtains the object's image (or image patch) from the input image every specified period P and attempts DCNN recognition, accumulates the DCNN recognition results per class, and then fixes the class with the highest cumulative score in the current frame as the class of that tracked object. For example, in FIG. 4, when seven unit frames are input at regular time intervals, object image patches of the same object are obtained from the first, third, fifth, and seventh unit frames, so that the DCNN is applied periodically, that is, at a regular time interval.
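The per-class accumulation can be sketched as follows. The specification does not say whether each recognition contributes one point or a confidence value, so the sketch assumes one point per recognition by default; the class name `ClassVoter` is hypothetical.

```python
from collections import Counter


class ClassVoter:
    """Accumulates DCNN recognition results for one tracked object."""

    def __init__(self):
        self.scores = Counter()

    def add(self, label, score=1.0):
        """Add the result of one recognition attempt to the class totals."""
        self.scores[label] += score

    def current_class(self):
        """Class with the highest cumulative score so far, or None if empty.

        On a tie, the class that was recorded first wins.
        """
        if not self.scores:
            return None
        return max(self.scores, key=self.scores.get)


voter = ClassVoter()
for result in ["person", "unknown", "person"]:  # three periodic attempts
    voter.add(result)
decided = voter.current_class()
```

A single unstable result (here the “unknown” in the middle) is outvoted by the other attempts, which is the purpose of recognizing the same object several times.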

A period P of 0.5 to 1 second is usually appropriate. The smaller P is, the greater the processing load of the image analysis apparatus 100; the larger P is, the more object classification performance may degrade. When obtaining an object's image patch, it is preferable to obtain it from the original-resolution image (i.e., the video frame obtained from the video frame subsampling unit 102 of FIG. 1). This is because a patch taken from the downscaled image may be degraded in quality and hard to recognize.
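Because tracking runs on the reduced image while the patch is preferably taken from the original-resolution frame, the bounding box has to be mapped between the two coordinate systems. The sketch below assumes an `(x, y, width, height)` box layout and the 640x480 and 1920x1080 sizes from the earlier example; the function names are hypothetical.

```python
def to_original_coords(box, analysis_size=(640, 480), original_size=(1920, 1080)):
    """Map an (x, y, w, h) box from the analysis image to the original frame."""
    x, y, w, h = box
    sx = original_size[0] / analysis_size[0]  # 1920 / 640 == 3.0
    sy = original_size[1] / analysis_size[1]  # 1080 / 480 == 2.25
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


def crop(frame, box):
    """Cut the (x, y, w, h) region out of a frame stored as a list of rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


original_box = to_original_coords((100, 80, 40, 120))

# Small synthetic frame whose pixel value encodes its position (row * 10 + column).
toy_frame = [[r * 10 + c for c in range(10)] for r in range(10)]
patch = crop(toy_frame, (2, 3, 2, 2))
```

The cropped patch would then be resized to the fixed input size expected by the DCNN before recognition.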

The number of periodic DCNN recognition attempts may be limited to a total of N. This prevents DCNN recognition from being attempted needlessly over and over for an object that is tracked for a long time.
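One way to combine the period P with the limit N is to compute in advance the analysis frames at which recognition is attempted for a track. The sketch below assumes the 6 FPS analysis rate from the earlier example and a hypothetical default of N = 5; the specification gives no concrete value for N.

```python
def recognition_frames(track_length, analysis_fps=6, period_s=0.5, max_attempts=5):
    """Indices of the analysis frames at which DCNN recognition is attempted.

    track_length -- number of analysis frames the object has been tracked
    period_s     -- the period P in seconds
    max_attempts -- the limit N on the total number of attempts
    """
    step = max(1, round(analysis_fps * period_s))  # frames between attempts
    return list(range(0, track_length, step))[:max_attempts]


# An object tracked for 10 seconds at 6 FPS is still recognized only 5 times.
long_track = recognition_frames(60)

# Seven frames with an attempt every second frame, as in the FIG. 4 example.
fig4_like = recognition_frames(7, analysis_fps=6, period_s=1 / 3, max_attempts=10)
```

The indices are zero-based, so `[0, 2, 4, 6]` corresponds to the first, third, fifth, and seventh frames.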

FIG. 5 shows examples of sample images for training the object image recognition DCNN that classifies a given image into one of the person (a), vehicle (b), and unknown (c) classes.

For convenience of explanation, referring to FIG. 5 together with FIG. 1, it is worth noting that, as in the example of FIG. 5, the training data for the “person” object in particular include not only full-body images of people but also partial images of a person and images containing two or more people. This is because a person object region detected by the motion-based image analysis unit 122 of FIG. 1 often contains only part of a person, or two or more people, rather than exactly one whole body.

Any of the various DCNN models proposed to date may be used as the model for the object image recognition DCNN 108 of FIG. 1. Representative examples include AlexNet, VGGNet, GoogLeNet, and ResNet, which have won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge). When selecting the DCNN model to actually use, considering recognition performance alone is problematic, because there is usually a trade-off between a DCNN's processing time (inference time) and size (e.g., number of parameters) on the one hand and its recognition performance on the other. Recently, DCNN models have been published that match the recognition performance of earlier models while drastically reducing processing time or parameter count; examples include SqueezeNet and MobileNet.

FIG. 6 is an example of tracked objects classified by the tracked object classification unit of FIG. 1, FIG. 7 is an example showing the processing result of the non-interest tracked object removal unit of FIG. 1, and FIG. 8 is an example showing the processing result of the interest tracked object-based event detection unit of FIG. 1.

In FIG. 6, it can be seen that, apart from the actual person object, all of the remaining falsely detected objects have been correctly classified as “unknown”.

FIG. 7 shows the result of the non-interest tracked object removal unit 109 processing the object classification result of FIG. 6 when the “unknown” class is designated as a non-interest class.

FIG. 8 shows an example of detecting an event in which a “person” object intrudes into a designated area.
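An intrusion rule of this kind might be sketched as below. The specification does not define how the designated area is represented or which point of the object is tested, so the rectangular area, the bottom-centre point of the bounding box, and all names here are assumptions for illustration.

```python
def foot_point(box):
    """Bottom-centre point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h)


def inside(region, point):
    """True if point lies within the (left, top, right, bottom) rectangle."""
    left, top, right, bottom = region
    px, py = point
    return left <= px <= right and top <= py <= bottom


def detect_intrusions(tracks, region):
    """IDs of "person" objects whose foot point is inside the designated area.

    tracks is a list of (track_id, label, box) tuples.
    """
    return [track_id for track_id, label, box in tracks
            if label == "person" and inside(region, foot_point(box))]


restricted_area = (200, 200, 400, 400)
tracks = [
    (1, "person", (250, 150, 40, 100)),   # foot point (270, 250): inside
    (2, "person", (10, 10, 40, 100)),     # foot point (30, 110): outside
    (3, "vehicle", (300, 250, 80, 60)),   # inside, but not a person
]
alarms = detect_intrusions(tracks, restricted_area)
```

Because non-interest objects have already been removed from the track list, such a rule is not triggered by swaying branches or similar false detections.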

FIG. 9 is a block diagram of an object image recognition DCNN-based CCTV image analysis apparatus for simultaneously analyzing the video of N CCTV cameras in real time, according to an embodiment of the present invention.

As shown in FIG. 9, the CCTV image analysis apparatus 900 according to an embodiment of the present invention includes some or all of N video channel processing units 901 for processing N video streams simultaneously, an object image recognition processing unit 902, and the object image recognition DCNN 108, where "includes some or all of" has the same meaning as described earlier.

The video channel processing unit 901 may include some or all of the image conversion unit 121 and the motion-based image analysis unit 122 of FIG. 1. The N video channel processing units 901 and the object image recognition processing unit 902 typically run on the CPU of the image analysis apparatus 900, whereas the object image recognition DCNN 108 may run on a GPU capable of massively parallel computation for high-speed operation.

The N video channel processing units 901 do not each use their own object image recognition DCNN; instead they “share” a single object image recognition DCNN 108. Running one DCNN typically requires a large amount of system memory, so if each of the N video channel processing units 901 loaded its own DCNN into memory, memory could run short as N grows.

To classify tracked objects in the same manner as described for FIG. 4, each motion-based image analysis unit 122 in the N video channel processing units 901 periodically sends a tracked object's normalized object image patch, together with a recognition request message, to the object image recognition processing unit 902. These recognition request messages are stored sequentially in the request message queue of the object image recognition processing unit 902. As soon as the object image recognition processing unit 902 finds a recognition request message in the queue, it takes the message out and processes it. That is, it inputs the normalized object image patch received with the request into the object image recognition DCNN 108 to obtain an object image recognition result, and returns the result to the motion-based image analysis unit 122 that made the request. That motion-based image analysis unit 122 then uses the result to perform object classification as in FIG. 4.
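The request queue arrangement can be sketched with Python's standard `queue` and `threading` modules. The function `fake_dcnn` is a stand-in for the shared DCNN, and the per-request reply queue is one assumed way of returning a result to the requesting channel; the specification does not describe the reply mechanism.

```python
import queue
import threading


def fake_dcnn(patch):
    """Stand-in for the single shared DCNN; a real model would run inference here."""
    return "person" if sum(patch) > 10 else "unknown"


def recognition_worker(request_queue):
    """Take requests off the queue in arrival order and answer each one."""
    while True:
        item = request_queue.get()
        if item is None:          # sentinel: shut the worker down
            break
        patch, reply_queue = item
        reply_queue.put(fake_dcnn(patch))


def classify(request_queue, patch):
    """Called by a video channel: enqueue a request and wait for its result."""
    reply_queue = queue.Queue(maxsize=1)
    request_queue.put((patch, reply_queue))
    return reply_queue.get()


request_queue = queue.Queue()
worker = threading.Thread(target=recognition_worker, args=(request_queue,), daemon=True)
worker.start()

# Two requests, as might come from two different video channels.
results = [classify(request_queue, patch) for patch in ([9, 9], [1, 2])]

request_queue.put(None)
worker.join()
```

Only one model instance exists regardless of the number of channels, which is the memory saving the shared arrangement is meant to provide.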

FIG. 10 is a block diagram of an object image recognition DCNN-based CCTV image analysis system for performing image analysis on a large scale, for example in a CCTV control center, according to an embodiment of the present invention.

A large-scale image analysis system could be built simply by using many of the image analysis apparatuses 900 presented in FIG. 9. However, the high-performance GPUs usable in a control center are usually very expensive, so installing a GPU in every individual image analysis apparatus raises a cost problem.

To solve this problem, in an embodiment of the present invention the image analysis system 1000 may be configured as in FIG. 10, so that ordinary video channel processing is performed by a large number of image analysis servers 1001 without GPUs, while DCNN-based object image recognition is performed by a small number of object image recognition servers 1002 equipped with high-performance GPUs. The image analysis servers 1001 communicate with the object image recognition servers 1002 over a high-speed network.

To classify tracked objects in the same manner as in FIG. 4, the video channel processing unit of an image analysis server 1001 periodically sends a tracked object's normalized object image patch, together with a recognition request message, to an object image recognition server 1002. The object image patch data is compressed in a format such as JPEG before transmission. The recognition request messages are stored sequentially in the request message queue of the object image recognition processing unit 1003 of the object image recognition server 1002. As soon as the object image recognition processing unit 1003 finds a recognition request message in the queue, it takes the message out and processes it. That is, it decodes the compressed object image patch data received with the request and inputs it into the object image recognition DCNN 108 to obtain an object image recognition result. The object image recognition processing unit 1003 then returns the result to the video channel processing unit of the image analysis server 1001 that made the request, and that video channel processing unit uses the result to perform object classification as in FIG. 4.

FIG. 11 is a block diagram showing the structure of an image analysis apparatus according to another embodiment of the present invention.

As shown in FIG. 11, the image analysis apparatus 1100 according to another embodiment of the present invention includes some or all of a communication interface unit 1110, a control unit 1120, a DCNN-based image analysis unit 1130, and a storage unit 1140.

Here, "includes some or all of" means, for example, that the image analysis apparatus 1100 may be configured with some components, such as the communication interface unit 1110 or the storage unit 1140, omitted, or that some components, such as the DCNN-based image analysis unit 1130, may be integrated into another component such as the control unit 1120. To aid a full understanding of the invention, the apparatus is described as including all of them.

The communication interface unit 1110 communicates with the CCTV via a telecommunications carrier's network and may perform operations such as modulation and demodulation in the process.

The control unit 1120 is responsible for the overall control of the communication interface unit 1110, the DCNN-based image analysis unit 1130, and the storage unit 1140 that make up the image analysis apparatus 1100 of FIG. 11. For example, the control unit 1120 may provide the CCTV footage received through the communication interface unit 1110 to the DCNN-based image analysis unit 1130.

The DCNN-based image analysis unit 1130 may perform the image analysis operations of the embodiments of the present invention described above with reference to FIGS. 1 to 10. Typically, it converts the frame rate of the received frame images or acquires unit frame images at random, which the embodiments of the present invention refer to as subsampling. It also converts the resolution of the unit frame images acquired through subsampling from high resolution to low resolution, and it may convert the format of the received frame images.

In addition, the DCNN-based image analysis unit 1130 extracts object image patches from the format-converted, subsampled frame images and has the DCNN recognize the object images so as to classify the objects accurately, removing (that is, filtering out) non-interest tracked objects. Through this removal, the DCNN-based image analysis unit 1130 detects events only for tracked objects of interest, and may output an alarm based on such an event.

Apart from the above, the communication interface unit 1110, the control unit 1120, the DCNN-based image analysis unit 1130, and the storage unit 1140 of FIG. 11 have been sufficiently described earlier, so that description applies here as well.

Meanwhile, as yet another embodiment of the present invention, the control unit 1120 of FIG. 11 may include a CPU and a memory. The CPU may include a control circuit, an arithmetic logic unit (ALU), an instruction decoder, registers, and the like, and the memory may include RAM. Here, the CPU may be the CPU of FIG. 9. The control circuit handles control operations, the ALU performs bit operations, and the instruction decoder interprets machine code to determine which instruction it is. The registers take part in the temporary storage of data. The control unit 1120 may typically be configured as a single chip. Accordingly, at the start of operation of the image analysis apparatus 1100, the CPU may copy the program stored in the DCNN-based image analysis unit 1130 of FIG. 11, load it into the memory, and execute it, which can increase the CPU's processing speed.

FIG. 12 is a flowchart showing the operating process of an image analysis apparatus according to an embodiment of the present invention.

For convenience of explanation, referring to FIG. 12 together with FIG. 11, the image analysis apparatus 1100 according to an embodiment of the present invention receives video provided by a video providing apparatus (e.g., a CCTV camera), that is, a plurality of video frame images, and converts it into a format that reduces the image processing load (S1200). Here, "format" is better understood as converting the received video into video of a different form, rather than as referring to the first format mentioned earlier. To this end, the image analysis apparatus 1100 preferably converts the video, as described above, to a resolution lower than that of the input, to fewer frames to be processed, and, where possible, to the RGB or grayscale (G-scale) pixel format. The aim is to make analysis of CCTV video feasible even in an ordinary CPU environment, so that control-room operators and the like can carry out monitoring on that basis.

Next, the image analysis apparatus 1100 extracts object images of a plurality of tracked objects for motion tracking from the format-converted video frame images, applies the extracted object images to a DCNN that uses an object image recognition scheme, classifies the tracked objects based on the result, removes from the classified tracked objects any user non-interest objects that fall outside specified criteria, and detects events based on the tracking information of the remaining user objects of interest and on specified rules (S1210).

For example, the DCNN may store sample images and use them to provide a recognition result for an input object image. In addition, the application results are accumulated through learning to produce a cumulative result, and the class with the highest cumulative score is finally fixed as the class of the tracked object. For example, if the cumulative score for the person class is 2 and that for the unknown class is 1, the object is determined to be a person object in that frame interval.

Accordingly, a particular object classified as, say, unknown is excluded from the set of tracked objects on the basis of its object image. The bounding box drawn on that tracked object on the screen can therefore be removed.

In this way only the user's objects of interest are tracked, and events in which their tracking information satisfies the specified rules are detected.

Various other operations are also possible, but for details the description given above applies.

Meanwhile, although all the components constituting an embodiment of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to such an embodiment. That is, within the scope of the object of the present invention, all of the components may operate by being selectively combined into one or more. In addition, although each of the components may be implemented as an independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be easily inferred by those skilled in the art. Such a computer program may be stored in non-transitory computer readable media and read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium means not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the above programs may be stored and provided on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100, 900, 1100: image analysis apparatus 101: video frame acquisition unit
102: video frame subsampling unit 103: video frame scaling unit
104: pixel format conversion unit 105: motion region detection unit
106: object tracking unit 107: tracked object classification unit
108: object image recognition DCNN 109: non-interest tracked object removal unit
110: interest tracked object-based event detection unit 121: image conversion unit
122: motion-based image analysis unit 901: video channel processing unit
902, 1003: object image recognition processing unit 1000: image analysis system
1001: image analysis server 1002: object image recognition server
1110: communication interface unit 1120: control unit
1130: DCNN image analysis unit 1140: storage unit

Claims

An object image recognition DCNN-based CCTV image analysis apparatus that performs image analysis using a DCNN (Deep Convolutional Neural Network), the apparatus comprising:
an image conversion unit that receives, from a video providing apparatus, first image data composed of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate, and converts the first image data into second image data having a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format different from the first pixel format; and
a motion-based image analysis unit that detects and tracks a plurality of moving objects based on motion in the video frames of the converted second image data, extracts an object image of each of the plurality of tracked objects, classifies the plurality of tracked objects using recognition results obtained by inputting the extracted object images to the DCNN, and detects an event based on a specified rule and on tracking information of user interest objects obtained by removing user non-interest objects from the plurality of tracked objects,
wherein the image conversion unit comprises:
a video frame acquisition unit that receives the plurality of first video frames input at the first frame rate;
a video frame subsampling unit that samples fewer video frames than the input plurality of first video frames to generate a plurality of second video frames;
a video frame scaling unit that converts the generated plurality of second video frames to the second resolution; and
a pixel format conversion unit that converts the first pixel format of the plurality of second video frames converted to the second resolution into the second pixel format and provides the second image data to the motion-based image analysis unit,
and wherein the motion-based image analysis unit comprises:
a motion region detection unit that generates a difference image using the second image data and a learned background image, and detects motion regions for the plurality of moving objects by removing noise from the generated difference image;
an object tracking unit that detects and tracks the plurality of moving objects using the detected motion regions and the second image data;
a tracked object classification unit that classifies the plurality of tracked objects based on a first recognition result, obtained by applying to the DCNN the object images of the plurality of tracked objects extracted from one of the plurality of second video frames, and a second recognition result, obtained by applying to the DCNN the object images of the plurality of tracked objects extracted from at least one other video frame input at a time interval;
a non-interest tracked object removal unit that identifies user non-interest objects among the classified plurality of tracked objects and removes them from the plurality of tracked objects of the object tracking unit; and
an interest tracked object-based event detection unit that detects an event in which the tracking information of the user interest objects obtained by the removal satisfies a specified rule.
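
As an illustration only, the frame conversion and motion region detection recited in claim 1 might be sketched in Python as follows. The function names, the pixel-skipping downscale, the fixed threshold, and the isolated-pixel noise filter are all assumptions made for this sketch; the claim does not prescribe any particular implementation.

```python
from typing import List

Frame = List[List[int]]  # gray-scale frame: rows of 0-255 pixel values


def subsample_frames(frames: list, step: int) -> list:
    """Keep every `step`-th frame, lowering the frame rate by a factor of `step`."""
    return frames[::step]


def downscale(frame: Frame, factor: int) -> Frame:
    """Lower the resolution by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in frame[::factor]]


def motion_mask(frame: Frame, background: Frame, threshold: int) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background exceeds `threshold`."""
    return [
        [abs(p - b) > threshold for p, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def remove_isolated(mask: List[List[bool]]) -> List[List[bool]]:
    """Crude noise removal (assumed): drop motion pixels with no 4-connected motion neighbor."""
    h, w = len(mask), len(mask[0])

    def has_neighbor(y: int, x: int) -> bool:
        return any(
            0 <= ny < h and 0 <= nx < w and mask[ny][nx]
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
        )

    return [[mask[y][x] and has_neighbor(y, x) for x in range(w)] for y in range(h)]
```

In this sketch the background is a fixed frame; a learned background image as recited in the claim would be updated over time, which is omitted here.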

The apparatus according to claim 1,
wherein the tracked object classification unit calculates and accumulates the first recognition result and the second recognition result as scores, and determines the class having the highest cumulative score as the type of the tracked object.
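
The score accumulation described in this claim can be illustrated with a short Python sketch. It assumes that each recognition result is a mapping from class label to score; the function name and the example labels are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable


def classify_by_accumulated_score(results: Iterable[Dict[str, float]]) -> str:
    """Sum per-class scores over several recognition results and return the top class."""
    totals: Dict[str, float] = defaultdict(float)
    for scores in results:
        for label, score in scores.items():
            totals[label] += score
    # max() raises ValueError if no recognition result was supplied.
    return max(totals, key=totals.get)
```

Accumulating over several frames means that a single misrecognized frame does not by itself decide the object's type.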

3. The apparatus of claim 2,
wherein the non-interest tracked object removal unit identifies, based on the determined object type, user non-interest tracked objects that deviate from a specified criterion, and removes them from the plurality of tracked objects.
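
One possible reading of this removal step is a filter over the determined object types. The sketch below assumes that the specified criterion is a set of classes the user is interested in; that interpretation, the function name, and the data layout are assumptions for illustration.

```python
from typing import Dict, Set


def remove_non_interest(tracks: Dict[int, str], interest_classes: Set[str]) -> Dict[int, str]:
    """Keep only tracked objects whose determined type is in the user's interest set.

    `tracks` maps a (hypothetical) track ID to its determined class label.
    """
    return {tid: cls for tid, cls in tracks.items() if cls in interest_classes}
```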

The apparatus according to claim 1,
wherein the image conversion unit converts the first pixel format, which is a pixel format consisting of a luminance signal (Y), the difference (U) between the luminance signal and the red component, and the difference (V) between the luminance signal and the blue component, into the second pixel format, which is an RGB (Red-Green-Blue) or gray-scale pixel format.
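
A per-pixel version of such a conversion might look like the following sketch. It assumes 8-bit full-range BT.601 (JFIF) coefficients, which the patent does not specify, and names the chroma inputs `cb` (blue difference) and `cr` (red difference) to avoid ambiguity about which letter carries which component.

```python
from typing import Tuple


def clamp(value: float) -> int:
    """Round to the nearest integer and clip to the 8-bit range 0-255."""
    return max(0, min(255, round(value)))


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Convert one full-range BT.601 pixel (assumed format) to 8-bit RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


def ycbcr_to_gray(y: int, cb: int, cr: int) -> int:
    """Gray-scale conversion simply keeps the luminance and discards the chroma."""
    return y
```

Because the gray-scale value is just the luminance plane, converting to gray-scale costs essentially nothing per pixel, whereas RGB conversion needs several multiplications.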