KR20240115149A

KR20240115149A - Apparatus and method for integrated anomaly detection

Info

Publication number: KR20240115149A
Application number: KR1020230063771A
Authority: KR
Inventors: 김도형; 전호범; 김형민; 김재홍; 최정단
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2022-10-06
Filing date: 2023-05-17
Publication date: 2024-07-25

Abstract

An integrated abnormal situation detection method according to an embodiment of the present invention includes detecting non-human objects and human objects in an input image using a first neural network, tracking the human objects, and detecting an abnormal situation based on the object detection result and the human object tracking result.

Description

Integrated abnormal situation detection method and device {APPARATUS AND METHOD FOR INTEGRATED ANOMALY DETECTION}

The present invention relates to a single abnormal situation detection technology capable of detecting multiple types of abnormal situations in an integrated manner.

Specifically, the present invention relates to object detection and tracking technology for detecting multiple abnormal situations in an integrated manner.

Recently, CCTV (closed-circuit television) cameras have been installed not only in security facilities but throughout cities to reduce loss of life and property from crimes and accidents. However, because a small number of operators must monitor a rapidly growing number of cameras, monitoring efficiency has become a problem, and in addition to securing more operators, there is a growing need for abnormal situation detection systems that enable automated, intelligent video surveillance. Such systems can greatly improve monitoring efficiency by catching accidents or incidents early and raising alarm events, or by quickly retrieving the footage of an incident after it has occurred.

Accordingly, to ensure effective crisis response and secure the "golden time" after an accident, most monitoring centers are attempting to introduce abnormal situation detection systems based on artificial intelligence. However, current systems see very limited use in real-world environments because of several problems. Building an advanced abnormal situation detection system that can be deployed in the field requires solving the following problems.

First, a unified, integrated framework capable of detecting multiple abnormal situations together is needed. Intelligent CCTV abnormal situation detection technologies for building a social safety net are used to detect various situations that can occur in the field, such as intrusion, loitering, abandonment, arson, collapse (falling down), and fighting. However, most prior technologies merely propose a detection method optimized for a single abnormal situation and do not present an integrated framework that can detect and process different abnormal situations together. To respond to multiple abnormal situations simultaneously, a unified framework is needed that organically combines detection modules with different characteristics while ensuring the real-time performance and reliability of the system.

In addition, stable human detection and tracking must be secured first. Human detection and tracking are essential for detecting abnormal situations for each of the multiple people appearing in a video. Techniques for detecting and tracking multiple objects in video have advanced rapidly through public benchmark challenges. In CCTV environments, however, stable human detection is difficult to achieve because of frequent changes in lighting, surroundings, and weather, and because of the wide variety of distances and angles between the camera and people. If human detection and tracking fails to detect people, or mistakes other objects for people, the performance of the entire monitoring system inevitably drops sharply. Accurate detection and tracking of people across diverse indoor and outdoor CCTV environments is therefore essential for a reliable monitoring system.

Finally, the system must generalize well enough to detect abnormal situations even in new CCTV environments that it was not trained on. Most existing research on abnormal situation detection deals with training and evaluation on data collected in advance. However, a large amount of pre-collected training data is very unlikely to exist for the target domain in which the system will be installed, and collecting such data is also very difficult. An abnormal situation detection technology is therefore needed that operates robustly across changing CCTV environments without additional data collection.

Korean Registered Patent Publication No. 2344606 (Title of invention: Tracking and Monitoring CCTV System and Tracking and Monitoring Method)

An object of the present invention is to provide an integrated abnormal situation detection structure capable of responding to multiple abnormal situations simultaneously.

Another object of the present invention is to provide a filtering technique that reduces false detections and tracking failures of human objects.

Another object of the present invention is to provide an abnormal situation detection method that operates robustly in diverse environments without collecting additional data.

To achieve the above objects, an integrated abnormal situation detection method according to an embodiment of the present invention includes: detecting non-human objects and human objects in an input image using a first neural network; tracking the human objects; and detecting an abnormal situation based on the object detection result and the human object tracking result.

Here, tracking the human objects may be performed using first feature information, generated from an intermediate computation result of the first neural network, and second feature information, extracted by a second neural network that receives as input the human object region corresponding to the final computation result of the first neural network.

Here, the intermediate computation result includes spatial information and texture information of the input image, and the first feature information may be extracted by masking the intermediate computation result with the region of the detected human object.
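As a minimal sketch of how the first feature information could be obtained, the code below assumes the intermediate computation result is a C x H x W feature map (nested lists here), that the detected human box has already been scaled to feature-map cells, and that "masking" is implemented as average pooling over the cells inside the box. The function name `masked_box_feature` and these conventions are illustrative assumptions, not details given in the patent; the resulting vector has M = C dimensions.

```python
from typing import List, Tuple

# Assumption: the intermediate computation result is a C x H x W feature map,
# stored as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def masked_box_feature(fmap: FeatureMap, box: Tuple[int, int, int, int]) -> List[float]:
    """Average each channel over the cells inside a detected human box.

    box is (x0, y0, x1, y1) in feature-map cells with x1 and y1 exclusive
    (assumed already scaled from image pixels by the detector's stride).
    Cells outside the box are masked out, so the returned M-dimensional
    vector (M = number of channels) reflects only the detected person.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    # Clip the box to the map so partially visible people are still handled.
    x0, y0 = max(0, box[0]), max(0, box[1])
    x1, y1 = min(width, box[2]), min(height, box[3])
    if x1 <= x0 or y1 <= y0:
        return [0.0] * len(fmap)
    area = (x1 - x0) * (y1 - y0)
    return [
        sum(channel[r][c] for r in range(y0, y1) for c in range(x0, x1)) / area
        for channel in fmap
    ]
```

For example, with a 2-channel 3x3 map whose first channel is [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and whose second channel is all ones, the box (1, 1, 3, 3) yields [7.0, 1.0]. A real detector would use a tensor library and RoI pooling, but the masking idea is the same.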

Here, tracking the human objects may match the same human object across frames using the first feature information and the second feature information.

Here, the first feature information may correspond to an M-dimensional feature vector, and the second feature information may correspond to an N-dimensional feature vector.

Here, tracking the human objects may use an (M+N)-dimensional feature vector generated from the first feature information and the second feature information.
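The frame-to-frame matching with an (M+N)-dimensional vector can be sketched as follows. This is a simplified illustration under stated assumptions: each part is L2-normalized before concatenation (so neither the M-dimensional detector feature nor the N-dimensional appearance feature dominates merely by scale), cosine similarity is the matching score, and a greedy one-to-one assignment with a minimum-similarity threshold stands in for the association step. The normalization, the 0.5 threshold, and the greedy strategy are assumptions; trackers such as Deep SORT instead use the Hungarian algorithm together with motion (Kalman-filter) cues.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Build the (M+N)-dim descriptor from the M-dim detector feature and the
    N-dim appearance feature (assumption: normalize each part, then the whole)."""
    return l2_normalize(l2_normalize(first) + l2_normalize(second))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Vectors from fuse() have unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def match(tracks: Dict[int, List[float]], detections: List[List[float]],
          min_sim: float = 0.5) -> Dict[int, int]:
    """Greedily assign detections to track IDs in order of descending similarity.

    Returns {detection_index: track_id}; unmatched detections are omitted and
    would start new tracks in a complete tracker.
    """
    pairs = sorted(
        ((cosine(t_feat, d_feat), t_id, d_idx)
         for t_id, t_feat in tracks.items()
         for d_idx, d_feat in enumerate(detections)),
        reverse=True,
    )
    used_tracks, assignment = set(), {}
    for sim, t_id, d_idx in pairs:
        if sim < min_sim:
            break  # all remaining pairs are even less similar
        if t_id in used_tracks or d_idx in assignment:
            continue
        used_tracks.add(t_id)
        assignment[d_idx] = t_id
    return assignment
```

Combining the two sources means that when the appearance feature is unreliable (for example, in an unseen environment), the detector-derived part still contributes to the similarity, which is the motivation the description gives for using both.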

Here, the human object tracking result may include information on the region occupied by the human object and its movement trajectory.

Here, tracking the human objects may identify a non-human object that was incorrectly detected as a human object, based on the movement trajectory information of the human object.

Here, tracking the human objects may compute motion vectors corresponding to the movement trajectory of the human object and identify a non-human object incorrectly detected as a human object using the cross products of the motion vectors of successive segments.
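The patent states that cross products between per-segment motion vectors are used, but not the exact decision rule. The sketch below is one plausible heuristic, offered as an assumption: a static object mistaken for a person produces a jittering box center, so the sign of the 2-D cross product of consecutive motion vectors (the turn direction) flips often, and the net displacement is small relative to the total path length. The thresholds `flip_ratio` and `straightness` are hypothetical values, not figures from the patent.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def motion_vectors(track: Sequence[Point], step: int = 1) -> List[Point]:
    """Displacement of the box center over each segment of `step` frames."""
    return [(track[i + step][0] - track[i][0], track[i + step][1] - track[i][1])
            for i in range(0, len(track) - step, step)]


def cross(a: Point, b: Point) -> float:
    # z-component of the 2-D cross product; its sign gives the turn direction.
    return a[0] * b[1] - a[1] * b[0]


def looks_stationary(track: Sequence[Point], step: int = 1,
                     flip_ratio: float = 0.5, straightness: float = 0.3) -> bool:
    """Assumed heuristic: flag a track as a mis-detected static object when the
    turn direction flips in more than `flip_ratio` of consecutive segment pairs
    AND net displacement / path length is below `straightness`."""
    vecs = motion_vectors(track, step)
    turns = [cross(a, b) for a, b in zip(vecs, vecs[1:])]
    signs = [1 if z > 0 else -1 for z in turns if z != 0]
    if len(signs) < 2:
        return False  # too little evidence (e.g. perfectly straight motion)
    flips = sum(1 for s, t in zip(signs, signs[1:]) if s != t) / (len(signs) - 1)
    # signs is non-empty only if some vector is non-zero, so path > 0 here.
    path = sum(math.hypot(dx, dy) for dx, dy in vecs)
    net = math.hypot(track[-1][0] - track[0][0], track[-1][1] - track[0][1])
    return flips > flip_ratio and net / path < straightness
```

The straightness check matters: a person walking in a slightly wavy line also alternates turn direction, but covers real distance, so the ratio of net to total displacement stays close to 1 and the track is kept.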

Here, detecting the abnormal situation may detect an arson situation using visual feature information corresponding to the input image and language feature information corresponding to text describing an arson situation.

Here, detecting the abnormal situation may map the visual feature information and the language feature information into the same comparison (embedding) space and detect an arson situation by computing the similarity between the visual feature information and the language feature information.

Here, the visual feature information may be generated from the input image, the human object region image, and an image of the region in which the human object is determined to have stayed longer than a preset time.
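The similarity-based arson check can be sketched in the style of CLIP-like zero-shot classification. The image and text encoders are not shown: the sketch assumes they already produced embeddings in the shared space, one per visual view (full frame, human crop, dwell-region crop) and one per text prompt. The prompt wording, the temperature of 0.01, the softmax over prompts, and the 0.5 threshold are all assumptions for illustration, not the patent's parameters.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def arson_probability(image_emb: Sequence[float],
                      prompt_embs: Dict[str, Sequence[float]],
                      arson_prompt: str, temperature: float = 0.01) -> float:
    """Softmax over image-text cosine similarities in the shared space.
    Assumption: prompt_embs come from a (hypothetical) text encoder."""
    logits = {p: cosine(image_emb, e) / temperature for p, e in prompt_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {p: math.exp(v - peak) for p, v in logits.items()}
    return exps[arson_prompt] / sum(exps.values())


def detect_arson(view_embs: List[Sequence[float]],
                 prompt_embs: Dict[str, Sequence[float]],
                 arson_prompt: str, threshold: float = 0.5) -> bool:
    """Flag arson if any view (full frame, person crop, dwell-region crop)
    is closer to the arson description than to the alternative prompts."""
    return any(arson_probability(v, prompt_embs, arson_prompt) > threshold
               for v in view_embs)
```

Because the decision is made by comparing against text descriptions rather than a classifier trained on site-specific footage, this kind of design is consistent with the stated goal of operating without collecting additional data for each deployment.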

Here, detecting the abnormal situation may detect human object actions, set a main action for each segment based on the frequency of the detected actions, and determine the interval in which the abnormal situation occurs using the per-segment main action information.
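The per-segment main action and interval computation can be sketched as a majority vote followed by run merging. The sketch assumes per-frame action labels are already available from an action recognizer, that segments have a fixed length `seg_len` in frames, and that adjacent segments sharing an abnormal main action form one interval; the label names and segment length are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def main_actions(frame_actions: Sequence[str], seg_len: int) -> List[str]:
    """Most frequent per-frame action label in each fixed-length segment.
    Ties go to the label seen first (Counter.most_common ordering)."""
    return [Counter(frame_actions[s:s + seg_len]).most_common(1)[0][0]
            for s in range(0, len(frame_actions), seg_len)]


def abnormal_intervals(mains: Sequence[str], abnormal: Set[str],
                       seg_len: int, n_frames: int) -> List[Tuple[str, int, int]]:
    """Merge consecutive segments whose main action is abnormal into
    (action, start_frame, end_frame_exclusive) intervals."""
    intervals: List[Tuple[str, int, int]] = []
    start = 0
    for i in range(1, len(mains) + 1):
        # Close a run when the label changes or the sequence ends.
        if i == len(mains) or mains[i] != mains[start]:
            if mains[start] in abnormal:
                end = min(i * seg_len, n_frames)  # last segment may be partial
                intervals.append((mains[start], start * seg_len, end))
            start = i
    return intervals
```

For example, 14 frames labelled walk x4, fight x3, walk, fight x4, walk x2 with `seg_len=4` give main actions [walk, fight, fight, walk] and a single fighting interval from frame 4 to frame 12. Voting per segment smooths out isolated misclassified frames, such as the single "walk" inside the fight.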

Here, the abnormal situations may include intrusion, loitering, arson, abandonment, fighting, and collapse (falling down).

In addition, to achieve the above objects, an integrated abnormal situation detection apparatus according to an embodiment of the present invention includes: an object detection unit that detects non-human objects and human objects in an input image using a first neural network; a human object tracking unit that tracks the human objects; and an abnormal situation detection unit that detects an abnormal situation based on the object detection result and the human object tracking result.

Here, the human object tracking unit may track the human objects using first feature information, generated from an intermediate computation result of the first neural network, and second feature information, extracted by a second neural network that receives as input the human object region corresponding to the final computation result of the first neural network.

Here, the human object tracking unit may match the same human object across frames using the first feature information and the second feature information.

Here, the first feature information may correspond to an M-dimensional feature vector, and the second feature information may correspond to an N-dimensional feature vector.

Here, the human object tracking unit may track the human objects using an (M+N)-dimensional feature vector generated from the first feature information and the second feature information.

Here, the human object tracking result may include information on the region occupied by the human object and its movement trajectory.

Here, the human object tracking unit may identify a non-human object incorrectly detected as a human object, based on the movement trajectory information of the human object.

Here, the human object tracking unit may compute motion vectors corresponding to the movement trajectory of the human object and identify a non-human object incorrectly detected as a human object using the cross products of the motion vectors of successive segments.

Here, the abnormal situation detection unit may detect an arson situation using visual feature information corresponding to the input image and language feature information corresponding to text describing an arson situation.

According to the present invention, an integrated abnormal situation detection structure capable of responding to multiple abnormal situations simultaneously can be provided.

In addition, the present invention can provide a filtering technique that reduces false detections and tracking failures of human objects.

In addition, the present invention can provide an abnormal situation detection method that operates robustly in diverse environments without collecting additional data.

Figure 1 is a flowchart showing an integrated abnormal situation detection method according to an embodiment of the present invention.
Figure 2 is a block diagram showing an abnormal situation detection framework according to an embodiment of the present invention.
Figure 3 is a block diagram showing a human tracking module according to an embodiment of the present invention.
Figure 4 is an example of a movement trajectory detected from a stationary object.
Figure 5 is a block diagram showing in detail an arson recognition module according to an embodiment of the present invention.
Figure 6 is a flowchart showing the abnormal action interval detection process.
Figure 7 is a block diagram showing an integrated abnormal situation detection apparatus according to an embodiment of the present invention.
Figure 8 is a diagram showing the configuration of a computer system according to an embodiment.

The advantages and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains; the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Although terms such as "first" and "second" are used to describe various components, these components are not limited by such terms. Such terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

The terms used in this specification are for describing embodiments and are not intended to limit the invention. As used herein, singular forms also include plural forms unless the context specifically states otherwise. As used in the specification, "comprises" or "comprising" means that the stated components or steps do not exclude the presence or addition of one or more other components or steps.

As used herein, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or any possible combination thereof.

Unless otherwise defined, all terms used in this specification are to be interpreted with the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless expressly so defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are assigned the same reference numerals, and redundant descriptions thereof are omitted.

Figure 1 is a flowchart showing an integrated abnormal situation detection method according to an embodiment of the present invention.

The integrated abnormal situation detection method according to an embodiment of the present invention may be performed by an abnormal situation detection apparatus such as a computing device.

Referring to FIG. 1, the integrated abnormal situation detection method according to an embodiment of the present invention includes detecting non-human objects and human objects in an input image using a first neural network (S110), tracking the human objects (S120), and detecting an abnormal situation based on the object detection result and the human object tracking result (S130).

Here, tracking the human objects (S120) may be performed using first feature information, corresponding to an intermediate computation result of the first neural network, and second feature information, extracted by a second neural network that receives as input the human object region corresponding to the final computation result of the first neural network.

Here, the first feature information may represent spatial information and texture information of the input image, and the second feature information may represent appearance information of the human object.

Here, tracking the human objects (S120) may match the same human object across frames using the first feature information and the second feature information.

Here, the first feature information may correspond to an M-dimensional feature vector, and the second feature information may correspond to an N-dimensional feature vector, where M and N may each be any natural number.

Here, tracking the human objects (S120) may identify a non-human object incorrectly detected as a human object, based on the movement trajectory information of the human object.

Here, tracking the human objects (S120) may compute motion vectors corresponding to the movement trajectory of the human object and identify a non-human object incorrectly detected as a human object using the cross products of the motion vectors of successive segments.

Here, detecting the abnormal situation (S130) may detect an arson situation using visual feature information corresponding to the input image and language feature information corresponding to text generated based on the input image.

Here, detecting the abnormal situation (S130) may map the visual feature information and the language feature information into the same comparison space and detect an arson situation by computing the similarity between the visual feature information and the language feature information.

Here, detecting the abnormal situation (S130) may detect human object actions, set a main action for each segment based on the frequency of the detected actions, and determine the interval in which the abnormal situation occurs using the per-segment main action information.

Figure 2 is a block diagram showing an abnormal situation detection framework according to an embodiment of the present invention.

Referring to FIG. 2, the integrated abnormal situation detection framework according to an embodiment of the present invention may include a human detection and tracking unit 200, a human-movement-focused abnormal situation detection unit 300, a target-object-focused abnormal situation detection unit 400, a human-behavior-focused abnormal situation detection unit 500, and an integrated abnormal situation detection management unit 600.

Here, the human detection and tracking unit 200 may receive a video stream 100 acquired from a CCTV camera, track people, and generate each person's region and trajectory within the video.

The human-movement-focused abnormal situation detection unit 300 may receive each person's region and trajectory, analyze the trajectory, and measure how long the person stays in a guarded zone, thereby detecting intrusion and loitering, which are abnormal situations related to human movement.
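A minimal sketch of the movement-based checks follows. It assumes the guarded zone is a polygon in image coordinates, that each track is a per-frame list of the person's foot point (bottom center of the box), that intrusion means the foot point ever enters the zone, and that loitering means the longest continuous stay exceeds a time limit. These definitions and the name `analyse_track` are illustrative assumptions; the patent does not specify them.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def inside(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting point-in-polygon test (count crossings to the right of pt)."""
    x, y = pt
    hit = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def analyse_track(track: Sequence[Point], zone: Sequence[Point], fps: float,
                  loiter_seconds: float) -> Dict[str, bool]:
    """Assumed rules: intrusion if any foot point enters the zone; loitering if
    the longest continuous stay inside exceeds loiter_seconds."""
    longest = run = 0
    for pt in track:
        run = run + 1 if inside(pt, zone) else 0
        longest = max(longest, run)
    return {"intrusion": longest > 0, "loitering": longest / fps > loiter_seconds}
```

For instance, a person who stays inside a square zone for 30 frames at 10 fps triggers both intrusion and loitering with a 2-second limit, while a 15-frame stay triggers only intrusion.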

The target-object-focused abnormal situation detection unit 400 may receive the human detection and trajectory information together with the object detection information, track baggage, and recognize arson scenes, thereby detecting abandonment and arson, which are abnormal situations related to objects.
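The patent says only that baggage is tracked to detect abandonment; the rule below is a hypothetical illustration of how tracked bag and person positions might be combined. It assumes a bag counts as unattended in a frame when it has barely moved since the previous frame and no tracked person is within `radius` pixels, and that abandonment is declared once that condition holds continuously for longer than `max_unattended_s` seconds. All parameter names and values are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def abandoned_since(bag_positions: Sequence[Point],
                    people_per_frame: Sequence[List[Point]],
                    fps: float, radius: float, max_unattended_s: float,
                    max_drift: float = 5.0) -> int:
    """Return the first frame index at which the bag counts as abandoned, or -1.

    Assumption: a frame is 'unattended' when the bag moved at most max_drift
    pixels since the previous frame and no person is within radius pixels.
    """
    run = 0
    for i, bag in enumerate(bag_positions):
        static = i > 0 and math.dist(bag, bag_positions[i - 1]) <= max_drift
        alone = all(math.dist(bag, p) > radius for p in people_per_frame[i])
        run = run + 1 if static and alone else 0
        if run / fps > max_unattended_s:
            return i
    return -1
```

In this sketch, a bag whose owner walks away at frame 10 (10 fps, 3-second limit) is reported at frame 40, after 31 consecutive unattended frames; if the owner stays nearby, -1 is returned.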

The human-behavior-focused abnormal situation detection unit 500 may receive the human detection and trajectory information, recognize human actions, and detect the interval in which an action occurs, thereby detecting fighting and collapse, which are abnormal situations related to human behavior.

The integrated abnormal situation detection management unit 600 may manage, in an integrated manner, the various types of abnormal situations with different characteristics detected by the abnormal situation detection units 300, 400, and 500, and finally generate an event 700 corresponding to each abnormal situation.
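One way the integrated management unit could merge the outputs of the three detection units is sketched below. The `Detection` record, the `AnomalyManager` class, and the cool-down rule (one event per anomaly type and subject until `cooldown` frames have passed, so that a loitering person does not raise an alarm on every frame) are hypothetical design choices for illustration; the patent does not describe the unit's internal logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    kind: str        # e.g. "intrusion", "arson", "fighting" (hypothetical labels)
    subject_id: int  # track ID of the person or object involved
    frame: int


class AnomalyManager:
    """Assumed behavior: merge outputs of the movement/object/behavior detectors
    and emit one event per (kind, subject) until `cooldown` frames pass."""

    def __init__(self, cooldown: int = 300) -> None:
        self.cooldown = cooldown
        self._last: Dict[Tuple[str, int], int] = {}

    def update(self, *module_outputs: Iterable[Detection]) -> List[Detection]:
        events = []
        for outputs in module_outputs:
            for det in outputs:
                key = (det.kind, det.subject_id)
                last = self._last.get(key)
                if last is None or det.frame - last >= self.cooldown:
                    self._last[key] = det.frame
                    events.append(det)
        return events
```

Keeping the detectors independent and centralizing event generation means a new anomaly type only needs a new detector that emits `Detection` records.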

Each module constituting the abnormal situation detection framework is described in detail below.

The human detection and tracking unit 200 may detect objects, including people, and track people in order to collect location information on the people appearing in the video. The human detection and tracking unit 200 may include an object detection module 210 and a human tracking module 220.

The object detection module 210 may estimate the class label, location, and probability (confidence score) of each object present in the image. In an embodiment of the present invention, the object detection model may be a known model such as R-CNN or YOLO, but the scope of the present invention is not limited thereto. The object detection module 210 may obtain location information only for objects classified as human and pass it to the human tracking module 220.
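Passing only human detections to the tracker amounts to filtering the detector output by class and confidence. The sketch below assumes detections arrive as (label, box, score) records, that the human class is named "person" (as in COCO-trained detectors), and that a 0.5 confidence cut-off is applied; the `Det` type and the threshold are assumptions. Non-human detections are kept separately because the target-object-focused unit also consumes object detection information.

```python
from typing import List, NamedTuple, Tuple


class Det(NamedTuple):
    label: str
    box: Tuple[float, float, float, float]  # x0, y0, x1, y1 in pixels
    score: float


def split_detections(dets: List[Det], min_score: float = 0.5,
                     human_label: str = "person") -> Tuple[List[Det], List[Det]]:
    """Return (humans for the tracker, other objects for the object-focused unit).
    Assumption: 'person' is the detector's human class name."""
    kept = [d for d in dets if d.score >= min_score]
    humans = [d for d in kept if d.label == human_label]
    others = [d for d in kept if d.label != human_label]
    return humans, others
```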

The human tracking module 220 may determine whether the people appearing in each frame are the same person, assign the same tracking ID to the same person, and generate a trajectory of each person's location. In an embodiment of the present invention, the human tracking model may use known models such as Deep SORT and OSNet, but the scope of the present invention is not limited thereto. The human tracking module 220 may pass the generated human trajectory information to the human trajectory analysis module 310.

Many large-scale image datasets have been publicly released for object detection. They contain object images reflecting a wide range of environmental conditions, such as changes in lighting, object size, camera angle, and background. Object detection modules trained on such large-scale datasets therefore generally operate stably under diverse environmental changes.

However, datasets for object tracking, which must consist of image sequences (videos), are far fewer than datasets for object detection, and most object tracking datasets consist of full-body footage of people captured in CCTV environments.

Conventional human tracking modules are therefore highly vulnerable to environments not seen during training, which can directly reduce the reliability of the entire integrated abnormal situation detection framework.

Specifically, a human tracking module that assigns identities by similarity depends on a feature extractor that encodes features in the image, such as the shape and color of a person's clothing or a bag, as an identity vector. For appearance information within the data distribution the feature extractor was trained on, tracking the same person works well because features are compared in a high-dimensional space through identity vectors. In out-of-distribution settings, however, where appearance information absent from the training dataset must be encoded as an identity vector, stable identity-vector mapping can be difficult.

For example, suppose the feature extractor has not learned enough about how people look on rainy days. It may then treat the same person as two different people depending on whether they are holding an open umbrella or carrying a closed one. Human images sampled from tracking training datasets are collected in limited environments and cannot provide the variety of appearances found in deployment. As a result, the feature extractor struggles to map identity vectors stably across the wide range of CCTV operating conditions.

An anomaly detection system for intelligent video surveillance matters most at night and in adverse weather, when manual monitoring becomes difficult. Stable human tracking performance is therefore essential to such a system.

The anomaly detection method according to an embodiment of the present invention improves human tracking performance by reusing feature information extracted by the object detection module. This makes human tracking robust to diverse environmental conditions and adverse weather.

Figure 3 is a block diagram showing a human tracking module according to an embodiment of the present invention.

Referring to FIG. 3, the human detection and tracking unit 200 may include an object detection module 210 and a human tracking module 220.

The human detection and tracking unit 200 first uses the object detection module 210 to detect human locations in the received video stream 100. It then tracks each human by comparing and matching it against humans that appeared previously. The identity recognition procedure of the human tracking module 220 may consist of three processes: human feature extraction, human tracking, and trajectory stabilization. To distinguish individual humans, the feature extractor takes each human region detected in the current frame as input and may extract an N-dimensional identity vector from it.

The tracker may compute the similarity between the identity vectors stored for the humans currently being tracked and the identity vectors extracted from the current frame. It then assigns each detection the closest tracking ID. The trajectory of each tracked human may then be predicted and corrected by a stabilization filter.

The human tracking technique according to an embodiment of the present invention is a dimension expansion technique. Its purpose is to obtain identity vectors, which are otherwise highly environment-dependent, stably across diverse environments.

A deep learning-based object detection module can estimate the class, location, and existence probability of objects in an image. As described above, it can use large datasets to learn to extract the edge shapes and internal textures of a wide variety of objects. It can also learn front-to-back (occlusion) relationships and inter-object context in order to infer locations accurately.

Mid-level features obtained from the intermediate computations of the object detection module may contain spatial and texture information extracted from the entire image. The detector uses this information to classify objects and localize them.

The tracking process that follows human localization needs information about each individual human rather than about the image as a whole. The detection features preserve spatial location information, since that is needed to detect where humans are. Information about each individual human can therefore be obtained by masking these detection-related features with each human region as a region of interest (ROI). The information masked to each human region is converted into an M-dimensional object feature vector by average pooling. This object feature vector, extracted by the object detection module, extends the N-dimensional identity vector produced by the feature extractor of the human tracking module. The result is an (M+N)-dimensional vector that can be used for human tracking.
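The following is a minimal sketch of the ROI pooling and concatenation step. It assumes the detector's mid-level feature map is given as M channels of H x W values. It also assumes the human box has already been scaled from image pixels to feature-map cells; the backbone stride and layer choice are not specified in the source. Averaging over the box is equivalent to masking the map with the ROI and averaging inside the mask. The function names are hypothetical.

```python
def roi_average_pool(feature_map, box):
    """Average-pool each of the M channels of a detector feature map inside a box.

    feature_map: list of M channels, each an H x W list of lists.
    box: (x0, y0, x1, y1) in feature-map cells, end-exclusive.
    Returns an M-dimensional object feature vector.
    """
    x0, y0, x1, y1 = box
    if x1 <= x0 or y1 <= y0:
        raise ValueError("empty ROI")
    n_cells = (x1 - x0) * (y1 - y0)
    return [
        sum(channel[y][x] for y in range(y0, y1) for x in range(x0, x1)) / n_cells
        for channel in feature_map
    ]

def expanded_identity_vector(feature_map, box, identity_vec):
    """Concatenate the M-dim object feature with the N-dim identity vector (M+N)."""
    return roi_average_pool(feature_map, box) + list(identity_vec)
```

The source does not say whether the two parts are rescaled before concatenation. In practice one might L2-normalize each part separately so that neither dominates the similarity computed on the combined vector.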

The identity vector dimension expansion technique according to an embodiment of the present invention combines the representation space of the object detection module with that of the feature extractor. The appearance features that each module has learned independently are then used together for tracking. Dimension expansion lets appearance information from different modules be queried at the same time. It also makes it possible to distinguish different humans that could not previously be told apart because of the limited dimensionality of a single representation.

In addition, because information already computed by the object detection module is reused, human tracking performance can be improved efficiently and at low cost.

Furthermore, the object detection module and the human tracking module are trained on datasets of different sizes and in different ways, so they extract complementary appearance features. The technique can also be applied to a variety of human tracking methods.

The human-movement-focused anomaly detection unit 300 may receive the human regions and trajectories output by the human detection and tracking unit 200 and analyze the trajectories. By measuring how long a human stays in a boundary area, it may also detect intrusion and loitering, the anomalies related to human movement.

The human trajectory analysis module 310 may analyze trajectory information, i.e., the sequence of positions of a tracked human. From this it verifies whether an object tracked as a human is actually a human or an object falsely detected as one.

The multi-object detection datasets used to train the object detection module 210 widely use mean Average Precision (mAP) as the performance metric. mAP computes average precision (AP) separately for each target class and then averages these values over all classes. As a result, it emphasizes accurate localization over accurate recognition and classification of objects.

Specifically, mAP evaluates each class using only the predicted probability for that class, computes an accuracy for each class, and then averages them. Suppose the object detection module wrongly predicts that a different object is more likely than a chair at a chair's location; the chair detection score is unaffected. Likewise, if the module judges a dog to be more likely than a person in a region that actually contains a person, the classification score for humans does not drop. Consequently, conventional high-performance object detection modules are designed to prioritize precise localization over accurate classification.

Object detection modules whose classification performance is not properly validated in this way frequently misclassify non-human objects as humans. The human tracking process that follows object detection does not distinguish humans from other objects. An object wrongly tracked as a human can therefore keep triggering unnecessary anomaly detection computations. Excluding falsely detected objects from the anomaly detection units' processing is essential both for faster computation and for accurate anomaly detection.

Because tracking is performed frame by frame, even slight changes between images introduce position noise in the object detection module. The prediction step of the human tracking process amplifies this noise, so the reported position of a stationary object moves randomly up, down, left, and right. The travel distance measured along the trajectory of a falsely detected, motionless object thus comes out similar to that of a walking human, which makes the two hard to tell apart.

Displacement, i.e., the change between the position where an object first appeared and its current position, also fails to give reliable human analysis results because of ID assignment failures. Measuring an object's movement reliably requires a method that removes noise from the tracked trajectory and then measures the movement quantitatively.

The human trajectory analysis method according to an embodiment of the present invention effectively removes noise from the motion trajectories of objects detected in the video. This allows it to identify objects falsely detected as humans. The method uses an area calculation based on vector direction to cancel noise that jitters randomly up, down, left, and right, and to measure accurately how far an object has moved within the video. Equations 1 and 2 below represent, respectively, a polygon area calculation using coordinate changes and a polygon area calculation using vector cross products.

[Equation 1]

[Equation 2]

Equation 1 computes the area from the polygon's two-dimensional vertices (x, y). It sums the contributions of segments along which y decreases and segments along which y increases with opposite signs. Equation 2 modifies Equation 1 so that the sign of each area contribution is determined by the relationship between vectors.
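The equation images themselves are not reproduced in this text. For reference, the standard forms of the two area formulas described above are shown below. These forms are assumed and may differ in notation from Equations 1 and 2. Take a polygon with vertices $A_i = (x_i, y_i)$, $i = 0, \dots, n-1$, with indices taken modulo $n$:

$$
S = \frac{1}{2}\sum_{i=0}^{n-1} (x_i + x_{i+1})\,(y_{i+1} - y_i)
$$

$$
S = \frac{1}{2}\sum_{i=1}^{n-2} \overrightarrow{A_0A_i} \times \overrightarrow{A_0A_{i+1}}, \qquad \mathbf{u}\times\mathbf{v} = u_x v_y - u_y v_x
$$

In the first form, the factor $(y_{i+1} - y_i)$ gives opposite signs to edges along which $y$ increases and edges along which $y$ decreases. In the second, each triangle's sign follows the turn direction of its vector pair, so triangles traversed in opposite directions cancel. Both expressions give the same signed area, and the sign flips when the traversal direction is reversed.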

Equation 2 uses the cross product of two vectors to determine whether they form a clockwise or counterclockwise turn. It assigns opposite signs to the two cases when accumulating the area. As a result, noise that jitters up, down, left, and right is not added to the area produced by genuine movement; instead, the noise contributions cancel one another.
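The cancellation property can be illustrated with a minimal sketch. It accumulates signed triangle areas as a fan from the first point (an assumed accumulation scheme; the source's exact indexing is not reproduced here) and compares the result with plain path length. In image coordinates, where y grows downward, a positive cross product corresponds to a visually clockwise turn. That appears consistent with the sign convention described for FIG. 4. The function names are hypothetical.

```python
import math

def signed_area(points):
    """Signed area of the closed polygon through `points`, accumulated as a fan of
    triangles from the first point: 0.5 * sum(cross(P_i - P_0, P_{i+1} - P_0)).
    Triangles traversed with opposite turn directions get opposite signs."""
    x0, y0 = points[0]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points[1:], points[2:]):
        ux, uy = x1 - x0, y1 - y0
        vx, vy = x2 - x0, y2 - y0
        total += ux * vy - uy * vx  # 2D cross product
    return 0.5 * total

def path_length(points):
    """Total distance travelled along the trajectory (sum of step lengths)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# A jittering, stationary detection: the path is long, but the two triangles
# it sweeps have opposite orientation and cancel.
jitter = [(0, 0), (1, 1), (1, 0), (0, 1)]
# A genuine loop around a 4 x 4 square.
loop = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Here `path_length(jitter)` is about 3.83 while `signed_area(jitter)` is 0.0, whereas `signed_area(loop)` is 16.0. Reversing `loop` gives -16.0.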

Figure 4 is an example of a movement trajectory detected from a stationary object.

Referring to FIG. 4, the movement trajectory detected from a stationary object contains noise and moves randomly. Vectors A₀A₁ and A₁A₂ form a counterclockwise turn, so the area of triangle A₀A₁A₂ is counted with a negative sign. Vectors A₂A₃ and A₃A₄, on the other hand, form a clockwise turn, so the area of triangle A₂A₃A₄ is counted with a positive sign. Taking vector direction into account in this way yields a movement area from which the noise has been removed.

In addition, the trajectory area calculation method according to an embodiment of the present invention accounts for direction even under large noise movements caused by repeated ID switches, a common form of human tracking failure. Because it relies on vector operations, it can remove noise from movement in three-dimensional space as well as in two-dimensional space. Once human movement can be measured quantitatively, falsely detected objects can be distinguished by how much they move. Furthermore, dividing a trajectory into short segments and computing the area of each makes it possible to determine when a human has stopped and to identify risk factors and risk areas.

The human trajectory analysis module 310 filters out false detections from the human detection and tracking unit and quantitatively measures each human's movement state. It then passes the results to the other anomaly detection units 400 and 500.

The boundary area crossing detection module 320 detects intrusion and loitering quickly and efficiently using only human location information. It computes the degree of overlap between the boundary zone preset on each CCTV view and the region corresponding to the human's body to determine whether the human has entered. The times at which a human enters and leaves the restricted zone are sent to the integrated anomaly management unit 600.
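Below is a minimal sketch of the overlap test and the entry/exit reporting. It assumes a rectangular zone, an overlap ratio defined as the fraction of the human box lying inside the zone, and a hypothetical threshold of 0.5. Real restricted zones are often arbitrary polygons, and the source does not define the overlap measure precisely.

```python
def overlap_ratio(box, zone):
    """Fraction of the human box (x0, y0, x1, y1) that lies inside a rectangular zone."""
    ix0, iy0 = max(box[0], zone[0]), max(box[1], zone[1])
    ix1, iy1 = min(box[2], zone[2]), min(box[3], zone[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

def entry_exit_events(boxes, zone, threshold=0.5):
    """boxes: one tracked human's box in each frame.
    Returns a list of ("enter" | "exit", frame_index) events."""
    events, inside = [], False
    for t, box in enumerate(boxes):
        now_inside = overlap_ratio(box, zone) >= threshold
        if now_inside and not inside:
            events.append(("enter", t))
        elif inside and not now_inside:
            events.append(("exit", t))
        inside = now_inside
    return events
```

Loitering could then be flagged, for example, when the gap between an "enter" event and the next "exit" event exceeds a dwell-time threshold. That threshold is also a hypothetical parameter.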

The target-object-focused anomaly detection unit 400 receives human detection and trajectory information together with object detection information. It tracks baggage and recognizes arson scenes, thereby detecting abandonment and arson, the anomalies related to objects.

The baggage tracking module 410 tracks baggage, identifies the owner carrying it, and detects abandonment. From the object detection information, it can track items that count as personal accessories, such as handbags, backpacks, and suitcases. Each tracked item is assigned to its identified owner. When the human carrying the item leaves it behind and moves away, the module detects this and sends the result to the integrated anomaly detection management unit 600.
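The source does not specify how owners are identified or when an item counts as abandoned. The sketch below is therefore hypothetical throughout. It assigns each bag to the nearest tracked human by box-center distance. It flags abandonment once the owner has been farther than `max_dist` from the bag, or out of view, for `min_frames` consecutive frames. All names and thresholds are illustrative.

```python
import math

def center(box):
    """Center point of an (x0, y0, x1, y1) box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def assign_owner(bag_box, humans):
    """humans: {track_id: box}. Returns the ID of the nearest human, or None."""
    if not humans:
        return None
    bag_c = center(bag_box)
    return min(humans, key=lambda h: math.dist(bag_c, center(humans[h])))

def is_abandoned(bag_box, owner_boxes, max_dist, min_frames):
    """owner_boxes: the owner's box in each recent frame (None if not visible).
    True once the owner is far from the bag, or absent, for min_frames frames in a row."""
    bag_c = center(bag_box)
    streak = 0
    for box in owner_boxes:
        if box is None or math.dist(bag_c, center(box)) > max_dist:
            streak += 1
            if streak >= min_frames:
                return True
        else:
            streak = 0
    return False
```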

Figure 5 is a detailed block diagram of an arson recognition module according to an embodiment of the present invention.

Referring to FIG. 5, the arson recognition module 420 detects fire situations and identifies arson behavior by means of text describing the situation. The vision-language conversion model in FIG. 5 may be a known model trained on large-scale image-text pairs, such as CLIP, Florence, or Flamingo, but the scope of the present invention is not limited thereto.

Conventional techniques detect fire locations in one of two ways: by recognizing images that contain fire, or by using an object detection module trained to treat burning regions as objects. These methods can work reliably when trained on large datasets. However, fire looks and behaves very differently depending on the environment and the fuel, and because collection is expensive, very few datasets capture this variety. Owing to this lack of diversity in collected fire footage, models trained with existing methods are inevitably very vulnerable to changes in the CCTV environment. They therefore require additional data collection and training at each new installation site.

Moreover, most existing fire localization techniques were developed to find and respond to large fires already under way. They do not address the moment when an arsonist begins setting a fire. Arson is the deliberate starting of a fire to threaten a facility or location, and to identify it, recognizing the act of setting the fire matters more than locating the fire.

As described above, conventional fire detection techniques lack large-scale datasets and are therefore vulnerable to the many forms a fire can take. They also cannot recognize the act of starting a fire at all. Existing fire detection techniques therefore cannot support a video surveillance system that identifies arsonists in a CCTV environment and detects fires in their early stages.

The arson detection method according to an embodiment of the present invention is an image-text comparative inference technique. It overcomes the data shortage problem and operates reliably in environments it was not trained on.

Image-text comparative inference relies on a model pre-trained on a large dataset of images paired with descriptive text. The pre-trained model is trained on hundreds of millions of images. It learns to map each image, and the description among hundreds of millions of candidates that best describes it, to nearby points in a shared space. After pre-training, the visual encoder and the language encoder can convert images and text, respectively, into this shared space, known as the image-text contrastive embedding space.

Visual and language vectors are mapped into a shared, comparable space. Text describing an image can therefore be retrieved from the image, and conversely, related images can be retrieved from text. Conventional ontology-based inference is vulnerable to untrained environments and requires retraining for each new detection target. The image-text comparative inference technique, in contrast, uses a model trained at scale to interpret images in terms of language. This enables natural-language-based inference and allows new detection targets to be recognized simply by describing them.

The arson recognition module 420 recognizes arson by comparing, in the shared space, image information received from the CCTV with text describing the anomaly. It converts scene information from the full CCTV frame. To analyze the scene from multiple perspectives, it may also split the frame into local patches and use them alongside the full frame. In the example of FIG. 5, the visual encoder maps both the detected human region and the region where the human stopped into the comparison space to extract visual information.

The language encoder maps text describing the situation to be detected into the same comparison space. Once image information and text descriptions share this space, arson can be recognized by computing their similarity or by contrasting them with descriptions of a normal environment. In FIG. 5, the scene and the arson action are both described, so the scene and the action can be recognized at the same time.
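The comparison step can be sketched in the style of CLIP-like zero-shot classification. Cosine similarities between one image embedding and several text-prompt embeddings are turned into probabilities with a temperature-scaled softmax, and the probability mass on the arson descriptions is then summed. The embeddings below are toy stand-ins for encoder outputs. The function names, the temperature of 0.01 (a scale of 100, similar in magnitude to what is commonly used with CLIP), and the decision threshold are all assumptions.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def text_probabilities(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image embedding and
    several text-prompt embeddings."""
    img = normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def detect_arson(image_emb, prompts, arson_labels, threshold=0.5):
    """prompts: {description: text embedding}; arson_labels: the descriptions
    that indicate arson. Returns (is_arson, arson_probability)."""
    names = list(prompts)
    probs = text_probabilities(image_emb, [prompts[n] for n in names])
    score = sum(p for n, p in zip(names, probs) if n in arson_labels)
    return score >= threshold, score
```

With real encoders, `prompts` would hold the text embeddings of descriptions such as "a person is setting fire to a trash can" alongside normal-scene descriptions. Adding a new detection target then only means adding another description.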

The image-language comparative inference technique according to an embodiment generalizes robustly across environments because it uses a vision-language encoder trained at large scale. That encoder has learned to describe in language not only objects but also the actions of people in an image and their interactions with objects, which enables reliable natural-language-based inference. The recognition module takes images as input and expresses them in human-friendly language. Its inferences can therefore be interpreted, and it can interact with people.

The image-language comparative inference technique according to an embodiment is adaptive: detection targets can be adjusted by changing the descriptive text to suit the deployment environment. For example, adding a description such as “A white glowing bonfire is visible.” to the detection items lets the module adapt to the environment immediately, without any training. Capture-time or weather information can also be used to append phrases such as “at night,” “in the morning,” “on a snowy day,” or “on a rainy day,” which can improve detection accuracy.
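The phrase-appending idea can be sketched as a small prompt builder. The hour ranges used for "at night" and "in the morning" and the weather keys are hypothetical choices; the source only gives the example phrases. In the original Korean, these phrases are attached as sentence endings, while in English they are simply appended.

```python
WEATHER_PHRASES = {"snow": "on a snowy day", "rain": "on a rainy day"}  # hypothetical keys

def time_phrase(hour):
    """Map a capture hour (0-23) to a phrase; the hour ranges are assumptions."""
    if hour >= 19 or hour < 6:
        return "at night"
    if hour < 12:
        return "in the morning"
    return None

def contextual_prompts(base_prompts, hour=None, weather=None):
    """Append time-of-day and weather phrases to each base description."""
    phrases = []
    if hour is not None:
        p = time_phrase(hour)
        if p:
            phrases.append(p)
    if weather in WEATHER_PHRASES:
        phrases.append(WEATHER_PHRASES[weather])
    return [" ".join([base] + phrases) + "." for base in base_prompts]
```

For example, `contextual_prompts(["A white glowing bonfire is visible"], hour=22, weather="rain")` yields `["A white glowing bonfire is visible at night on a rainy day."]`.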

The human-behavior-focused anomaly detection unit 500 receives human detection and trajectory information, recognizes human actions, and detects the sections in which they occur. In this way it detects fighting and falling, the anomalies related to human behavior.

The human action recognition module 510 uses the human locations and tracking results from the human detection and tracking unit 200 to receive images of each human's region of interest over time. It continuously recognizes each human's actions and classifies their type. The action recognition model in the present invention may be a known model such as R(2+1)D-18. It continuously classifies the actions of humans appearing on screen and passes the classification results to the abnormal behavior section detection module 520.

The abnormal behavior section detection module 520 may detect sections of abnormal behavior by processing the action classification results of the human action recognition module 510 as time-series data. These classification results may be loaded into two queues: an abnormal behavior classification queue and an abnormal behavior section detection queue. The classification queue is used to determine the type of the final abnormal behavior occurring in each processed video. The section detection queue is used to determine the sections in which that behavior, as determined from the classification queue, occurs.

Figure 6 is a flowchart showing the abnormal behavior section detection process.

Referring to FIG. 6, the abnormal behavior section detection method according to an embodiment of the present invention first inputs the identities appearing in the current frame (S610). It then obtains the recent behavior of each identity (S620). For each identity at a given time, the most confident of its most recent classification results may be taken as that human's behavior at that time. Identities that have appeared too briefly to be classified yet may be treated as behaving normally.

At every time step, the behaviors of all identities present are obtained, and the most frequent behavior is loaded into the abnormal behavior classification queue (S630). The abnormal behavior section detection queue, by contrast, receives the most frequent behavior computed only over the abnormal behaviors (fighting, falling) among all identities present at that time step (S640, S650).
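Steps S620 to S650 can be sketched with `collections.Counter`. The labels "fight", "fall", and "normal" are illustrative. The source does not say what the section queue receives when no abnormal behavior is present. The sketch assumes a "normal" placeholder in that case, so that both queues stay aligned one entry per frame.

```python
from collections import Counter, deque

ABNORMAL = {"fight", "fall"}  # abnormal behaviors named in the text
NORMAL = "normal"             # hypothetical label for normal / not-yet-classified

def latest_behavior(recent):
    """recent: the identity's most recent (label, confidence) results (S620).
    Returns the most confident label, or NORMAL if the identity is not yet classified."""
    if not recent:
        return NORMAL
    return max(recent, key=lambda item: item[1])[0]

def update_queues(identities, classification_q, section_q):
    """identities: {track_id: recent results} for the current frame.
    Appends one entry per frame to each queue (S630-S650)."""
    behaviors = [latest_behavior(r) for r in identities.values()]
    # S630: most frequent behavior over all identities in this frame.
    classification_q.append(
        Counter(behaviors).most_common(1)[0][0] if behaviors else NORMAL)
    # S640-S650: most frequent behavior among abnormal behaviors only.
    abnormal = [b for b in behaviors if b in ABNORMAL]
    section_q.append(
        Counter(abnormal).most_common(1)[0][0] if abnormal else NORMAL)
```

Bounded queues such as `deque(maxlen=...)` would keep memory constant over long streams. The maximum length is another unspecified parameter.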

The method for detecting abnormal behavior sections consists of two main steps. First, the behavior that occurs most frequently in the abnormal behavior classification queue is obtained and defined as the main behavior in the input video. Second, only entries of the main behavior are kept in the abnormal behavior section detection queue. These entries are linked consecutively into clusters based on a set number of frames, and each resulting cluster may be defined as a section of the main behavior in the input video. The above process (S610 to S650) may be repeated until the video ends (S660).
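Below is a minimal sketch of the two-step section detection. The main behavior is the most common entry in the classification queue. Frames whose section-queue entry equals the main behavior are then merged into clusters whenever consecutive matches are at most `max_gap` frames apart. Both `max_gap` and the choice to report no sections when the main behavior is not abnormal are assumptions, since the source only says "a set number of frames".

```python
from collections import Counter

ABNORMAL = {"fight", "fall"}

def cluster_frames(indices, max_gap):
    """Group sorted frame indices into (start, end) runs; a new run starts
    when the gap to the previous index exceeds max_gap frames."""
    runs = []
    for i in indices:
        if runs and i - runs[-1][1] <= max_gap:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return [tuple(r) for r in runs]

def detect_sections(classification_q, section_q, max_gap=5):
    """Return (main_behavior, [(start_frame, end_frame), ...])."""
    if not classification_q:
        return None, []
    # Step 1: the most frequent behavior is the main behavior of the video.
    main = Counter(classification_q).most_common(1)[0][0]
    if main not in ABNORMAL:
        return main, []
    # Step 2: keep only main-behavior frames and link them into clusters.
    frames = [t for t, b in enumerate(section_q) if b == main]
    return main, cluster_frames(frames, max_gap)
```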

Figure 7 is a block diagram showing an integrated anomaly detection device according to an embodiment of the present invention.

Referring to FIG. 7, the integrated anomaly detection device according to an embodiment of the present invention includes three units. An object detection unit 710 detects non-human objects and human objects from an input image using a first neural network. A human object tracking unit 720 tracks the human objects. An anomaly detection unit 730 detects anomalies based on the object detection results and the human object tracking results.

Here, the human object tracking unit 720 may track a human object using first feature information generated based on an intermediate computation result of the first neural network, together with second feature information extracted by a second neural network that receives as input the human object region corresponding to the final computation result of the first neural network.
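
The claims state that the first feature information is extracted by masking the detected human region in the intermediate computation result, which carries spatial and texture information. One plausible reading, sketched below as an assumption, is masked average pooling: every cell outside the human box is ignored and each channel is averaged over the cells inside it, giving a C-dimensional vector. The function name, the nested-list feature map layout, and the requirement that the box already be expressed in feature-map cells (for example, pixel coordinates divided by the backbone stride) are hypothetical; the second feature information would come from a separate network applied to the cropped human region and is not shown.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def masked_avg_pool(fmap: FeatureMap, box: Tuple[int, int, int, int]) -> List[float]:
    """Pool an intermediate feature map inside a human box into a C-dimensional vector.

    `box` is (x1, y1, x2, y2) in feature-map cells, end-exclusive; cells outside
    the box are masked out and each channel is averaged over the remaining cells.
    """
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        raise ValueError("box covers no feature-map cells")
    return [
        sum(channel[r][c] for r in range(y1, y2) for c in range(x1, x2)) / area
        for channel in fmap
    ]
```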

Here, the human object tracking unit 720 may match the same human object across frames using the first feature information and the second feature information.

Here, the human object tracking unit 720 may track the human object using an (M+N)-dimensional feature vector generated from the first feature information (an M-dimensional vector) and the second feature information (an N-dimensional vector).
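
Building the (M+N)-dimensional vector and matching human objects across frames can be sketched as below. The per-part L2 normalization before concatenation, the cosine similarity, and the greedy assignment with a `min_sim` cutoff are assumptions; the source specifies none of them, and a tracker could equally use an optimal assignment such as the Hungarian algorithm. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Build the (M+N)-dim vector: normalize each part separately, then concatenate."""
    return l2_normalize(first) + l2_normalize(second)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def greedy_match(tracks: Dict[int, List[float]],
                 detections: List[List[float]],
                 min_sim: float = 0.5) -> Dict[int, int]:
    """Assign detections in the new frame to existing tracks (track ID -> detection index).

    Greedy: take the most similar remaining pair first; stop below `min_sim`.
    """
    pairs = sorted(
        ((cosine(t, d), tid, j) for tid, t in tracks.items() for j, d in enumerate(detections)),
        reverse=True)
    matched: Dict[int, int] = {}
    used_dets = set()
    for sim, tid, j in pairs:
        if sim < min_sim:
            break
        if tid in matched or j in used_dets:
            continue
        matched[tid] = j
        used_dets.add(j)
    return matched
```

Normalizing each part before concatenation keeps one feature source from dominating the similarity simply because its values are larger.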

Here, the human object tracking unit 720 may identify a non-human object that was falsely detected as a human object based on the movement trajectory information of the human object.

Here, the human object tracking unit 720 may calculate motion vectors corresponding to the movement trajectory of the human object and identify a non-human object falsely detected as a human object using the results of cross (outer) products between the per-section motion vectors.
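
The source says only that cross products between per-section motion vectors are used; it does not state the decision rule. The sketch below assumes one hypothetical rule: the sign of the 2D cross product gives the turn direction between consecutive sections, and a track whose turn direction keeps reversing (a box jittering on a static object) is flagged, while a walking person's path tends to turn more consistently. The section length `step`, the `flip_ratio` threshold, and all function names are assumptions, and a person genuinely walking a zigzag would also be flagged by this rule.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def section_vectors(trajectory: List[Point], step: int) -> List[Point]:
    """Motion vector of each section of `step` frames along a track's center trajectory."""
    return [
        (trajectory[i + step][0] - trajectory[i][0], trajectory[i + step][1] - trajectory[i][1])
        for i in range(0, len(trajectory) - step, step)
    ]


def cross_z(a: Point, b: Point) -> float:
    """z-component of the 2D cross (outer) product a x b; its sign gives the turn direction."""
    return a[0] * b[1] - a[1] * b[0]


def is_suspected_false_human(trajectory: List[Point], step: int = 5,
                             flip_ratio: float = 0.5, eps: float = 1e-6) -> bool:
    """Hypothetical rule: flag tracks whose turn direction keeps flipping (jitter)."""
    vectors = section_vectors(trajectory, step)
    crosses = [cross_z(a, b) for a, b in zip(vectors, vectors[1:])]
    signs = [c > 0 for c in crosses if abs(c) > eps]  # ignore near-collinear sections
    if len(signs) < 2:
        return False  # too few turns to judge
    flips = sum(s != t for s, t in zip(signs, signs[1:]))
    return flips / (len(signs) - 1) >= flip_ratio
```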

Here, the abnormal situation detection unit 730 may detect an arson situation using visual feature information corresponding to the input image and language feature information corresponding to a text describing an arson situation.
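
The claims add that both kinds of feature information are mapped into the same comparison space and compared by similarity, and that the visual feature information is built from the input image, the human region image, and the region where a person stayed beyond a preset time. A minimal sketch under assumptions: the linear projections, cosine similarity, mean fusion of the three visual embeddings, and the 0.3 threshold are all hypothetical, and in practice the encoders would come from a pretrained vision-language model, which the source does not name.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def project(w: Matrix, x: Sequence[float]) -> List[float]:
    """Linear map into the shared comparison space: y = W x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion of visual embeddings (full frame, person crop, lingering region)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def is_arson(visual: Sequence[float], text: Sequence[float],
             w_visual: Matrix, w_text: Matrix, threshold: float = 0.3) -> bool:
    """Flag arson when the projected visual and text features are similar enough."""
    return cosine(project(w_visual, visual), project(w_text, text)) >= threshold
```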

Figure 8 is a diagram showing the configuration of a computer system according to an embodiment.

The integrated abnormal situation detection device according to the embodiment may be implemented in a computer system 1000 that includes a computer-readable recording medium.

The computer system 1000 may include one or more processors 1010, a memory 1030, a user interface input device 1040, a user interface output device 1050, and a storage 1060 that communicate with each other via a bus 1020. The computer system 1000 may further include a network interface 1070 connected to a network 1080. The processor 1010 may be a central processing unit or a semiconductor device that executes programs or processing instructions stored in the memory 1030 or the storage 1060. The memory 1030 and the storage 1060 may be storage media including at least one of volatile media, non-volatile media, removable media, non-removable media, communication media, and information delivery media. For example, the memory 1030 may include a ROM 1031 or a RAM 1032.

The specific implementations described herein are embodiments and do not limit the scope of the present invention in any way. For brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of such systems may be omitted. The connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections; in an actual device, they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. Unless a component is specifically described with terms such as “essential” or “important,” it may not be required for applying the present invention.

Therefore, the spirit of the present invention should not be limited to the embodiments described above; the claims set forth below, together with all scopes equivalent to or equivalently modified from these claims, fall within the scope of the spirit of the present invention.

710: Object detection unit
720: Human object tracking unit
730: Abnormal situation detection unit
1000: Computer system
1010: Processor
1020: Bus
1030: Memory
1031: ROM
1032: RAM
1040: User interface input device
1050: User interface output device
1060: Storage
1070: Network interface
1080: Network

Claims

An integrated abnormal situation detection method performed by an abnormal situation detection device, the method comprising:
detecting a non-human object and a human object in an input image using a first neural network;
tracking the human object; and
detecting an abnormal situation based on an object detection result and a human object tracking result.

The method of claim 1, wherein
the tracking of the human object is performed using:
first feature information generated based on an intermediate computation result of the first neural network; and
second feature information extracted by a second neural network that receives as input the human object region corresponding to the final computation result of the first neural network.

The method of claim 2, wherein
the intermediate computation result includes spatial information and texture information of the input image, and
the first feature information is extracted by masking the region of the detected human object in the intermediate computation result.

The method of claim 2, wherein
the tracking of the human object comprises matching the same human object across frames using the first feature information and the second feature information.

The method of claim 2, wherein
the first feature information corresponds to an M-dimensional feature vector and the second feature information corresponds to an N-dimensional feature vector, and
the tracking of the human object comprises tracking the human object using an (M+N)-dimensional feature vector generated based on the first feature information and the second feature information.

The method of claim 1, wherein
the human object tracking result includes occupied-region and movement trajectory information of the human object, and
the tracking of the human object comprises identifying a non-human object falsely detected as a human object based on the movement trajectory information of the human object.

The method of claim 6, wherein
the tracking of the human object comprises calculating motion vectors corresponding to the movement trajectory of the human object and identifying a non-human object falsely detected as a human object using results of cross (outer) products between the per-section motion vectors.

The method of claim 1, wherein
the detecting of the abnormal situation comprises detecting an arson situation using:
visual feature information corresponding to the input image; and
language feature information corresponding to an arson situation text.

The method of claim 8, wherein
the detecting of the abnormal situation comprises detecting the arson situation by mapping the visual feature information and the language feature information into a common comparison space and calculating a similarity between the visual feature information and the language feature information.

The method of claim 8, wherein
the visual feature information is generated based on the input image, an image of the human object region, and an image of a region in which the human object is determined to have stayed for longer than a preset time.

The method of claim 1, wherein
the detecting of the abnormal situation comprises detecting behaviors of the human object, setting a main behavior for each section based on the frequency of the human object behaviors, and calculating the section in which the abnormal situation occurs using the main behavior information for each section.

The method of claim 1, wherein
the abnormal situation includes intrusion, loitering, arson, abandonment, fighting, and falling situations.

An integrated abnormal situation detection device comprising:
an object detection unit that detects a non-human object and a human object in an input image using a first neural network;
a human object tracking unit that tracks the human object; and
an abnormal situation detection unit that detects an abnormal situation based on an object detection result and a human object tracking result.

The device of claim 13, wherein
the human object tracking unit tracks the human object using:
first feature information generated based on an intermediate computation result of the first neural network; and
second feature information extracted by a second neural network that receives as input the human object region corresponding to the final computation result of the first neural network.

The device of claim 14, wherein
the intermediate computation result includes spatial information and texture information of the input image, and
the first feature information is extracted by masking the region of the detected human object in the intermediate computation result.

The device of claim 14, wherein
the human object tracking unit matches the same human object across frames using the first feature information and the second feature information.

The device of claim 14, wherein
the first feature information corresponds to an M-dimensional feature vector and the second feature information corresponds to an N-dimensional feature vector, and
the human object tracking unit tracks the human object using an (M+N)-dimensional feature vector generated based on the first feature information and the second feature information.

The device of claim 13, wherein
the human object tracking result includes occupied-region and movement trajectory information of the human object, and
the human object tracking unit identifies a non-human object falsely detected as a human object based on the movement trajectory information of the human object.

The device of claim 18, wherein
the human object tracking unit calculates motion vectors corresponding to the movement trajectory of the human object and identifies a non-human object falsely detected as a human object using results of cross (outer) products between the per-section motion vectors.

The device of claim 13, wherein
the abnormal situation detection unit detects an arson situation using:
visual feature information corresponding to the input image; and
language feature information corresponding to an arson situation text.