KR20210114169A

KR20210114169A - Method for analyzing monitoring image using object verification, and apparatus for the same

Info

Publication number: KR20210114169A
Application number: KR1020200029486A
Authority: KR
Inventors: 이승익; 이진하
Original assignee: 한국전자통신연구원
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2021-09-23

Abstract

The present invention relates to a monitoring image analysis method, and an apparatus therefor, that use object verification to reduce the false alarm rate and increase the accuracy of anomaly detection. According to the present invention, the monitoring image analysis method comprises the following steps: extracting an image frame of a monitoring image to generate an input frame; generating a plurality of image patches including an image of an object of interest from the input frame; selecting at least one object-centered image patch from the plurality of image patches; training a generative neural network by using a normal-situation image patch selected on the basis of the object-centered image patch; comparing the object-centered image patch with a regenerated image, generated by inputting the object-centered image patch into the generative neural network, to calculate an anomaly score; and comparing the anomaly score with a threshold value to generate an anomaly signal.

Description

Method for analyzing monitoring image using object verification, and apparatus for the same

The present invention relates to a monitoring image analysis method, and more particularly to a technology for detecting an abnormal situation from a monitoring image of a monitored area.

Existing anomaly detection technologies extract objects of interest from an image and detect events either by using the entire frame or by using image patches obtained through grid extraction, an object detector, or the like.

Korean Patent Application Laid-Open No. 10-2019-0079110 discloses a self-learning-based monitoring image analysis apparatus and method. The apparatus and method accumulate frames of captured images and extract frames. The extracted frames are applied to an autoencoder, and an anomaly is detected using the output error. That is, entire frames are accumulated over time as the data for anomaly detection and used as the input to the artificial intelligence. However, because this prior patent uses the entire frame, the background has a more dominant influence than the actual objects unless motion extraction or segmentation is performed properly.

To solve this problem, image patches that extract only the actual object of interest are sometimes used. However, image patches extracted in this way do not always contain the complete object. In that case, an incorrect object detection causes false alarms in the system and lowers its accuracy.

Korean Patent Registration No. 10-1980551 discloses a real-time intelligent CCTV image analysis and behavior detection system and method using machine-learning object detection. The system and method detect, in an image frame, an object matching a previously input reference object, and detect a dangerous-behavior object matching a previously input object behavior. Because this technology has no process for checking whether an image patch contains a complete object, an incorrectly detected object can lead to an incorrect anomaly detection.

Accordingly, there is a need for a technology for detecting abnormal situations that solves the problems of the above prior art and increases the accuracy of anomaly detection.

An object of the present invention is to provide a monitoring image analysis method using object verification, and an apparatus therefor, and thereby to provide an anomaly detection system that extracts object-centered image patches from captured images and uses data selected through object verification to lower the false alarm rate and increase the accuracy of anomaly detection.

Another object of the present invention is to provide an anomaly detection system that further increases the accuracy of anomaly detection by using generative neural network training for anomaly detection.

A monitoring image analysis method using object verification according to an embodiment includes: generating an input frame by extracting an image frame of a monitoring image; generating, from the input frame, a plurality of image patches including an image of an object of interest; selecting at least one object-centered image patch from among the plurality of image patches; training a generative neural network using a normal-situation image patch selected based on the object-centered image patch; calculating an anomaly score by comparing the object-centered image patch with a regenerated image generated by inputting the object-centered image patch into the generative neural network; and generating an anomaly signal by comparing the anomaly score with a threshold value.

The generating of the image patches may include: extracting an object of interest from the input frame; and generating, from the input frame, a plurality of image patches including the image of the object of interest.

The object-centered image patch may be the image patch, among the plurality of image patches, that most completely contains the image of the object of interest.

The training of the generative neural network may include: selecting, from among the object-centered image patches, at least one normal image patch representing a normal situation; training the generative neural network using the normal image patch; and storing the trained generative neural network.

The calculating of the anomaly score may include: generating a regenerated image from the object-centered image patch using the trained generative neural network; calculating a difference value by comparing the object-centered image patch with the regenerated image; and collecting the difference values for the plurality of object-centered image patches, wherein an individual anomaly score is calculated by aggregating the collected difference values.

The calculating of the anomaly score may further include collecting the individual anomaly scores for a plurality of objects of interest, wherein the highest of the individual anomaly scores is set as the anomaly score for the input frame.

The object of interest may be obtained using an image comparison search algorithm over a predefined object set or an artificial intelligence algorithm for object detection.

The generating of the input frame may include generating a motion map from the input frame, and the generating of the image patches may include: designating a region with large motion as a region of interest based on the motion map; and generating the image patches based on the distance between the object of interest and the region of interest.

The generating of the anomaly signal may include issuing a warning sound or a notification message to the user based on the anomaly signal.

A monitoring image analysis apparatus using object verification according to an embodiment includes: a frame preprocessing unit that generates an input frame by extracting an image frame of a monitoring image; an object detection unit that generates, from the input frame, a plurality of image patches including an image of an object of interest; an object verification unit that selects at least one object-centered image patch from among the plurality of image patches; a model training unit that trains a generative neural network using a normal-situation image patch selected based on the object-centered image patch; an anomaly score calculation unit that calculates an anomaly score by comparing the object-centered image patch with a regenerated image generated by inputting the object-centered image patch into the generative neural network; and an anomaly detection unit that generates an anomaly signal by comparing the anomaly score with a threshold value.

According to the present invention, by providing a monitoring image analysis method using object verification and an apparatus therefor, it is possible to provide an anomaly detection system that extracts object-centered image patches from captured images and uses data selected through object verification to lower the false alarm rate and increase the accuracy of anomaly detection.

In addition, it is possible to provide an anomaly detection system that further increases the accuracy of anomaly detection by using generative neural network training for anomaly detection.

FIG. 1 is a block diagram illustrating an example of a monitoring image analysis apparatus using object verification according to an embodiment.
FIG. 2 is a block diagram illustrating an example of the object-centered image generation unit shown in FIG. 1.
FIG. 3 is a diagram illustrating an example of use of the object-centered image generation unit shown in FIG. 1.
FIG. 4 is an operation flowchart illustrating an example of a monitoring image analysis method using object verification according to an embodiment.
FIG. 5 is a diagram showing the configuration of a computer system according to an embodiment.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and will be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains; the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Although terms such as "first" and "second" are used to describe various elements, these elements are not limited by such terms. The terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular also includes the plural unless specifically stated otherwise. As used herein, "comprises" or "comprising" means that the stated element or step does not exclude the presence or addition of one or more other elements or steps.

Unless otherwise defined, all terms used herein may be interpreted with the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly and specifically defined.

Hereinafter, a monitoring image analysis method and apparatus using object verification according to an embodiment are described in detail with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram illustrating an example of a monitoring image analysis apparatus using object verification according to an embodiment.

Referring to FIG. 1, the monitoring image analysis apparatus using object verification includes an image capturing unit 110, an object-centered image generation unit 120, a model training unit 130, an anomaly score calculation unit 140, and an anomaly detection unit 150.

The image capturing unit 110 collects images of the monitored area. The object-centered image generation unit 120 generates object-centered images from the images collected by the image capturing unit 110. The model training unit 130 trains a generative neural network using the normal-situation images among the object-centered images. The anomaly score calculation unit 140 calculates an anomaly score from the object-centered images. The anomaly detection unit 150 generates an anomaly signal according to the anomaly score.

The operation of each component is described in detail below.

The image capturing unit 110 collects video of the monitored area with an image capturing device such as a fixed CCTV camera.

The object-centered image generation unit 120 first generates an input frame by extracting an image frame of the monitoring image from the video collected by the image capturing unit 110. It then generates, from the input frame, a plurality of image patches including the image of an object of interest, and selects at least one object-centered image patch from among the plurality of image patches.

The extraction frame rate (frames per second, FPS) at which input frames are generated may be set according to the accuracy required by the user or application; in general, an extraction frame rate of about 10 FPS is appropriate.
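
As a rough illustration of the extraction frame rate, the following sketch picks which frames of a source video would become input frames. The function name `sample_frame_indices` and its parameters are hypothetical and not part of the disclosure; the 10 FPS default simply follows the figure mentioned above.

```python
def sample_frame_indices(total_frames, source_fps, target_fps=10.0):
    """Return indices of the frames to keep so the output runs at about target_fps.

    Assumption: frames are sampled at a constant stride; the disclosure does not
    prescribe a particular sampling scheme.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never sample faster than the source actually provides frames.
    step = max(source_fps / target_fps, 1.0)
    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += step
    return indices
```

For a 30 FPS camera, this keeps every third frame; a source slower than the target rate is passed through unchanged.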

In the process of generating the input frame, an operation of generating a motion map from the input frame may be added, and the information obtained from the motion map may later be used to generate image patches. The motion map may be generated through techniques such as frame subtraction (the difference between consecutive frames), optical flow, and background subtraction.
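
Of the techniques listed, frame subtraction is the simplest to sketch. The example below assumes grayscale frames stored as lists of pixel rows and marks a pixel as moving when its intensity changes by at least `min_change`; both the data layout and the threshold value are illustrative assumptions, not details from the disclosure.

```python
def motion_map(prev_frame, curr_frame, min_change=25):
    """Binary motion map from two consecutive grayscale frames.

    Each frame is a list of rows of integer intensities (assumed 0-255).
    A cell is 1 where the absolute intensity change is at least min_change.
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same height")
    result = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same width")
        result.append(
            [1 if abs(c - p) >= min_change else 0 for p, c in zip(prev_row, curr_row)]
        )
    return result
```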

The object-centered image generation unit 120 generates, from the input frame, a plurality of image patches including the image of the object of interest. This operation may include extracting the object of interest from the input frame and generating, from the input frame, a plurality of image patches including the image of the object of interest.

First, the object of interest may be extracted using an image comparison search algorithm over a predefined object set, or a deep learning network for object detection such as SSD (Single-Shot Detector) or YOLO (You Only Look Once). In this way, the position and type of each object of interest are detected.

For each detected object of interest, an image patch may be generated by extracting the image inside a bounding box of a fixed size centered on the position of the object of interest. When generating image patches, not only bounding boxes with a 1:1 width-to-height ratio but also bounding boxes with other width-to-height ratios, such as 2:1, 2:2, and 1.5:0.8, may be used.
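
The candidate bounding boxes could be produced along the following lines. This is a minimal sketch that treats each ratio pair as width and height multipliers of a base size, which is one possible reading of the ratios listed above; the function name, the `(x0, y0, x1, y1)` box format, and the clipping to the frame are assumptions.

```python
def candidate_boxes(center, base_size, aspect_ratios, frame_size):
    """Bounding boxes centered on an object of interest, one per ratio pair.

    center: (x, y) position of the object of interest.
    aspect_ratios: iterable of (width_factor, height_factor) pairs,
                   e.g. [(1, 1), (2, 1), (2, 2), (1.5, 0.8)].
    frame_size: (width, height) of the input frame; boxes are clipped to it.
    """
    cx, cy = center
    frame_w, frame_h = frame_size
    boxes = []
    for w_factor, h_factor in aspect_ratios:
        half_w = base_size * w_factor / 2
        half_h = base_size * h_factor / 2
        x0 = max(0, int(round(cx - half_w)))
        y0 = max(0, int(round(cy - half_h)))
        x1 = min(frame_w, int(round(cx + half_w)))
        y1 = min(frame_h, int(round(cy + half_h)))
        if x1 > x0 and y1 > y0:  # skip boxes that fall entirely outside the frame
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Each returned box can then be used to crop one image patch from the input frame.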

Information from the previously generated motion map may also be used when generating image patches. In the motion map, a region with large motion within the input frame is designated as a region of interest. Using the distance between the position of the object of interest and the region of interest as an offset, the bounding box may be moved up, down, left, or right, or moved from the position of the object of interest toward the region of interest.
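
One way to read the offset step is sketched below: the box is translated along the vector from the object position to the region of interest. The `fraction` parameter, which controls how far along that vector the box moves, is purely an assumption, since the text does not state how much of the offset is applied.

```python
def shift_toward_roi(box, object_pos, roi_pos, fraction=0.5):
    """Translate a bounding box from the object position toward the region of interest.

    box: (x0, y0, x1, y1); object_pos and roi_pos: (x, y) points.
    fraction: hypothetical share of the object-to-ROI offset to apply (0 = no move).
    """
    dx = int(round((roi_pos[0] - object_pos[0]) * fraction))
    dy = int(round((roi_pos[1] - object_pos[1]) * fraction))
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
```

Applying several fractions to the same box yields additional candidate patches that cover both the object and the nearby motion.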

Once a plurality of image patches has been generated for each object of interest, the image patch that most completely contains the image of the object of interest is selected from among them as the object-centered image patch; at least one object-centered image patch may be selected.

The object-centered image patch may be selected using an algorithm that predefines a set of objects of interest and matches image patches against it, or using a deep learning network pre-trained for image classification (an image classifier). This makes it possible to check whether the object of interest is actually present in the image patch and whether its classification is correct.

For example, when a deep learning network is used to select object-centered image patches, the network outputs, for a given input image, the probability that the image belongs to a particular class. Based on this probability, image patches with high classification confidence, that is, a high probability of belonging to the detected class, may be selected as object-centered image patches.
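
The confidence-based selection might look like the sketch below. It assumes a classifier has already produced, for each patch, the probability of the class reported by the detector; the tuple layout, the function name, and the 0.8 cutoff are illustrative assumptions rather than values from the disclosure.

```python
def select_object_centered(scored_patches, min_confidence=0.8):
    """Keep patches whose classification confidence meets a cutoff.

    scored_patches: list of (patch_id, confidence) pairs, where confidence is the
    classifier's probability for the detected class (assumed precomputed).
    Returns the surviving pairs, most confident first. An empty result means the
    detection failed verification and the object is dropped.
    """
    kept = [item for item in scored_patches if item[1] >= min_confidence]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```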

The object-centered image patches selected in this way can be used both to train the model for anomaly detection and to detect anomalies.

The model training unit 130 selects, from among the object-centered image patches, at least one normal image patch representing a normal situation. It may also train the generative neural network using the normal image patch and store the trained generative neural network model.

The generative neural network may be built by using convolution layers in a structure such as a VAE (Variational Autoencoder), GAN (Generative Adversarial Network), or AAE (Adversarial Autoencoder). The generative neural network is trained to encode a given object-centered image patch, extract a latent code, and decode the original image back from that code. By receiving only normal situations as training data, it establishes the definition and criteria of normality, and the generative neural network model trained only on normal situations is then stored.

The anomaly score calculation unit 140 generates a regenerated image from the object-centered image patch using the trained generative neural network model, calculates a difference value by comparing the object-centered image patch with the regenerated image, and collects the difference values for the plurality of object-centered image patches; it may then calculate an individual anomaly score by aggregating the collected difference values.

It may further collect the individual anomaly scores for a plurality of objects of interest and set the highest of the individual anomaly scores as the anomaly score for the input frame.

When an input image representing a normal situation is fed to the trained generative neural network model, encoding and decoding proceed smoothly and a regenerated image similar to the original input image is produced. In contrast, when an input image representing an abnormal situation that the model has not learned is fed in, the regenerated image differs greatly from the original input image. Therefore, if the difference value between the original image (the input of the generative neural network model) and the regenerated image (its output) is calculated, the difference value becomes lower the more the input image resembles the normal-situation images (or normal criteria) the model has learned, and higher the more it differs from them.
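
The disclosure does not name a specific difference measure. As one common choice, the sketch below uses the mean squared pixel difference between a patch and its regenerated image; the metric, the function name, and the list-of-rows image layout are all assumptions made for illustration.

```python
def reconstruction_error(patch, regenerated):
    """Mean squared pixel difference between a patch and its regenerated image.

    Both images are lists of rows of numeric intensities with identical shape.
    MSE is an assumed metric; other distances could be substituted.
    """
    if len(patch) != len(regenerated):
        raise ValueError("images must have the same height")
    total = 0.0
    count = 0
    for row_a, row_b in zip(patch, regenerated):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count
```

A patch that the model reproduces exactly yields 0.0, and the value grows as the regenerated image drifts away from the input.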

A single input frame may contain several objects of interest, and a plurality of image patches may be extracted for each object of interest. Therefore, the difference values for the plurality of image patches of each object of interest are first collected and aggregated to calculate an individual anomaly score, which is the anomaly score of that object. The individual anomaly score may be determined as the average of the difference values, the highest of the difference values, or the difference value of the object-centered image patch that had the highest classification confidence.

Once individual anomaly scores have been calculated for all objects in this way, the anomaly score of the input frame may be determined as the highest of the individual anomaly scores belonging to that frame.
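
The two-level aggregation described above, followed by the threshold comparison, can be sketched as follows. The three modes mirror the options named in the text; the function names, the strict `>` comparison, and the score of 0.0 for a frame with no verified objects are assumptions.

```python
def object_score(differences, confidences=None, mode="mean"):
    """Individual anomaly score for one object of interest from its patch differences."""
    if not differences:
        raise ValueError("at least one difference value is required")
    if mode == "mean":
        return sum(differences) / len(differences)
    if mode == "max":
        return max(differences)
    if mode == "most_confident":
        if confidences is None or len(confidences) != len(differences):
            raise ValueError("one confidence per difference value is required")
        # Use the difference of the patch with the highest classification confidence.
        best = max(range(len(differences)), key=lambda i: confidences[i])
        return differences[best]
    raise ValueError("unknown mode: " + mode)


def frame_score(object_scores):
    """Frame anomaly score: the highest individual score (0.0 if no objects; assumed)."""
    return max(object_scores) if object_scores else 0.0


def is_anomalous(score, threshold):
    """Raise the anomaly signal when the score exceeds the threshold (strict '>' assumed)."""
    return score > threshold
```

Taking the maximum at the frame level means a single abnormal object is enough to flag the frame, regardless of how many normal objects appear alongside it.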

FIG. 2 is a block diagram illustrating an example of the object-centered image generation unit 120 shown in FIG. 1.

The object-centered image generation unit 120 may include a frame preprocessing unit 210, an object detection unit 220, and an object verification unit 230.

The frame preprocessing unit 210 generates an input frame by extracting an image frame of the monitoring image from the video collected by the image capturing unit 110. The object detection unit 220 generates, from the input frame, a plurality of image patches including the image of the object of interest. The object verification unit 230 selects at least one object-centered image patch from among the plurality of image patches.

The extraction frame rate at which the frame preprocessing unit 210 generates input frames may be set according to the accuracy required by the user or application. The frame preprocessing unit 210 may also generate a motion map from the input frame, and the information obtained from the motion map may later be used to generate image patches. The motion map may be generated through techniques such as frame subtraction, optical flow, and background subtraction.

The object detection unit 220 generates, from the input frame, a plurality of image patches including the image of the object of interest. This operation may include extracting the object of interest from the input frame and generating, from the input frame, a plurality of image patches including the image of the object of interest.

First, the object of interest is extracted using an image comparison search algorithm over a predefined object set, or a deep learning network for object detection such as SSD or YOLO, and the position and type of the object of interest are detected.

For each object of interest detected in this way, an image patch may be generated by extracting the image inside a bounding box of a fixed size centered on the position of the object of interest. Several bounding boxes with different width-to-height ratios may also be used when generating image patches.

The object detection unit 220 may also use information from the motion map generated by the frame preprocessing unit 210 when generating image patches. In the motion map, a region with large motion within the input frame is designated as a region of interest, and, using the distance between the position of the object of interest and the region of interest as an offset, the bounding box may be shifted or moved from the position of the object of interest toward the region of interest.

The object verification unit 230 selects, from among the plurality of image patches generated by the object detection unit 220, the image patch that most completely contains the image of the object of interest as the object-centered image patch; at least one object-centered image patch may be selected.

The object-centered image patch may be selected using an algorithm that matches image patches against a predefined set of objects of interest, or using a deep learning network pre-trained for image classification. At this stage it is possible to check whether the object of interest is actually present in the image patch and whether its classification is correct. For example, when a deep learning network is used, the network outputs, for a given input image, the probability that the image belongs to a specific class, and image patches with high classification confidence according to this probability can be selected as object-centered image patches.
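The confidence-based selection can be illustrated with a short sketch. It assumes the classifier output is already available as one dictionary of class probabilities per patch; the function name `select_object_centered`, the `min_conf` cutoff, and the `top_k` limit are hypothetical parameters, since the source does not specify how "high confidence" is decided.

```python
def select_object_centered(patches, probs, target_class, min_conf=0.8, top_k=2):
    """Keep the patches whose probability for target_class is highest.

    patches: list of image patches (any objects).
    probs:   list of dicts mapping class name -> probability, one per patch.
    Patches below min_conf are discarded; at most top_k are returned,
    ordered from most to least confident.
    """
    scored = [(p.get(target_class, 0.0), i) for i, p in enumerate(probs)]
    kept = sorted((s for s in scored if s[0] >= min_conf), reverse=True)[:top_k]
    return [patches[i] for _, i in kept]


# Hypothetical classifier outputs for three candidate patches of one detection.
patches = ["patch_a", "patch_b", "patch_c"]
probs = [{"person": 0.95}, {"person": 0.40}, {"person": 0.85}]
selected = select_object_centered(patches, probs, "person")
```

A patch that truncates the object or mostly shows background tends to receive a low class probability, so it is filtered out before training or scoring.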

The object-centered image patches selected in this way can be used both to train a model for anomaly detection and to detect anomalies.

FIG. 3 is a diagram illustrating an example use of the object-centered image generation unit 120 shown in FIG. 1.

Referring to FIG. 3, the video 310 collected by the image capturing unit is input to the object-centered image generation unit. In the collected video 310, each rectangle represents a frame of the video, and the rectangles filled in black correspond to the frames that will subsequently be extracted by the frame preprocessing unit.

Thereafter, the frame preprocessing unit extracts an input frame 320 from the collected video and generates a motion map 330 from the input frame 320. That is, the frame preprocessing unit performs preprocessing frame by frame, computing a motion map paired with each input frame.

As described above with reference to FIGS. 1 and 2, the extraction frame rate at which the frame preprocessing unit generates input frames can be set differently according to the accuracy required by the user or application. FIG. 3 shows an example in which one frame out of every six is selected.
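As a small illustration of the one-in-six extraction, the sketch below subsamples a frame sequence at a fixed step. The function name `sample_frames` is hypothetical, and taking the first frame of each group of six is an assumption; the source does not say which frame within a group is chosen.

```python
def sample_frames(frames, step=6):
    """Return every step-th frame, starting with the first one."""
    return [frame for index, frame in enumerate(frames) if index % step == 0]


# Hypothetical video of 18 frames, represented here by their indices.
input_frames = sample_frames(list(range(18)), step=6)
```

A smaller step gives finer temporal coverage at a higher processing cost, which is the trade-off the extraction frame rate controls.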

The frame preprocessing unit may also generate a motion map from the input frame, and information obtained from the motion map may later be used to generate image patches. The motion map can be generated using techniques such as frame subtraction, optical flow, or background subtraction; FIG. 3 shows an example in which the motion map is computed using optical flow. Based on the generated motion map, the part of the frame with the most motion is designated as the region of interest. The paired input frame and motion map may be passed together to the object detection unit.
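The following sketch illustrates the simplest of the listed techniques, frame subtraction, rather than the optical flow used in FIG. 3. Frames are assumed to be grayscale images stored as nested lists, and the region of interest is approximated by the fixed-size window with the largest summed motion; the names `motion_map` and `region_of_interest` and the window-based definition are assumptions for illustration only.

```python
def motion_map(prev_frame, curr_frame):
    """Per-pixel absolute difference between two consecutive grayscale frames."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def region_of_interest(mmap, win):
    """Top-left (x, y) of the win x win window with the most summed motion."""
    height, width = len(mmap), len(mmap[0])
    best_sum, best_xy = -1, (0, 0)
    for y in range(height - win + 1):
        for x in range(width - win + 1):
            total = sum(sum(row[x:x + win]) for row in mmap[y:y + win])
            if total > best_sum:
                best_sum, best_xy = total, (x, y)
    return best_xy


# Hypothetical 4x4 frames: motion appears only in the bottom-right corner.
prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9]]
mmap = motion_map(prev_frame, curr_frame)
roi_xy = region_of_interest(mmap, win=2)
```

The exhaustive window search is quadratic in the frame size and is only meant to make the idea concrete; a practical system would use an optimized routine.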

The object detection unit extracts the type and location of the object of interest in the input frame based on the motion map generated by the frame preprocessing unit (340), and generates from the input frame a plurality of image patches 350 containing the image of the object of interest. In FIG. 3, the type of the detected object is a person, and its location is indicated by the gray bounding box in the input frame 340 from which the object of interest has been extracted. In addition to this bounding box, a plurality of new bounding boxes may be set in consideration of the size and location of the region of interest in order to extract a plurality of image patches. That is, besides the bounding box indicating the object's location, several further bounding boxes are obtained by varying the aspect ratio and size of that bounding box and by shifting its position using the difference between the region of interest and the object location as an offset. The images inside these bounding boxes are then extracted to generate the plurality of image patches.

As described above, the object detection unit generates a plurality of image patches containing the image of the object of interest from the input frame. Generating the plurality of image patches may include extracting the object of interest from the input frame and generating, from the input frame, a plurality of image patches containing the image of that object.

First, the object of interest is extracted using either an image comparison search algorithm over a predefined object set or a deep learning network for object detection such as SSD or YOLO, which detects the location and type of each object of interest. For each detected object of interest, an image patch may be generated by extracting the image inside a bounding box of a fixed size centered on the object's location. Multiple bounding boxes with different width-to-height ratios may also be used when generating image patches.

In addition, when generating image patches, the object detection unit may use information from the motion map generated by the frame preprocessing unit: a region of the motion map with large movement within the input frame is set as the region of interest, and the distance between the location of the object of interest and the region of interest is used as an offset, either to shift the position of the bounding box or to move it from the location of the object of interest toward the region of interest.
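The offset-based shift can be sketched as a simple translation of the bounding box. The function name `shift_box_toward_roi` and the `fraction` parameter, which moves the box only part of the way toward the region of interest, are hypothetical additions; the source states only that the object-to-region distance is used as an offset.

```python
def shift_box_toward_roi(box, obj_center, roi_center, fraction=1.0):
    """Translate box by fraction of the offset from the object to the ROI.

    box is (x0, y0, x1, y1); centers are (x, y). fraction=1.0 applies the
    full offset, smaller values move the box only partway (an assumption).
    No clipping to the frame is done here.
    """
    dx = int(round((roi_center[0] - obj_center[0]) * fraction))
    dy = int(round((roi_center[1] - obj_center[1]) * fraction))
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


# Hypothetical detection at (50, 50) with the most motion around (60, 44).
full_shift = shift_box_toward_roi((40, 40, 60, 60), (50, 50), (60, 44))
half_shift = shift_box_toward_roi((40, 40, 60, 60), (50, 50), (60, 44), fraction=0.5)
```

Combining such shifted boxes with the differently proportioned boxes yields the pool of candidate patches passed to the object verification unit.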

The object verification unit selects at least one object-centered image patch 360, that is, an image patch that most completely contains the image of the object of interest, from among the plurality of image patches. In other words, a high-confidence image patch that contains as complete an image as possible of the detected object, here a person, is selected as an object-centered image patch. FIG. 3 shows an example in which two object-centered image patches are selected.

In this way, the object verification unit selects, from among the plurality of image patches generated by the object detection unit, the image patch that most completely contains the image of the object of interest as the object-centered image patch. At least one object-centered image patch may be selected.

The object-centered image patch may be selected using an algorithm that matches image patches against a predefined set of objects of interest, or using a deep learning network pre-trained for image classification (an image classifier). For example, when a deep learning network is used, image patches with high classification confidence, judged from the probability the network outputs that a given input image belongs to a specific class, are selected as object-centered image patches.

The object-centered image patches selected by the object verification unit in this way are subsequently used for training the generative neural network and for anomaly detection.

FIG. 4 is a flowchart illustrating an example of a monitoring image analysis method using object verification according to an embodiment.

Referring to FIG. 4, the video collected by the image capturing unit is input to the object-centered image generation unit (S410).

The frame preprocessing unit of the object-centered image generation unit performs a frame preprocessing operation (S420), extracting image frames of the monitoring video from the video collected by the image capturing unit to generate input frames.

The extraction frame rate (frames per second, FPS) at which input frames are generated can be set differently depending on the user or application, and a motion map may be generated from the input frame while the input frame is being generated. Information obtained from the motion map may later be used to generate image patches. The motion map can be generated using techniques such as frame subtraction, optical flow, or background subtraction.

Thereafter, the object detection unit of the object-centered image generation unit performs an object detection operation (S430). That is, the object detection unit generates a plurality of image patches containing the image of the object of interest from the input frame. Generating the plurality of image patches may include extracting the object of interest from the input frame and generating, from the input frame, a plurality of image patches containing the image of that object.

First, the object of interest is extracted using an image comparison search algorithm over a predefined object set or a deep learning network for object detection, which detects the location and type of each object of interest. For each detected object of interest, an image patch may be generated by extracting the image inside a bounding box of a fixed size centered on the object's location. Several bounding boxes with different width-to-height ratios may be used.

In addition, information from the previously generated motion map may be used when generating image patches: the bounding boxes are shifted using information about the region of interest, which is the region of the motion map with large movement within the input frame.

Once object detection has been performed as described above, the object verification unit of the object-centered image generation unit performs an object verification operation (S440), selecting at least one object-centered image patch from among the plurality of image patches. That is, from among the plurality of image patches generated for each object of interest, the image patch that most completely contains the image of the object of interest may be selected as the object-centered image patch. At least one object-centered image patch may be selected.

The object-centered image patch may be selected using an algorithm that matches image patches against a predefined set of objects of interest, or using a deep learning network pre-trained for image classification (an image classifier). For example, when a deep learning network is used, image patches with high classification confidence, that is, a high probability of actually belonging to the detected class as judged from the probability the network outputs for a given input image, can be selected as object-centered image patches. The object-centered image patches selected in this way can be used both to train a model for anomaly detection and to detect anomalies.

The model training unit performs a normal model training operation, in which a generative neural network is trained using the normal-situation images among the object-centered images (S450).

The model training unit selects, from among the object-centered image patches, at least one normal image patch representing a normal situation, and trains the generative neural network using the normal image patches. It may then store the trained generative neural network model.

The generative neural network may be built using convolutional layers in an architecture such as a VAE, GAN, or AAE. The network is trained to encode a given object-centered image patch into latent variables and then decode the original image from them. By receiving only normal situations as training data, it establishes the definition and criteria of normality, and the generative neural network model trained only on normal situations is then stored.

The anomaly score calculation unit performs an anomaly score calculation operation, in which an anomaly score is calculated from the object-centered images (S460).

The anomaly score calculation unit feeds each object-centered image patch to the trained generative neural network model to produce a regenerated image, and compares the object-centered image patch with the regenerated image to calculate a difference value. The difference values are collected for the plurality of object-centered image patches, and the collected values are then combined to calculate an individual anomaly score.
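A minimal sketch of this scoring step is shown below. The source does not state which difference measure or which aggregation is used, so mean squared pixel error and a plain average are assumptions, as are the names `reconstruction_error` and `individual_anomaly_score`. The trained generative model is represented by a callable `regenerate` supplied by the caller.

```python
def reconstruction_error(patch, regenerated):
    """Mean squared pixel difference between a patch and its regenerated image."""
    original = [v for row in patch for v in row]
    rebuilt = [v for row in regenerated for v in row]
    return sum((a - b) ** 2 for a, b in zip(original, rebuilt)) / len(original)


def individual_anomaly_score(patches, regenerate):
    """Average the reconstruction errors over one object's patches.

    regenerate stands in for the trained generative model: it takes a patch
    and returns the regenerated image. Averaging is an assumed way of
    combining the collected difference values.
    """
    errors = [reconstruction_error(p, regenerate(p)) for p in patches]
    return sum(errors) / len(errors)


# Hypothetical stand-in model that always outputs a black image.
def black_image(patch):
    return [[0 for _ in row] for row in patch]


score = individual_anomaly_score([[[1, 1], [1, 1]], [[3, 3], [3, 3]]], black_image)
```

With a real model trained only on normal patches, patches showing unfamiliar content would reconstruct poorly and push this score up.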

The anomaly score calculation unit may further collect the individual anomaly scores for a plurality of objects of interest and set the largest of the individual anomaly scores as the anomaly score for the input frame.

When an input image representing a normal situation of the kind already learned is fed to the trained generative neural network model, encoding and decoding proceed smoothly and a regenerated image similar to the original input image is produced. In contrast, when an input image representing an abnormal situation is fed in, the regenerated image differs substantially from the original input image. Therefore, if the difference between the original image (the model's input) and the regenerated image (the model's output) is computed, the difference value is lower the more the input image resembles the normal-situation images (or normal criteria) the model has learned, and higher the more it differs from them.

Once individual anomaly scores have been calculated for all objects, the anomaly score of the input frame may be set to the highest of the individual anomaly scores belonging to that frame.

The anomaly detection unit compares the anomaly score generated by the anomaly score calculation unit with a set threshold (S470). If the anomaly score is less than or equal to the threshold, the process returns to the frame preprocessing step (S420) and image analysis begins for the next frame. If the anomaly score is greater than the threshold, an anomaly detection operation that generates an anomaly signal is performed (S480), and the user may be notified that an abnormal situation has occurred by means such as a warning sound or a notification message.
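The frame-level decision of steps S470 and S480 can be sketched as follows: the frame score is the maximum of the individual scores, and an anomaly signal is raised only when it strictly exceeds the threshold. The function names and the choice to skip frames with no detected objects are assumptions for illustration.

```python
def frame_anomaly_score(individual_scores):
    """Frame score is the largest individual anomaly score in the frame."""
    return max(individual_scores)


def is_anomalous(frame_score, threshold):
    """Signal an anomaly only when the score is strictly above the threshold."""
    return frame_score > threshold


def monitor(scores_per_frame, threshold):
    """Return the indices of the frames that raise an anomaly signal.

    scores_per_frame holds one list of individual anomaly scores per frame.
    Frames with no detected objects are skipped (an assumption).
    """
    alarms = []
    for index, scores in enumerate(scores_per_frame):
        if not scores:
            continue
        if is_anomalous(frame_anomaly_score(scores), threshold):
            alarms.append(index)
    return alarms


# Hypothetical scores for four frames; only frame 1 exceeds the threshold 0.7.
alarms = monitor([[0.1, 0.2], [0.5, 0.9], [], [0.7]], threshold=0.7)
```

Taking the maximum means a single abnormal object is enough to flag the whole frame, while a score exactly equal to the threshold is treated as normal, matching the comparison described above.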

FIG. 5 is a diagram showing the configuration of a computer system according to an embodiment.

The monitoring image analysis apparatus using object verification according to the embodiment may be implemented in a computer system 500, such as a computer-readable recording medium.

The computer system 500 may include one or more processors 510, a memory 530, a user interface input device 540, a user interface output device 550, and storage 560, which communicate with one another via a bus 520. The computer system 500 may further include a network interface 570 connected to a network 580. The processor 510 may be a central processing unit or a semiconductor device that executes programs or processing instructions stored in the memory 530 or the storage 560. The memory 530 and the storage 560 may each be a storage medium including at least one of a volatile medium, a non-volatile medium, a removable medium, a non-removable medium, a communication medium, and an information delivery medium. For example, the memory 530 may include a ROM 531 or a RAM 532.

According to the embodiments described above, by providing a monitoring image analysis method using object verification and an apparatus therefor, it is possible to provide an anomaly detection system that extracts object-centered image patches from captured video and uses data screened through object verification to lower the false alarm rate and increase the accuracy of anomaly detection.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

110: image capturing unit
120: object-centered image generation unit
130: model training unit
140: anomaly score calculation unit
150: anomaly detection unit

Claims

A monitoring image analysis method using object verification, comprising:
extracting an image frame of a monitoring video to generate an input frame;
generating, from the input frame, a plurality of image patches containing an image of an object of interest;
selecting at least one object-centered image patch from among the plurality of image patches;
training a generative neural network using a normal-situation image patch selected on the basis of the object-centered image patch;
calculating an anomaly score by comparing the object-centered image patch with a regenerated image generated by inputting the object-centered image patch into the generative neural network; and
generating an anomaly signal by comparing the anomaly score with a threshold.