KR102309111B1

KR102309111B1 - System and method for detecting anomalous behavior based on deep learning

Info

Publication number: KR102309111B1
Application number: KR1020200162900A
Authority: KR
Inventors: 조영임; 오강환; 만교월
Original assignee: 가천대학교 산학협력단
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2021-10-06
Also published as: KR102309111B9; WO2022114731A1

Abstract

An objective of the present invention is to provide a deep learning-based abnormal behavior detection system and method that, when analyzing people's behavior to detect normal and abnormal behavior, can detect and distinguish between people who genuinely behave abnormally and people who only feign abnormal behavior. To achieve this objective, the deep learning-based abnormal behavior detection system according to the present invention performs a first step of applying a convolution operation to image data provided by a photographing unit to analyze the behavior of pedestrians.

Description

Abnormal behavior detection system and detection method for detecting and recognizing abnormal behavior based on deep learning {SYSTEM AND METHOD FOR DETECTING ANOMALOUS BEHAVIOR BASED ON DEEP LEARNING}

The present invention relates to an abnormal behavior detection system and detection method that detect and recognize abnormal behavior based on deep learning, and more particularly to a deep learning-based abnormal behavior detection system and detection method that detect and recognize abnormal behavior of a moving human body.

In a large-scale CCTV system, operators must monitor dozens to hundreds of video feeds. This requires a large staff, yet important situations are still often missed because of lapses in concentration, fatigue, or arbitrary judgment on the part of the monitoring personnel, and storing the enormous volume of recorded video poses further difficulties.

Accordingly, there is growing demand for intelligent video surveillance systems that film public places with CCTV cameras, automatically analyze the video to extract an unspecified number of people and analyze their behavior, and, when abnormal behavior is detected, automatically alert an administrator or pass the information to other linked automation systems.

Conventional intelligent video surveillance systems first extract individual people from the video, track each extracted person to obtain a trajectory, and analyze the trajectory to estimate that person's behavior.

They then analyze whether the estimated behavior is normal.

Because these conventional systems pass through person extraction, tracking, trajectory analysis, and behavior estimation stages, they require a very large amount of computation and produce a considerable number of errors.

For example, when monitoring places where crowds gather, such as squares, parks, train stations, or sports grounds, it is not easy to extract and analyze objects with conventional intelligent video surveillance systems.

In particular, when detecting normal and abnormal behavior from human behavior, it has been difficult to detect and distinguish, among people behaving abnormally, those who genuinely behave abnormally from those who feign abnormal behavior.

Korean Registered Patent Publication No. 10-1472674

An object of the present invention, devised to solve the conventional problems described above, is to provide a deep learning-based abnormal behavior detection system and detection method that, when analyzing human behavior to detect normal and abnormal behavior, can detect and distinguish, among people behaving abnormally, those who genuinely behave abnormally from those who feign abnormal behavior.

To achieve the above object, the deep learning-based abnormal behavior detection system according to the present invention is characterized by including a first step of performing a convolution operation (convolution calculation) on image data provided by a photographing unit to analyze the behavior of a pedestrian.

To achieve the above object, the deep learning-based abnormal behavior detection method according to the present invention is characterized by including: a first step of performing a convolution operation (convolution calculation) on image data provided by a photographing unit to analyze the behavior of a pedestrian; and a second step of analyzing the pedestrian's behavior in detail through an attention mechanism model.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the convolution operation is performed by a dense convolutional network, and the image data, in RGB three-channel format with an image size of 224 × 22, is input to the dense convolutional network through feature block processing.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, when each piece of image data is input to the dense convolutional network, the network concatenates the input image data and passes the result to a transition block to continue the learning process.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the feature block and the transition block each consist of a convolution layer and a pooling layer.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the attention mechanism model includes a channel attention model and a spatial attention model, and is trained so that attention ignores irrelevant information and focuses on key information.

In addition, the deep learning-based abnormal behavior detection method according to the present invention is characterized by including a third step of analyzing, by time period, whether the pedestrian's behavior in the image data is normal through the channel attention model.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the channel attention model samples pictures at the points in the video where attention is required and processes the pedestrian's behavior image from the sampled pictures.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the channel attention model has continuity between specific channels.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, a combination of a max pooling layer and an average pooling layer is used to compress the spatial dimensions of the input feature map, increasing the efficiency of channel attention.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, a hidden layer of a multilayer perceptron is added after the two pooling layers to reduce parameter overhead.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the two descriptors are passed to the hidden layer to generate a channel attention map.
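
The channel attention steps described above (max and average pooling over the spatial dimensions, a multilayer perceptron with a narrow hidden layer, and a resulting channel attention map) can be sketched in plain Python. This is a minimal sketch under assumptions, not the patented implementation: it works on 2D feature maps for brevity, and the reduction ratio `r`, the ReLU and sigmoid activations, the summation of the two perceptron outputs, and the names `channel_attention` and `apply_channel_attention` are illustrative choices borrowed from common channel attention designs rather than details given in the text.

```python
import math
import random


def channel_attention(feature_map, w1, w2):
    """Return one weight in (0, 1) per channel.

    feature_map: nested lists shaped [C][H][W]
    w1: hidden-layer weights shaped [C // r][C] (r is a hypothetical reduction ratio)
    w2: output-layer weights shaped [C][C // r]
    """
    # Compress the spatial dimensions into two per-channel descriptors.
    avg_desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    max_desc = [max(map(max, ch)) for ch in feature_map]

    def shared_mlp(desc):
        # The narrow hidden layer is what keeps the parameter count low.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, desc))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    # Both descriptors pass through the same perceptron; the outputs are summed
    # (an assumption) and squashed into (0, 1) with a sigmoid.
    merged = [a + m for a, m in zip(shared_mlp(avg_desc), shared_mlp(max_desc))]
    return [1.0 / (1.0 + math.exp(-v)) for v in merged]


def apply_channel_attention(feature_map, weights):
    """Rescale every channel by its attention weight."""
    return [[[weights[c] * v for v in row] for row in ch]
            for c, ch in enumerate(feature_map)]


random.seed(0)
C, H, W, r = 4, 3, 3, 2
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
weights = channel_attention(fmap, w1, w2)
scaled = apply_channel_attention(fmap, weights)
print([round(w, 3) for w in weights])
```

With `C = 4` and `r = 2`, the hidden layer has two units, so the perceptron holds 16 weights instead of the 32 that a hidden layer of full width `C` would need. This is one way to read the stated reduction in parameter overhead.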

In addition, the deep learning-based abnormal behavior detection method according to the present invention is characterized by including a fourth step of analyzing whether the pedestrian's behavior is normal through the channel attention model in association with the spatial attention model.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the spatial attention model focuses its analysis on location.
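
To illustrate what focusing on location can mean in practice, the following sketch computes one attention weight per spatial position. The text does not specify the internals of the spatial attention model, so everything here is an assumption modeled on common spatial attention designs: pooling across channels into an average map and a maximum map, a small convolution over those two maps, and a sigmoid. The function name `spatial_attention` and the kernel values are hypothetical, and 2D maps are used instead of 3D for brevity.

```python
import math


def spatial_attention(feature_map, kernel):
    """Return an [H][W] map of weights in (0, 1).

    feature_map: nested lists shaped [C][H][W]
    kernel: weights shaped [2][k][k], applied to the (average, maximum) maps
    """
    C, H, W = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    # Pool across channels so each location keeps one average and one maximum.
    avg_map = [[sum(feature_map[c][y][x] for c in range(C)) / C for x in range(W)]
               for y in range(H)]
    max_map = [[max(feature_map[c][y][x] for c in range(C)) for x in range(W)]
               for y in range(H)]
    pooled = [avg_map, max_map]
    k = len(kernel[0])
    pad = k // 2
    attention = []
    for y in range(H):
        row = []
        for x in range(W):
            acc = 0.0
            for m in range(2):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < H and 0 <= xx < W:  # zero padding at the border
                            acc += kernel[m][dy][dx] * pooled[m][yy][xx]
            row.append(1.0 / (1.0 + math.exp(-acc)))
        attention.append(row)
    return attention


# Two 3x3 channels that are zero except for one strong response in the centre.
fmap = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
fmap[0][1][1] = 4.0
centre_only = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
att = spatial_attention(fmap, [centre_only, centre_only])
print(round(att[1][1], 4), att[0][0])
```

The centre position, where the response is strong, receives a weight close to 1, while positions with no response stay at the neutral value 0.5. Multiplying the feature map by this map would emphasize the location of the activity.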

In addition, the deep learning-based abnormal behavior detection method according to the present invention is characterized by including a fifth step of classifying the pedestrian's behavior into normal behavior and abnormal behavior through semantic analysis, and analyzing abnormal behavior as unintentional normal behavior or intentional abnormal behavior.

In addition, the deep learning-based abnormal behavior detection method according to the present invention is characterized by including, before the first step: a first pre-processing step of collecting image data of pedestrians in the area with the photographing unit; a second pre-processing step of adjusting the resolution of the image data; and a third pre-processing step of transmitting the image data to a cloud server.
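
The three pre-processing steps can be sketched as a small pipeline. This is an illustration under assumptions only: the text does not say how the resolution is adjusted, so nearest-neighbour resampling is used as a stand-in, frames are modeled as nested lists, and the `send` callback is a hypothetical placeholder for the transmission to the cloud server. The names `resize_nearest` and `preprocess` are not from the source.

```python
def resize_nearest(frame, out_h, out_w):
    """Resample a frame ([H][W] nested lists of pixels) to out_h x out_w."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(frames, out_h, out_w, send):
    """Adjust the resolution of each collected frame, then hand it to `send`."""
    for frame in frames:
        send(resize_nearest(frame, out_h, out_w))


# A 4x4 frame whose pixel value encodes its position (row * 4 + column).
frame = [[row * 4 + col for col in range(4)] for row in range(4)]
outbox = []                       # stands in for the cloud server
preprocess([frame], 2, 2, outbox.append)
print(outbox[0])                  # [[0, 2], [8, 10]]
```

Passing the transmission step in as a callback keeps the sketch independent of any particular network protocol, which the text leaves open.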

To achieve the above object, the deep learning-based abnormal behavior detection system according to the present invention is characterized by including a cloud server comprising: a receiving unit that receives input image data; a computation unit that performs a convolution operation on the image data to analyze a pedestrian's behavior; an attention mechanism model unit that analyzes the pedestrian's behavior in detail; and a semantic analysis unit that classifies the pedestrian's behavior into normal behavior and abnormal behavior through semantic analysis and analyzes abnormal behavior as unintentional normal behavior or intentional abnormal behavior.

In addition, the deep learning-based abnormal behavior detection system according to the present invention is characterized by including a photographing unit comprising: a camera unit that collects image data of pedestrians in the area; a resolution adjustment unit that adjusts the resolution of the image data; and a communication unit that transmits the image data to the cloud server.

Specific details of other embodiments are included in the "Detailed Description for Carrying Out the Invention" and the accompanying "Drawings".

The advantages and/or features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the configurations of the embodiments disclosed below and may be implemented in various different forms. The embodiments disclosed in this specification are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it pertains, and the present invention is defined only by the scope of each claim.

According to the present invention, when human behavior is analyzed to detect normal and abnormal behavior, it is possible, on the basis of deep learning, to detect and distinguish, among people behaving abnormally, those who genuinely behave abnormally from those who feign abnormal behavior.

FIG. 1 is a block diagram showing the connection structure of composite features in a dense block.
FIG. 2 is a block diagram showing the overall architecture of a dense network with four blocks.
FIG. 3 is a conceptual diagram showing the overall concept of a deep learning-based abnormal behavior detection method according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the overall architecture based on a 3D dense convolutional neural network with a channel attention model.
FIG. 5 is a block diagram showing the channel attention module.
FIG. 6 is a block diagram showing the spatial attention module.
FIG. 7 is a flowchart showing the pre-processing steps of a deep learning-based abnormal behavior detection system according to an embodiment of the present invention.
FIG. 8 is a flowchart showing the overall flow of a deep learning-based abnormal behavior detection method according to an embodiment of the present invention.
FIG. 9 is a block diagram showing the overall configuration of a deep learning-based abnormal behavior detection system according to an embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the photographing unit in a deep learning-based abnormal behavior detection system according to an embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of the cloud server in a deep learning-based abnormal behavior detection system according to an embodiment of the present invention.

Before the present invention is described in detail, it should be understood that the terms and words used in this specification are not to be construed as unconditionally limited to their ordinary or dictionary meanings; the inventor may appropriately define the concepts of various terms in order to describe the invention in the best way, and these terms and words should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

That is, the terms used in this specification serve only to describe preferred embodiments of the present invention and are not intended to specifically limit its content; they are terms defined in consideration of the various possibilities of the present invention.

In this specification, a singular expression may include the plural unless the context clearly indicates otherwise, and likewise a plural expression may include the singular.

Throughout this specification, when a component is described as "including" another component, this means that, unless specifically stated otherwise, it may further include any other component rather than excluding it.

Furthermore, when a component is described as "being inside, or connected to and installed on," another component, the component may be directly connected to or installed in contact with the other component, or may be installed at a certain distance from it. In the latter case, a third component or means for fixing or connecting the component to the other component may exist, and the description of this third component or means may be omitted.

Conversely, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no third component or means exists.

Likewise, other expressions describing the relationship between components, such as "between" and "directly between," or "adjacent to" and "directly adjacent to," should be interpreted in the same way.

In addition, terms such as "one surface," "the other surface," "one side," "the other side," "first," and "second," if used in this specification, serve only to clearly distinguish one component from another, and the meaning of a component is not limited by such terms.

Position-related terms such as "upper," "lower," "left," and "right," if used in this specification, should be understood as indicating the relative positions of components in the relevant drawing; unless an absolute position is specified, they should not be understood as referring to absolute positions.

In this specification, the same component is given the same reference numeral even when it appears in different drawings; that is, the same reference numeral denotes the same component throughout the specification.

In the accompanying drawings, the size, position, and coupling relationships of the components constituting the present invention may be partially exaggerated, reduced, or omitted in order to convey the idea of the invention clearly or for convenience of description, and their proportions or scale may therefore not be exact.

Also, in describing the present invention below, detailed descriptions of configurations judged likely to obscure the gist of the invention unnecessarily, for example well-known technology including the prior art, may be omitted.

Hereinafter, embodiments of the present invention will be described in detail with reference to the related drawings.

In recent years, with the development of deep learning, action recognition in video has attracted growing attention.

Its powerful capabilities have been demonstrated in many applications, such as surveillance, robotics, healthcare, video retrieval, virtual reality, and human-computer interaction.

Unlike static image understanding, video understanding requires more robust motion features to reflect the dynamic changes that occur in video.

Action recognition can be broadly divided into two categories: hand-crafted methods and deep learning-based methods. Hand-crafted representation learning methods generally detect spatiotemporal interest points and describe them with local representations such as space-time interest points (STIP), histograms of gradients and histograms of optical flow, 3D histograms of gradients, and improved dense trajectories (iDT), the state-of-the-art hand-crafted method.

This approach pools rich descriptors along dense trajectories, explicitly accounting for motion features and compensating for camera motion.

The descriptors are then aggregated into a video-level representation by an encoding method.

FIG. 1 is a block diagram showing the connection structure of composite features in a dense block.

Referring to FIG. 1, in recent years, with the emergence of deep learning, and in particular of convolutional neural networks (CNNs) with their powerful processing capabilities, a growing body of research has used CNNs for action recognition in video.

For example, the 3D convolutional network (C3D) uses 3D convolution kernels to extract features from a sequence of dense RGB frames.

Temporal Segment Networks (TSN) sample frames and optical flow over different temporal segments to extract information for activity recognition, and the I3D network achieves state-of-the-art performance on the Kinetics dataset using two-stream CNNs with inflated 3D convolutions on both dense RGB and optical flow sequences.

Recently, widely used CNNs classify video for action recognition by combining spatiotemporal effects, optical flow estimation, and two-stream fusion.

In addition, adding a pre-trained network model through transfer learning transfers knowledge effectively within or across modalities and greatly improves performance.

FIG. 2 is a block diagram showing the overall architecture of a dense network with four blocks.

Referring to FIG. 2, in a deep learning neural network, the vanishing gradient problem becomes increasingly pronounced as the network gets deeper.

The present invention uses a dense convolutional network as its backbone network.

A dense network directly connects all layers whose feature maps have the same size, ensuring maximum information flow between layers.

Each layer takes the information of all preceding layers as input and then passes on its own feature maps.

This structure is shown in FIG. 1.

A conventional convolutional neural network with L layers has L connections, whereas a dense network has L(L + 1)/2 connections.
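
The connection counts above can be checked by enumeration. The sketch below treats the block input as node 0 and the L layers as nodes 1 to L, then counts every pair in which an earlier node feeds a later one. The function names are illustrative and not taken from the source.

```python
from itertools import combinations


def chain_connections(num_layers: int) -> int:
    """Connections in a plain feed-forward stack: one incoming link per layer."""
    return num_layers


def dense_connections(num_layers: int) -> int:
    """Connections when every layer also receives all earlier outputs."""
    # Node 0 is the block input, nodes 1..L are the layers.
    # Dense connectivity links every earlier node to every later node.
    nodes = range(num_layers + 1)
    return sum(1 for _ in combinations(nodes, 2))


for L in (1, 4, 5, 12):
    assert dense_connections(L) == L * (L + 1) // 2
    print(L, chain_connections(L), dense_connections(L))
```

For a five-layer block, for instance, the plain stack has 5 connections and the dense block has 15, which matches L(L + 1)/2.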

Each layer has direct access to the gradients from the loss function and to the original input signal, which leads to implicit deep supervision.

The shorter the connections between layers near the input and layers near the output, the deeper, more accurate, and more effective a convolutional neural network can be.

Owing to its dense block layers, a dense network is narrower and has fewer parameters than a residual network (ResNet).

At the same time, this connectivity makes the propagation of features and gradients more efficient and makes the network easier to train.

The present invention replaces the standard transition layer of the original dense network framework and adds an attention model so that the network focuses on feature regions after convolution, reduces the number of training parameters, and obtains better results.

FIG. 3 is a conceptual diagram showing the overall concept of a deep learning-based abnormal behavior detection method according to an embodiment of the present invention.

Referring to FIG. 3, the deep learning-based abnormal behavior detection method according to an embodiment of the present invention uses a dense convolutional network combined with an attention mechanism model.

A single attention mechanism model is not effective at extracting motion features from information-rich video.

Therefore, a dual channel attention model covering both time and space is adopted.

A model strengthened in this way for feature extraction improves the extraction of motion features from image data.

도 4는 채널 어텐션 모델 3D 덴스 컨볼루션 신경망에 기반한 전체 아키텍처를 나타내는 블록도이다.4 is a block diagram illustrating an overall architecture based on a channel attention model 3D dense convolutional neural network.

도 4를 참조하면, 3D 덴스 컨볼루션 신경망을 기반으로 하며, 어텐션 모델을 추가하여 오리지널 신경망를 수정한다.Referring to FIG. 4 , it is based on a 3D dense convolutional neural network, and the original neural network is modified by adding an attention model.

어텐션 모델에서, 채널 어텐션과, 스페셜 어텐션의 조합을 기반으로 효과적인 어텐션 매커니즘을 이용한다.In the attention model, an effective attention mechanism is used based on a combination of channel attention and special attention.

- 3D Dense Neural Network -

Using a deep convolutional neural network can improve the overall performance of two-stream methods.

In the present invention, a dense network is used as the main network branch, and corresponding 3D modules are added to form a 3D dense network.

Based on the original 3D dense network, an attention module is added to the transition layer between dense blocks to improve feature recognition.

Multiple densely connected dense blocks are then introduced.

Denser dense blocks can make the network deeper and better, but they increase the number of parameters and the complexity of the network.

Conversely, if the selected dense blocks are too small, the network has fewer layers, which affects the accuracy of the model.

In the present invention, four dense blocks are used as an example.

Each dense block contains several composite functions connected in a feed-forward manner.

Its structure is shown in FIG. 4.

An attention module is added between every two adjacent dense blocks to improve feature recognition.

This network uses 3D dense blocks, which are analogous to the blocks of a 2D dense network.

Each 3D dense block connects the 3D output of any layer directly to all subsequent layers in the block.

In the l-th layer, the composite function H_l receives the feature maps of all preceding layers (layers 0 to l-1) as input.

The feature map produced by H_l in the l-th layer is given by Equation 1 below.

[Equation 1]

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$

Here, $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps.

The feature maps have the same spatial size.

H_l is a composite function of BN-ReLU-3DConv operations.
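
The dense connectivity of Equation 1 can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the network of the invention: each feature map is reduced to a list of per-channel scalars, and `toy_layer` is a hypothetical stand-in for the BN-ReLU-3DConv composite function H_l that simply emits `growth` new channels.

```python
def toy_layer(growth):
    """Hypothetical stand-in for H_l (BN-ReLU-3DConv): emits `growth` channels."""
    def h(channels):
        mean = sum(channels) / len(channels)
        activated = max(mean, 0.0)          # ReLU applied to the pooled input
        return [activated] * growth
    return h


def dense_block(x0, layers):
    """Apply x_l = H_l([x_0, ..., x_{l-1}]) per layer and return all channels."""
    feature_maps = [x0]
    for h in layers:
        # Concatenate every earlier feature map along the channel axis.
        concatenated = [c for fm in feature_maps for c in fm]
        feature_maps.append(h(concatenated))
    return [c for fm in feature_maps for c in fm]


x0 = [0.5, -1.0, 2.0, 1.5]                  # 4 input channels
out = dense_block(x0, [toy_layer(2) for _ in range(3)])
print(len(out))                             # 4 input channels + 3 layers * 2 new channels
```

Because every layer sees the concatenation of all earlier outputs, the channel count grows linearly with depth, which is why each individual layer can stay narrow.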

- Attention Model -

Attention plays an important role in human perception.

In recent years, attention mechanisms have performed well in image recognition.

Because an attention mechanism selectively focuses on specific regions to better capture the desired result, it is applied here to the video recognition task.

The attention model combines channel attention and spatial attention (Spatial Attention).

- Channel Attention Module -

FIG. 5 is a block diagram illustrating the channel attention module.

Referring to FIG. 5, the channel attention map is generated by exploiting the inter-channel relationships of the features.

Its structure is shown in FIG. 5.

Since each channel of a feature map is regarded as a feature detector, channel attention focuses on 'what' is meaningful in a given input image.

To compute channel attention efficiently, the spatial dimensions of the input feature map are compressed.

Average-pooled and max-pooled features are used at the same time.

Two different spatial context descriptors are generated, the average-pooled feature $F^c_{avg}$ and the max-pooled feature $F^c_{max}$; using both features greatly improves the representational power of the network.

After the two pooling layers, a hidden layer of a multilayer perceptron (Multilayer Perceptron, MLP) is added to reduce parameter overhead.

The two descriptors are passed to the hidden layer, which then produces the channel attention map $M_c$.

In short, channel attention is computed by Equation 2 below.

[Equation 2]

$$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$$

Here, $\sigma$ is the sigmoid function.
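
The channel attention computation described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the feature map is a small C x H x W nested list, the same one-hidden-layer perceptron (weights `w0` and `w1`, chosen arbitrarily here) is applied to both descriptors, and the function and variable names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(vec, w0, w1):
    """One hidden layer with ReLU; w0 is (hidden x C), w1 is (C x hidden)."""
    hidden = [max(sum(w * v for w, v in zip(row, vec)), 0.0) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(feature, w0, w1):
    """feature is a C x H x W nested list; returns one weight per channel."""
    flat = [[v for row in ch for v in row] for ch in feature]
    avg_desc = [sum(ch) / len(ch) for ch in flat]   # average-pooled descriptor
    max_desc = [max(ch) for ch in flat]             # max-pooled descriptor
    a = shared_mlp(avg_desc, w0, w1)
    m = shared_mlp(max_desc, w0, w1)
    return [sigmoid(x + y) for x, y in zip(a, m)]


feature = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
w0 = [[0.5, -0.5]]            # assumed weights: C=2 -> hidden=1
w1 = [[1.0], [-1.0]]          # assumed weights: hidden=1 -> C=2
mc = channel_attention(feature, w0, w1)
print(mc)
```

Each entry of `mc` lies between 0 and 1 and can be used to rescale the corresponding channel of the feature map.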

- Spatial Attention Module -

FIG. 6 is a block diagram illustrating the spatial attention module.

Referring to FIG. 6, the spatial attention map is generated by exploiting the inter-spatial relationships of the features.

Its structure is shown in FIG. 6.

Unlike channel attention, spatial attention focuses on 'where' the informative parts are, which complements channel attention.

First, average pooling and max pooling operations are applied along the channel axis, and the results are concatenated to generate an efficient feature descriptor.

A convolution layer is applied to the concatenated feature descriptor to generate the spatial attention map $M_s(F)$.

Next, two pooled features, $F^s_{avg}$ and $F^s_{max}$, are obtained using average pooling and max pooling; the two pooling operations aggregate the channel information of the feature map to generate two 2D maps.

The two maps are then concatenated and convolved by a standard convolution layer to produce the 2D spatial attention map.

In short, spatial attention is computed by Equation 3 below.

[Equation 3]

$$M_s(F) = \sigma(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$$

Here, $\sigma$ is the sigmoid function, and $f^{7 \times 7}$ denotes a convolution operation with a filter size of 7 × 7.
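
The spatial attention computation can likewise be sketched in plain Python. The sketch below makes several assumptions that the description does not fix: zero padding so the output keeps the input's H x W size, no bias term, a small 3 x 8 x 8 input, and arbitrary constant kernel weights. The function name is hypothetical.

```python
import math


def spatial_attention(feature, kernel):
    """feature: C x H x W nested list; kernel: 2 x k x k weights (k odd)."""
    channels, height, width = len(feature), len(feature[0]), len(feature[0][0])
    # Pool along the channel axis to obtain two H x W maps.
    avg_map = [[sum(feature[c][i][j] for c in range(channels)) / channels
                for j in range(width)] for i in range(height)]
    max_map = [[max(feature[c][i][j] for c in range(channels))
                for j in range(width)] for i in range(height)]
    stacked = [avg_map, max_map]
    k = len(kernel[0])
    pad = k // 2
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:   # zero padding
                            acc += kernel[m][di][dj] * stacked[m][y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))             # sigmoid
        out.append(row)
    return out


feature = [[[float(c + i + j) for j in range(8)] for i in range(8)] for c in range(3)]
kernel = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]      # 7 x 7 filter
ms = spatial_attention(feature, kernel)
print(len(ms), len(ms[0]))
```

The result is a single H x W map with values between 0 and 1, one weight per spatial position.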

The two attention modules, channel and spatial, compute complementary attention by focusing on 'what' and 'where', respectively.

With this in mind, the two modules can be arranged in parallel or sequentially.

In the present invention, a sequential arrangement is used to combine channel attention and spatial attention, which gives better results than a parallel arrangement.
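
One way to write such a combination, assuming the channel module is applied first and each attention map rescales the features by element-wise multiplication (the ordering used in the CBAM formulation; the description itself does not state the order), is the following. The symbols $F'$ and $F''$ are introduced here only for illustration.

```latex
\begin{aligned}
F'  &= M_c(F) \otimes F, \\
F'' &= M_s(F') \otimes F'
\end{aligned}
```

Here $\otimes$ denotes element-wise multiplication with broadcasting, $F$ is the input feature map, and $F''$ is the refined feature map passed on to the next dense block.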

FIG. 7 is a flowchart illustrating the pre-processing steps of a deep learning-based abnormal behavior detection system for detecting and recognizing abnormal behavior according to an embodiment of the present invention.

The deep learning-based abnormal behavior detection method according to the present invention includes three pre-processing steps that are performed before the first step, in which the behavior of a pedestrian is analyzed by performing a convolution operation (Convolution Calculation) on the image data provided by the photographing unit 100.

In the first pre-processing step (S10), the photographing unit 100 collects image data of pedestrians in the corresponding area.

In the second pre-processing step (S20), the resolution of the image data is adjusted.

In the third pre-processing step (S30), the image data is transmitted to the cloud server 200.

Through these pre-processing steps, the photographing unit 100 adjusts the resolution of the captured data and then provides it to the cloud server 200.
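
The three pre-processing steps can be sketched as follows. This is an illustration only: the description does not say how the resolution is adjusted or how the data is transmitted, so nearest-neighbour resizing, the tiny frame size, and the `send` callable are all assumptions, and the function names are hypothetical.

```python
def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize of an H x W frame stored as nested lists."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def preprocess(frame, out_h, out_w, send):
    """S20: adjust the resolution; S30: hand the frame to a transport callable."""
    resized = resize_nearest(frame, out_h, out_w)
    send(resized)
    return resized


captured = [[r * 10 + c for c in range(6)] for r in range(4)]   # S10: a 4 x 6 frame
outbox = []                                                      # stands in for the server
small = preprocess(captured, 2, 3, outbox.append)
print(small)
```

Reducing the resolution on the camera side before transmission keeps the amount of data sent to the server small.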

FIG. 8 is a flowchart illustrating the overall flow of a deep learning-based abnormal behavior detection method for detecting and recognizing abnormal behavior according to an embodiment of the present invention.

Referring to FIG. 8, the deep learning-based abnormal behavior detection method according to an embodiment of the present invention includes five steps.

In the first step (S100), the behavior of a pedestrian is analyzed by performing a convolution operation (Convolution Calculation) on the image data provided by the photographing unit 100.

Here, the convolution operation is performed by a dense convolutional network (Dense Convolutional Network), and the image data is input to the dense convolutional network in RGB 3-channel format with an image size of 224 × 22 through feature block processing.

Here, a convolutional neural network is a type of multi-layer feed-forward artificial neural network used to analyze visual images.

It is classified as a deep neural network in deep learning and is mainly applied to visual image analysis.

It is also known as a shift-invariant or space-invariant artificial neural network (SIANN), based on its shared-weight architecture and translation-invariance characteristics.

It is applied to image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.

A convolutional neural network is a regularized version of a multilayer perceptron.

A multilayer perceptron is usually a fully connected network, that is, a neural network structure in which each neuron in one layer is connected to all neurons in the next layer.

A fully connected network of this kind tends to overfit the given data.

A common way to regularize is to add a specific measure (a penalty term) to the function being optimized, but convolutional neural networks take a different approach to regularization.

They exploit hierarchical patterns in the data and express more complex patterns using smaller and simpler patterns, which has a regularizing effect.

Therefore, the connectivity complexity of a convolutional neural network is far lower than that of a multilayer perceptron with a similar function.
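
A short worked calculation makes the difference in connectivity concrete. The sizes used below (a 224 × 224 RGB input, 64 output units or filters, and a 3 × 3 kernel) are assumptions chosen purely for illustration, and bias terms are ignored.

```python
def fully_connected_weights(height, width, channels, units):
    """Weights when every input value connects to every output unit."""
    return height * width * channels * units


def conv_weights(kernel, in_channels, filters):
    """Weights of a convolution layer whose filters are shared across positions."""
    return kernel * kernel * in_channels * filters


fc = fully_connected_weights(224, 224, 3, 64)
conv = conv_weights(3, 3, 64)
print(fc, conv, fc // conv)
```

Because a convolution filter is reused at every image position, its weight count depends only on the kernel size and channel counts, not on the image resolution.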

Convolutional neural networks were inspired by the fact that the connectivity pattern between neurons resembles the organization of the animal visual cortex.

Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field.

The receptive fields of different neurons partially overlap so that together they cover the entire visual field.

Image classification using a convolutional neural network uses relatively little pre-processing compared with other image classification algorithms.

This means that the network learns the filters that were hand-engineered in traditional algorithms.

A major advantage of convolutional neural networks is that they do not require a designer to understand the features of the images in advance and build them into the algorithm, as traditional image classification algorithms do.

A convolutional neural network is largely composed of convolution layers (Convolution Layer) and pooling layers (Pooling Layer).

In the second step (S200), the behavior of the pedestrian is analyzed in detail through the attention mechanism model (Attention Mechanism Model) 230.

When each piece of image data is input to the dense convolutional network, the network concatenates the input image data and passes the result to a transition block to continue the learning process.

Here, the feature block and the transition block may be convolution layers (Convolution Layer) and pooling layers (Pooling Layers).

In addition, the attention mechanism model 230 includes a channel attention model (Channel Attention Model) 231 and a spatial attention model (Spatial Attention Model) 232.

Here, the attention mechanism model 230 is trained so that attention ignores irrelevant information and focuses on key information.

The channel attention model was first proposed by Hu et al. under the name squeeze-and-excitation block.

An ordinary convolution operation uses only local information, whereas the channel attention module applies global average pooling (GAP) so that non-local information can also be used.

In the third step (S300), whether the pedestrian's behavior in the image data is normal is analyzed over time periods through the channel attention model 231.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, the channel attention model 231 samples pictures at the points in the video that require attention and processes images of the pedestrian's behavior from the sampled pictures.

The channel attention model 231 has continuity between specific channels.

In addition, in the deep learning-based abnormal behavior detection method according to the present invention, a combination of a max pooling layer and an average pooling layer may be used to compress the spatial dimensions of the input feature map, which increases the efficiency of channel attention.

A hidden layer of a multilayer perceptron (Perceptron) may be added after these two pooling layers to reduce parameter overhead.

Meanwhile, in the deep learning-based abnormal behavior detection method according to the present invention, the two descriptors (Descriptor) are passed to the hidden layer to generate the channel attention map.

In the fourth step (S400), whether the pedestrian's behavior is normal is analyzed through the channel attention model 231 in conjunction with the spatial attention model 232.

In particular, in the deep learning-based abnormal behavior detection method according to the present invention, the spatial attention model 232 concentrates its analysis on location.

In the fifth step (S500), semantic analysis is used to classify the pedestrian's behavior into normal behavior and abnormal behavior, and the abnormal behavior is further analyzed into unintentional normal behavior and intentional abnormal behavior.

FIG. 9 is a block diagram illustrating the overall configuration of a deep learning-based abnormal behavior detection system for detecting and recognizing abnormal behavior according to an embodiment of the present invention.

Referring to FIG. 9, the deep learning-based abnormal behavior detection system 1000 according to an embodiment of the present invention includes a photographing unit 100 and a cloud server 200.

The photographing unit 100 provides image data captured through a camera or the like to the cloud server 200 so that an unspecified number of people can be extracted from the image data and their behavior analyzed to determine whether it is abnormal.

The cloud server 200 analyzes the image data received from the photographing unit 100, extracts an unspecified number of people, and analyzes their behavior to determine whether it is abnormal.

FIG. 10 is a block diagram illustrating the configuration of the photographing unit in a deep learning-based abnormal behavior detection system for detecting and recognizing abnormal behavior according to an embodiment of the present invention.

Referring to FIG. 10, in the deep learning-based abnormal behavior detection system 1000 according to the present invention, the photographing unit 100 includes a camera unit 110, a resolution adjustment unit 120, and a communication unit 130.

Here, the camera unit 110 collects image data of pedestrians in the corresponding area.

The resolution adjustment unit 120 adjusts the resolution of the image data captured and collected by the camera unit 110.

The communication unit 130 transmits the image data whose resolution has been adjusted by the resolution adjustment unit 120 to the cloud server 200.

FIG. 11 is a block diagram illustrating the configuration of the cloud server in a deep learning-based abnormal behavior detection system for detecting and recognizing abnormal behavior according to an embodiment of the present invention.

In the deep learning-based abnormal behavior detection system 1000 according to an embodiment of the present invention, the cloud server 200 includes a receiving unit 210, a calculation unit 220, an attention mechanism model unit 230, a semantic analysis unit 240, and an output unit 250.

Here, the receiving unit 210 receives, via the communication unit 130, the image data captured by the photographing unit 100.

That is, the receiving unit 210 receives the input image data.

The calculation unit 220 analyzes the behavior of a pedestrian by performing a convolution operation on the image data received by the receiving unit 210.

The attention mechanism model unit 230 analyzes the behavior of the pedestrian in detail.

That is, the attention mechanism model of the attention mechanism model unit 230 is trained so that attention ignores irrelevant information and focuses on key information.

The attention mechanism model unit 230 includes a channel attention model unit 231 and a spatial attention model unit 232.

The channel attention model of the channel attention model unit 231 analyzes, over time periods, whether the pedestrian's behavior in the image data is normal.

The channel attention model of the channel attention model unit 231 samples pictures at the points in the video that require attention and processes images of the pedestrian's behavior from the sampled pictures.

The channel attention model of the channel attention model unit 231 has continuity between specific channels.

The spatial attention model 232 analyzes whether the pedestrian's behavior is normal in conjunction with the channel attention model.

The spatial attention model 232 concentrates its analysis on location.

The semantic analysis unit 240 uses semantic analysis to classify the pedestrian's behavior into normal behavior and abnormal behavior, and further analyzes the abnormal behavior into unintentional normal behavior and intentional abnormal behavior.

The output unit 250 outputs the final result data on which the semantic analysis unit 240 has performed semantic analysis.
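
The way the units 210 to 250 hand data to one another can be sketched as a simple chain of callables. Everything in the sketch is a placeholder: the class name, the lambda bodies, and the threshold of 10 are hypothetical and stand in for the trained models described above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CloudServer:
    """Chains the units 210-250; every callable is a placeholder."""
    receive: Callable[[Any], Any]          # receiving unit (210)
    convolve: Callable[[Any], Any]         # calculation unit (220)
    attend: Callable[[Any], Any]           # attention mechanism model unit (230)
    analyze: Callable[[Any], str]          # semantic analysis unit (240)
    emit: Callable[[str], None]            # output unit (250)

    def handle(self, payload):
        features = self.convolve(self.receive(payload))
        label = self.analyze(self.attend(features))
        self.emit(label)
        return label


results = []
server = CloudServer(
    receive=lambda p: p["frames"],
    convolve=lambda frames: [sum(f) for f in frames],
    attend=lambda feats: [v for v in feats if v > 0],
    analyze=lambda feats: "abnormal" if sum(feats) > 10 else "normal",
    emit=results.append,
)
print(server.handle({"frames": [[1, 2], [3, 9], [-4, 1]]}))
```

Keeping each unit behind its own callable mirrors the block diagram of FIG. 11, where each unit only consumes the output of the unit before it.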

As described above, according to the present invention, when human behavior is analyzed to detect normal and abnormal behavior, it is possible to detect and recognize, among the people behaving abnormally, those who are genuinely behaving abnormally and those who are feigning abnormal behavior.

Although several preferred embodiments of the present invention have been described above by way of example, the descriptions of the various embodiments given in this "Specific Contents for Carrying Out the Invention" section are merely illustrative, and those of ordinary skill in the art to which the present invention pertains will readily understand from the above description that the present invention can be practiced with various modifications or in equivalent forms.

In addition, since the present invention can be implemented in various other forms, it is not limited by the above description. The above description is provided to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art of the scope of the invention, and it should be understood that the present invention is defined only by the claims.

100: photographing unit
110: camera unit
120: resolution adjustment unit
130: communication unit
200: cloud server
210: receiving unit
220: calculation unit
230: attention mechanism model unit
231: channel attention model unit
232: spatial attention model unit
240: semantic analysis unit
250: output unit
1000: deep learning-based abnormal behavior detection system for detecting and recognizing abnormal behavior

Claims

Deleted

A deep learning-based abnormal behavior detection method for detecting and recognizing abnormal behavior, performed by a cloud server, the method comprising:
a first step of analyzing the behavior of a pedestrian by performing a convolution operation (Convolution Calculation) on image data provided by a photographing unit;
a second step of analyzing the behavior of the pedestrian in detail through an attention mechanism model (Attention Mechanism Model);
a third step of analyzing, over time periods, whether the behavior of the pedestrian in the image data is normal through a channel attention model;
a fourth step of analyzing whether the behavior of the pedestrian is normal through the channel attention model in conjunction with a spatial attention model; and
a fifth step of classifying the behavior of the pedestrian into normal behavior and abnormal behavior through semantic analysis, and analyzing the abnormal behavior into unintentional normal behavior and intentional abnormal behavior,
wherein the second step comprises the third step of analyzing, over time periods, whether the behavior of the pedestrian in the image data is normal through the channel attention model, and the fourth step of analyzing whether the behavior of the pedestrian is normal through the channel attention model in conjunction with the spatial attention model,
wherein the convolution operation is performed by a dense convolutional network (Dense Convolutional Network),
wherein the image data is input to the dense convolutional network in RGB 3-channel format with an image size of 224 × 22 through feature block processing,
wherein, when each piece of the image data is input to the dense convolutional network, the dense convolutional network concatenates the input image data and passes the result to a transition block to continue the learning process,
wherein the feature block and the transition block are convolution layers (Convolution Layer) and pooling layers (Pooling Layers),
wherein the attention mechanism model includes a channel attention model (Channel Attention Model) and a spatial attention model (Spatial Attention Model),
wherein the attention mechanism model is trained so that attention ignores irrelevant information and focuses on key information,
wherein the channel attention model samples pictures at the points in the video that require attention and processes images of the pedestrian's behavior from the sampled pictures,
wherein the channel attention model has continuity between specific channels, and
wherein the dense convolutional network adds the attention mechanism model to the transition layer between two adjacent dense blocks, the composite function H_l in the l-th layer receives the feature maps of all preceding layers (layers 0 to l-1) as input, and the feature map of H_l in the l-th layer is given by Equation 1 below.
[Equation 1]

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$

- Here, $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps, the feature maps have the same spatial size, and H_l is a composite function of BN-ReLU-3DConv operations. -

삭제delete

제 2 항에 있어서,
최대 풀링 레이어와 평균 풀링 레이어의 조합을 사용하여 입력 특성 맵의 스페셜 차원을 압축하여 채널 어텐션의 효율을 증가시키는 것을 특징으로 하는,
딥러닝 기반 비정상 행동을 탐지하여 인식하는 비정상 행동 탐지 방법.
3. The method of claim 2,
Characterized in increasing the efficiency of channel attention by compressing the special dimension of the input feature map using a combination of the maximum pooling layer and the average pooling layer,
An abnormal behavior detection method that detects and recognizes abnormal behavior based on deep learning.
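
The spatial compression described in this claim can be sketched as follows. This is a minimal illustration assuming a feature map stored as nested lists indexed `[channel][row][column]`; the function name `channel_descriptors` is hypothetical. Each channel is reduced to a single maximum and a single average, giving two per-channel descriptors.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def channel_descriptors(fmap: FeatureMap) -> Tuple[List[float], List[float]]:
    """Squeeze each H x W channel to one max value and one average value."""
    max_desc: List[float] = []
    avg_desc: List[float] = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        max_desc.append(max(values))                 # global max pooling
        avg_desc.append(sum(values) / len(values))   # global average pooling
    return max_desc, avg_desc
```

For example, `channel_descriptors([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])` returns `([4, 8], [2.5, 2.0])`: a 2-channel, 2 × 2 map is compressed to two values per channel.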

제 10 항에 있어서,
2개의 풀링 레이어 뒤에 다층 퍼셉트론(Perceptron)의 히든층이 추가되어 매개 변수의 오버 헤드를 감소시키는 것을 특징으로 하는,
딥러닝 기반 비정상 행동을 탐지하여 인식하는 비정상 행동 탐지 방법.
11. The method of claim 10,
wherein a hidden layer of a multi-layer perceptron is added after the two pooling layers to reduce parameter overhead,
An abnormal behavior detection method that detects and recognizes abnormal behavior based on deep learning.

제 11 항에 있어서,
2개의 디스크립터(Descriptor)는 히든층으로 전송된 채널 어텐션 맵을 생성하는 것을 특징으로 하는,
딥러닝 기반 비정상 행동을 탐지하여 인식하는 비정상 행동 탐지 방법.
12. The method of claim 11,
wherein the two descriptors (Descriptor) are passed to the hidden layer to generate a channel attention map,
An abnormal behavior detection method that detects and recognizes abnormal behavior based on deep learning.
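
One way the two pooled descriptors could be turned into a channel attention map is sketched below. The claims state only that a multi-layer perceptron hidden layer follows the two pooling layers. The shared weights, the ReLU activation, the element-wise sum, and the sigmoid are assumptions borrowed from common channel attention designs, and the names `shared_mlp`, `channel_attention`, `w0`, and `w1` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def shared_mlp(desc: List[float], w0: Matrix, w1: Matrix) -> List[float]:
    """One hidden layer: C -> C/r (ReLU, assumed) -> C."""
    hidden = [max(0.0, h) for h in matvec(w0, desc)]
    return matvec(w1, hidden)


def channel_attention(
    max_desc: List[float], avg_desc: List[float], w0: Matrix, w1: Matrix
) -> List[float]:
    """Pass both descriptors through the same MLP, sum, then squash to (0, 1)."""
    from_max = shared_mlp(max_desc, w0, w1)
    from_avg = shared_mlp(avg_desc, w0, w1)
    return [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(from_max, from_avg)]
```

With C channels and a hidden width of C/r, the two weight matrices hold 2·C·C/r weights in total, which is fewer than the C·C weights of a single full-width layer whenever r is greater than 2. That is one reading of the parameter overhead reduction mentioned in the claim.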

삭제 (deleted)

제 2 항에 있어서,
상기 스페셜 어텐션 모델은 위치에 집중하여 분석하는 것을 특징으로 하는,
딥러닝 기반 비정상 행동을 탐지하여 인식하는 비정상 행동 탐지 방법.
The method of claim 2,
wherein the spatial attention model concentrates its analysis on location,
An abnormal behavior detection method that detects and recognizes abnormal behavior based on deep learning.
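
To make "concentrating on location" concrete, the sketch below computes one attention weight per spatial position by pooling across channels. The claim does not specify how the spatial attention model is built, so everything here is an assumption: the per-position max and average pooling, the scalar weights `w_max`, `w_avg`, and `bias` (a simplification of the convolution a real model would typically use), and the function names.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def spatial_attention(
    fmap: FeatureMap, w_max: float = 1.0, w_avg: float = 1.0, bias: float = 0.0
) -> List[List[float]]:
    """Return an H x W map of weights in (0, 1), one per location."""
    rows, cols = len(fmap[0]), len(fmap[0][0])
    attention = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            across = [channel[r][c] for channel in fmap]  # values at (r, c)
            score = w_max * max(across) + w_avg * sum(across) / len(across) + bias
            out_row.append(1.0 / (1.0 + math.exp(-score)))
        attention.append(out_row)
    return attention


def apply_spatial_attention(
    fmap: FeatureMap, attention: List[List[float]]
) -> FeatureMap:
    """Scale every channel by the same per-location attention weight."""
    return [
        [[v * attention[r][c] for c, v in enumerate(row)]
         for r, row in enumerate(channel)]
        for channel in fmap
    ]
```

Locations where the pooled response is high receive weights close to 1 and are passed through almost unchanged, while locations with a low response are attenuated.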

삭제 (deleted)

제 2 항에 있어서,
상기 제 1 단계 이전에,
촬영부에 의해 해당 지역 보행자의 영상 데이터를 수집하는 제 1 전처리 단계;
상기 영상 데이터의 해상도를 조정하는 제 2 전처리 단계; 및
상기 영상 데이터를 클라우딩 서버로 전송하는 제 3 전처리 단계;를 포함하는 것을 특징으로 하는,
딥러닝 기반 비정상 행동을 탐지하여 인식하는 비정상 행동 탐지 방법.
The method of claim 2,
further comprising, prior to the first step:
a first pre-processing step of collecting image data of pedestrians in the corresponding area by the photographing unit;
a second pre-processing step of adjusting the resolution of the image data; and
a third pre-processing step of transmitting the image data to the cloud server,
An abnormal behavior detection method that detects and recognizes abnormal behavior based on deep learning.
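
The second and third pre-processing steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the claim names neither a resizing method nor a transport, so nearest-neighbour resampling, JSON serialization, and the `send` callback are all placeholders, and the function names are hypothetical.

```python
import json
from typing import Callable, List

Frame = List[List[List[int]]]  # indexed as [row][column][RGB]


def adjust_resolution(frame: Frame, out_h: int, out_w: int) -> Frame:
    """Nearest-neighbour resize (assumed; the claim names no method)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess_and_send(
    frame: Frame, out_h: int, out_w: int, send: Callable[[bytes], None]
) -> None:
    """Resize a captured frame, then hand the encoded bytes to `send`."""
    resized = adjust_resolution(frame, out_h, out_w)
    # `send` stands in for the link to the cloud server (hypothetical).
    send(json.dumps(resized).encode("utf-8"))
```

In a test, `send` can simply be the `append` method of a list, which makes it easy to inspect the payload that would be transmitted.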

입력되는 영상 데이터를 수신하는 수신부;
상기 영상 데이터에 대한 컨볼루션 연산을 수행하여 보행자의 행동을 분석하는 연산부;
상기 보행자의 행동을 상세 분석하는 어텐션 매커니즘 모델부; 및
의미 분석을 통해 보행자의 행동을 정상 행동과 비정상 행동으로 구분하고, 비정상 행동을 비의도적 정상 행동과 의도적 비정상 행동으로 분석하는 의미 분석부;를 포함하는 클라우딩 서버와,
해당 지역 보행자의 영상 데이터를 수집하는 카메라부;
상기 영상 데이터의 해상도를 조정하는 해상도 조정부; 및
상기 영상 데이터를 상기 클라우딩 서버로 전송하는 통신부;를 포함하는 촬영부를 포함하며,
상기 컨볼루션 연산은 덴스 콘볼루션 신경망(Dense Convolutional Network)에 의해 수행되며,
상기 영상 데이터가 224 × 22 이미지 크기의 RGB 3 채널 형식이 특징 블록 처리를 통해 상기 덴스 콘볼루션 신경망에 입력되고,
각각의 상기 영상 데이터가 상기 덴스 콘볼루션 신경망에 입력되면, 상기 덴스 콘볼루션 신경망은 입력된 각각의 상기 영상 데이터를 이어맞추고 변환 블록으로 입력되어 학습 처리를 지속하며,
상기 특징 블록과, 상기 변환 블록은 콘볼루션 레이어(Convolution Layer) 및 풀링 레이어(Pooling Layers)이고,
상기 어텐션 매커니즘 모델부는 채널 어텐션 모델(Channel Attention Model)과, 스페셜 어텐션 모델(Spatial Attention Model)을 포함하며,
상기 어텐션 매커니즘 모델부는 어텐션이 관련없는 정보를 무시하고 핵심 정보에 집중하도록 학습을 수행하고,
상기 채널 어텐션 모델은 상기 영상에서 어텐션이 요구되는 지점의 사진을 샘플링하고, 샘플링된 사진으로부터 보행자의 행동 이미지를 처리하며,
상기 채널 어텐션 모델은 특정 채널 사이의 연속성을 가지며,
상기 덴스 콘볼루션 신경망은 인접하는 2개의 덴스 블록 사이의 변환 레이어에 상기 어텐션 매커니즘 모델부를 추가하되, l번째 레이어에서 복합 함수 Hl은 모든 이전 (l-1) 레이어의 특징 맵(Map)을 입력으로 수신하며, l번째 레이어에서 Hl의 특징 맵은 하기 수학식 1인 것을 특징으로 하는,
딥러닝 기반 비정상 행동을 탐지하여 인식하는 비정상 행동 탐지 시스템.
[수학식 1]
$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$
- 여기서, $[x_0, x_1, \ldots, x_{l-1}]$는 특징 맵과 연결되도록 표시되고, 특징 맵의 공간 사이즈는 동일하며, H_l은 BN-ReLU-3DConv 연산의 복합 함수임 -
A cloud server comprising: a receiver for receiving input image data;
a calculation unit for performing a convolution operation on the image data to analyze a pedestrian's behavior;
an attention mechanism model unit for analyzing the pedestrian's behavior in detail; and
a semantic analysis unit that classifies the pedestrian's behavior into normal behavior and abnormal behavior through semantic analysis, and analyzes the abnormal behavior into unintentional normal behavior and intentional abnormal behavior; and
a photographing unit comprising: a camera unit that collects image data of pedestrians in the corresponding area;
a resolution adjusting unit for adjusting the resolution of the image data; and
a communication unit for transmitting the image data to the cloud server,
The convolution operation is performed by a dense convolutional neural network (Dense Convolutional Network),
The image data, in RGB 3-channel format with an image size of 224 × 22, is input to the dense convolutional neural network through feature block processing,
When each item of the image data is input to the dense convolutional neural network, the dense convolutional neural network concatenates the input image data and passes the result to a transform block to continue the learning process,
The feature block and the transform block each consist of a convolution layer and a pooling layer,
The attention mechanism model unit includes a channel attention model (Channel Attention Model) and a spatial attention model (Spatial Attention Model),
The attention mechanism model unit is trained so that attention ignores irrelevant information and focuses on key information,
The channel attention model samples a picture at a point in the image where attention is required, and processes a behavior image of the pedestrian from the sampled picture,
The channel attention model has continuity between specific channels,
The dense convolutional neural network adds the attention mechanism model unit to the transform layer between two adjacent dense blocks, wherein in the l-th layer the composite function H_l receives the feature maps of all previous (l-1) layers as input, and the feature map of H_l in the l-th layer is given by Equation 1 below,
An abnormal behavior detection system that detects and recognizes abnormal behavior based on deep learning.
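[Equation 1]
$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$
- here, $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps, the spatial sizes of the feature maps are the same, and $H_l$ is a composite function of the BN-ReLU-3DConv operation -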

삭제 (deleted)