KR20240018161A

KR20240018161A - Skeleton graph-based action recognition system and method with data augmentation and contrastive learning

Info

Publication number: KR20240018161A
Application number: KR1020220096090A
Authority: KR
Inventors: 고병철; 이경현
Original assignee: 계명대학교 산학협력단
Priority date: 2022-08-02
Filing date: 2022-08-02
Publication date: 2024-02-13

Abstract

The present invention relates to a skeleton graph-based action recognition system using data augmentation and contrastive learning. More specifically, it is an action recognition system that recognizes human actions using a skeleton graph, comprising: a learning unit that trains an action recognition model including an action classification branch and a contrastive learning branch; and an action recognition unit that estimates action recognition probabilities from a skeleton graph using the action classification branch of the action recognition model trained by the learning unit. The action recognition model includes: the action classification branch, which comprises a plurality of graph learning modules that perform graph-based learning and outputs action recognition probabilities from an input skeleton graph; and the contrastive learning branch, which generates an augmented graph by applying a data augmentation technique to an original skeleton graph and performs contrastive learning using the original skeleton graph and the augmented graph.
The present invention also relates to a skeleton graph-based action recognition method using data augmentation and contrastive learning. More specifically, it is an action recognition method, each step of which is performed on a computer, that recognizes human actions using a skeleton graph, comprising: (1) a learning step of training an action recognition model including an action classification branch and a contrastive learning branch; and (2) an action recognition step of estimating action recognition probabilities from a skeleton graph using the action classification branch of the action recognition model trained in the learning step. The action recognition model includes: the action classification branch, which comprises a plurality of graph learning modules that perform graph-based learning and outputs action recognition probabilities from an input skeleton graph; and the contrastive learning branch, which generates an augmented graph by applying a data augmentation technique to an original skeleton graph and performs contrastive learning using the original skeleton graph and the augmented graph.
According to the proposed system and method, an action recognition model consisting of an action classification branch and a contrastive learning branch is used, in which data augmentation generates a variety of modified skeleton graphs and contrastive learning learns the similarity among the modified skeleton graphs of the same action, so that high generalization can be achieved even with a small amount of data and few parameters.

Description

SKELETON GRAPH-BASED ACTION RECOGNITION SYSTEM AND METHOD WITH DATA AUGMENTATION AND CONTRASTIVE LEARNING

The present invention relates to an action recognition system and method, and more specifically to a skeleton graph-based action recognition system and method using data augmentation and contrastive learning.

Action recognition is a long-standing research topic in which human actions are classified from consecutive video frames, and it is applied in fields such as robotics, healthcare, and intelligent CCTV. As computer vision has advanced, research on video analysis-based action recognition has grown steadily. With the recent development of deep learning, action recognition research divides broadly into skeleton data-based and video data-based approaches.

In the first, video-based approach, a deep learning model recognizes the target action using only consecutive images. Its drawback is that performance generally drops sharply when the image background acts as noise or when the subject is occluded.

The second is the skeleton-based approach. Figure 1 illustrates skeleton data-based action recognition. This approach converts the skeletal structure into a graph and recognizes actions from structural changes of the body. Graph data is less affected by real-world noise while still achieving high performance, but it has the drawback of requiring a separate model to extract the skeleton data.

ST-GCN (Spatial-Temporal Graph Convolutional Network), a recent graph-based action recognition network, achieved strong performance by combining the spatial and temporal information of action data. However, methods for learning pose diversity, that is, the fact that the same action can be expressed through many different poses, remain insufficient.

Meanwhile, prior art related to the present invention includes Korean Patent Application Publication No. 10-2022-0078893 (title of the invention: Apparatus and method for recognizing human actions in video; publication date: June 13, 2022).

The present invention is proposed to solve the above problems of previously proposed methods. Its object is to provide a skeleton graph-based action recognition system and method using data augmentation and contrastive learning that uses an action recognition model consisting of an action classification branch and a contrastive learning branch, generates a variety of modified skeleton graphs through data augmentation, and learns the similarity among the modified skeleton graphs of the same action through contrastive learning, thereby achieving high generalization even with a small amount of data and few parameters.

To achieve the above object, a skeleton graph-based action recognition system using data augmentation and contrastive learning according to a feature of the present invention is

an action recognition system that recognizes human actions using a skeleton graph, comprising:

a learning unit that trains an action recognition model including an action classification branch and a contrastive learning branch; and

an action recognition unit that estimates action recognition probabilities from a skeleton graph using the action classification branch of the action recognition model trained by the learning unit,

wherein the action recognition model includes:

the action classification branch, which comprises a plurality of graph learning modules that perform graph-based learning and outputs action recognition probabilities from an input skeleton graph; and

the contrastive learning branch, which generates an augmented graph by applying a data augmentation technique to an original skeleton graph and performs contrastive learning using the original skeleton graph and the augmented graph.

Preferably, the action classification branch

may learn the pose of the skeleton graph using the distances between skeleton vertices (nodes) in the input skeleton graph, and may learn changes in continuous motion from the pose of the skeleton graph, which varies from frame to frame.

The action classification branch may include a plurality of graph-based CTR-GC (Channel-wise Topology Refinement Graph Convolution) modules.

Preferably, the contrastive learning branch

may generate the augmented graph by a data augmentation technique that randomly selects a predetermined proportion of vertices from the original skeleton graph of the first frame and moves the coordinates of the selected vertices by a predetermined magnitude in a random movement direction.

More preferably, the contrastive learning branch,

in the consecutive frames after the first frame, may generate the augmented graph for those frames by moving the same vertices as those selected in the original skeleton graph of the first frame by the predetermined magnitude in the same movement direction.

More preferably, the predetermined magnitude

may be a predetermined proportion of the maximum distance between vertices in the original skeleton graph of the first frame.

The contrastive learning branch may train on the original skeleton graph and the augmented graph through GCN (Graph Convolutional Network) encoders, each composed of two layers.

Preferably, the learning unit

may train the action recognition model using a final loss function that combines a cross-entropy function, which is the loss function of the action classification branch, with a contrastive loss function, which is the loss function that performs similarity learning between the original skeleton graph and the augmented graph in the contrastive learning branch.

In addition, to achieve the above object, a skeleton graph-based action recognition method using data augmentation and contrastive learning according to a feature of the present invention is

an action recognition method, each step of which is performed on a computer, that recognizes human actions using a skeleton graph, comprising:

(1) a learning step of training an action recognition model including an action classification branch and a contrastive learning branch; and

(2) an action recognition step of estimating action recognition probabilities from a skeleton graph using the action classification branch of the action recognition model trained in the learning step,

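wherein the action recognition model includes the action classification branch and the contrastive learning branch, configured as described above for the system.

Preferably, in the learning step, the action recognition model may be trained using the final loss function described above for the learning unit.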

According to the skeleton graph-based action recognition system and method using data augmentation and contrastive learning proposed in the present invention, an action recognition model consisting of an action classification branch and a contrastive learning branch is used, in which data augmentation generates a variety of modified skeleton graphs and contrastive learning learns the similarity among the modified skeleton graphs of the same action, so that high generalization can be achieved even with a small amount of data and few parameters.

Figure 1 is a diagram illustrating skeleton data-based action recognition.
Figure 2 is a diagram showing the configuration of a skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention.
Figure 3 is a diagram showing the configuration of the action recognition model in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention.
Figure 4 is a diagram showing the structure of the action recognition model in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention.
Figure 5 is a diagram showing in detail the contrastive learning branch of the action recognition model in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention.
Figure 6 is a diagram illustrating that the same action can be expressed through different poses.
Figure 7 is a diagram illustrating the data augmentation technique in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention.
Figure 8 is a diagram showing the flow of a skeleton graph-based action recognition method using data augmentation and contrastive learning according to an embodiment of the present invention.
Figure 9 is a diagram comparing the classification performance of the skeleton graph-based action recognition system and method using data augmentation and contrastive learning according to an embodiment of the present invention with that of SOTA models.
Figure 10 is a diagram comparing the total number of parameters of the skeleton graph-based action recognition system and method using data augmentation and contrastive learning according to an embodiment of the present invention with that of SOTA models.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice it. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the gist of the present invention. The same reference numerals are used throughout the drawings for parts that perform similar functions and operations.

In addition, throughout the specification, when a part is said to be "connected" to another part, this covers not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another element in between. Likewise, saying that a part "includes" a component means that, unless specifically stated otherwise, other components may be further included rather than excluded.

The present invention relates to a skeleton graph-based action recognition system and method using data augmentation and contrastive learning. The system and method according to a feature of the present invention may be implemented as software recorded on hardware that includes a memory and a processor. For example, the system and method may be stored and implemented on a personal computer, laptop computer, server computer, PDA, smartphone, tablet PC, or the like. In the following, for convenience of explanation, the entity performing each step may be omitted.

Figure 2 is a diagram showing the configuration of a skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention. As shown in Figure 2, the system may include a learning unit (10) that trains an action recognition model including an action classification branch and a contrastive learning branch, and an action recognition unit (20) that estimates action recognition probabilities from a skeleton graph using the action classification branch of the action recognition model trained by the learning unit (10).

That is, the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention uses two branches during training, the action classification branch (110) and the contrastive learning branch (120), but uses only the action classification branch (110) at test time, which greatly reduces the number of parameters the model requires.

Figure 3 is a diagram showing the configuration of the action recognition model (100) in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention, and Figure 4 is a diagram showing the structure of the action recognition model (100) in the same system. As shown in Figures 3 and 4, the action recognition model (100) may include an action classification branch (110) and a contrastive learning branch (120).

More specifically, the action recognition model (100) consists of the action classification branch (110) and the contrastive learning branch (120), and the data augmentation technique may be applied at the first layer of the contrastive learning branch (120). At the end of the two branches, a cross-entropy loss and a contrastive loss are computed, respectively, to optimize the model.

The action classification branch (110) comprises a plurality of graph learning modules that perform graph-based learning, and may output action recognition probabilities from an input skeleton graph. More specifically, the action classification branch (110) may learn the pose of the skeleton graph using the distances between skeleton vertices (nodes) in the input skeleton graph, and may learn changes in continuous motion from the pose of the skeleton graph, which varies from frame to frame.

As shown in Figure 4(a), the action classification branch (110) may include a plurality of graph-based CTR-GC (Channel-wise Topology Refinement Graph Convolution) modules, and more specifically may consist of three CTR-GC modules. The consecutive CTR-GC modules may learn the pose of the skeleton graph using the distances between skeleton vertices (nodes) in the input graph, which is indexed by m, t, v, and c, and may learn changes in continuous motion from the features that vary from frame to frame. Here, m is the number of subjects observed in a frame, t is the total number of frames, and v and c are the number of skeleton vertices and their coordinates, respectively. The features finally extracted by the full stack of CTR-GC layers are output as the action recognition probability y through a final fully connected layer.
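The distance-based pose description above can be illustrated with a minimal sketch. It assumes a skeleton sequence stored as nested lists indexed `[subject][frame][joint][coordinate]`, mirroring m, t, v, and c; this layout, the function name `pairwise_joint_distances`, and the toy data are hypothetical, and the sketch only computes per-frame joint distances rather than reproducing a CTR-GC module.

```python
import math

def pairwise_joint_distances(frame):
    """Return the v x v matrix of Euclidean distances between the joints of one frame.

    `frame` is a list of v joints, each a list of c coordinates.
    """
    v = len(frame)
    return [[math.dist(frame[i], frame[j]) for j in range(v)] for i in range(v)]

# Toy sequence (assumed layout): m=1 subject, t=2 frames, v=3 joints, c=2 coordinates.
sequence = [[
    [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]],
    [[0.0, 0.0], [6.0, 8.0], [0.0, 1.0]],
]]

# One distance matrix per frame; how the matrices differ between frames reflects the motion.
distances_per_frame = [pairwise_joint_distances(f) for f in sequence[0]]
```

In this toy data the distance between joints 0 and 1 grows from the first frame to the second, which is the kind of frame-to-frame change a temporal model could pick up.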

The contrastive learning branch (120) may generate an augmented graph by applying a data augmentation technique to the original skeleton graph, and may perform contrastive learning using the original skeleton graph and the augmented graph.

Figure 5 is a diagram showing in detail the contrastive learning branch (120) of the action recognition model (100) in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention. As shown in Figure 5, the contrastive learning branch (120) may use two graphs in total: the original graph g and the augmented graph generated from g. Through this process, the various poses that can appear within the same action can be learned.

Figure 6 is a diagram illustrating that the same action can be expressed through different poses. As shown in Figure 6, for the same human action, such as a basketball shot, the skeleton graph may differ depending on the camera angle or direction from which the video was captured, the physical characteristics of the person performing the action, and so on. For the action recognition model (100) to recognize actions accurately, it must learn this diversity of representation. Data augmentation can generate training data covering the required variety, and contrastive learning allows such data to be learned effectively.

The contrastive learning branch (120) may generate the augmented graph by a data augmentation technique that randomly selects a predetermined proportion of vertices from the original skeleton graph of the first frame and moves the coordinates of the selected vertices by a predetermined magnitude in a random movement direction. In the consecutive frames after the first frame, the contrastive learning branch (120) may generate the augmented graph for those frames by moving the same vertices as those selected in the first frame by the predetermined magnitude in the same movement direction. Here, the predetermined magnitude may be a predetermined proportion of the maximum distance between vertices in the original skeleton graph of the first frame.

Figure 7 is a diagram illustrating the data augmentation technique in the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention. As shown in Figure 7, the system may apply an augmentation method that randomly selects some vertices from the first frame and moves the coordinates of the selected vertices by a fixed magnitude. First, a subset of the vertices of the original skeleton graph (ORIGINAL) is selected at random; in this experiment, the number of selected vertices was 20% of all vertices (Step 1). The maximum of the distances between all pairs of vertices in the original skeleton graph is computed (Step 2), and a predetermined proportion of that maximum, here 25%, is taken as the movement distance (Step 3). The vertices selected in Step 1 are moved by the movement distance in a random direction to generate the augmented graph (Step 4). The same selected vertices and movement distance are applied to the subsequent consecutive frames, so that an augmented graph is generated for every frame (Step 5).
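Steps 1 to 5 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names, the nested-list layout (frames of joints of coordinates), drawing a separate random direction for each selected joint, and rounding the 20% share to at least one joint are all assumptions made here.

```python
import math
import random

def max_joint_distance(frame):
    """Step 2: largest Euclidean distance between any two joints of a frame."""
    return max(math.dist(a, b) for a in frame for b in frame)

def random_unit_vector(dim, rng):
    """Draw a random direction by normalising a Gaussian sample."""
    while True:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 1e-12:
            return [x / norm for x in vec]

def augment_sequence(frames, select_ratio=0.2, move_ratio=0.25, seed=None):
    """Apply Steps 1-5 to a list of frames (each frame is a list of joints)."""
    rng = random.Random(seed)
    first = frames[0]
    num_joints, dim = len(first), len(first[0])
    # Step 1: pick a fixed share of the joints at random from the first frame.
    count = max(1, round(select_ratio * num_joints))  # at least one joint (assumption)
    selected = rng.sample(range(num_joints), count)
    # Steps 2-3: movement distance is a share of the largest inter-joint distance.
    step = move_ratio * max_joint_distance(first)
    # Step 4: one random direction per selected joint (assumption; the text does
    # not say whether the selected joints share a direction).
    offsets = {j: [step * u for u in random_unit_vector(dim, rng)] for j in selected}
    # Step 5: reuse the same joints and the same offsets in every frame.
    augmented = []
    for frame in frames:
        new_frame = [list(joint) for joint in frame]
        for j, off in offsets.items():
            new_frame[j] = [x + d for x, d in zip(frame[j], off)]
        augmented.append(new_frame)
    return augmented, selected, step
```

Because the selection and the offsets are fixed once from the first frame, the perturbed joints keep a constant displacement across the sequence, so the augmented sequence still shows the original motion with a consistently altered pose.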

The contrastive learning branch (120) may train on the original skeleton graph and the augmented graph through GCN (Graph Convolutional Network) encoders, each composed of two layers. That is, as shown in Figure 5, the original graph g and the augmented graph may each be processed by a GCN encoder, and each GCN encoder may consist of two layers. The t frames of each graph may be compressed into a vector by temporal pooling and fed into a projection head. As the result of the whole contrastive learning branch (120), a vector represented in a new latent space may be output. Once all training is complete, the action recognition performance test may be carried out using only the action classification branch (110), without the contrastive learning branch (120).
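The encoder path described above (two GCN layers, temporal pooling, projection head) can be sketched with the standard library only. Several choices here are assumptions rather than details from the source: the row-normalised adjacency with self-loops, ReLU activations, averaging over joints before temporal pooling, mean pooling over time, a single linear layer as the projection head, and all names and dimensions.

```python
import random

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU on a nested-list matrix."""
    return [[max(0.0, x) for x in row] for row in m]

def normalized_adjacency(edges, num_joints):
    """Row-normalised adjacency with self-loops (one common GCN choice)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[x / sum(row) for x in row] for row in adj]

def random_matrix(rows, cols, rng):
    """Small random weight matrix standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(frames, adj, w1, w2, w_proj):
    """Two GCN layers per frame, temporal mean pooling, then a linear projection head."""
    per_frame = []
    for x in frames:                              # x: v joints x c coordinates
        h = relu(matmul(matmul(adj, x), w1))      # GCN layer 1
        h = relu(matmul(matmul(adj, h), w2))      # GCN layer 2
        # Average over joints to get one feature vector per frame (assumption).
        per_frame.append([sum(col) / len(h) for col in zip(*h)])
    # Temporal pooling: average the t frame vectors into a single vector.
    pooled = [sum(col) / len(per_frame) for col in zip(*per_frame)]
    # Projection head: one linear map into the latent space (assumption).
    return matmul([pooled], w_proj)[0]

rng = random.Random(0)
adj = normalized_adjacency([(0, 1), (1, 2)], 3)   # toy 3-joint chain
w1, w2, w_proj = random_matrix(2, 8, rng), random_matrix(8, 8, rng), random_matrix(8, 4, rng)
frames = [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
          [[0.0, 0.0], [1.0, 0.5], [1.0, 1.5]]]
z = encode(frames, adj, w1, w2, w_proj)           # latent vector for this sequence
```

Calling `encode` once on the original sequence and once on its augmented counterpart yields the pair of latent vectors that a contrastive loss would compare; whether the two encoders share weights is not stated in the text, so sharing them here is also an assumption.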

Meanwhile, the learning unit (10) can train the action recognition model (100) using a final loss function that combines a cross-entropy function, which is the loss function of the action classification branch (110), with a contrastive loss function (Contrastive Loss), which is the loss function that performs similarity learning between the original skeleton graph and the augmented graph in the contrastive learning branch (120). That is, as shown in FIG. 4, the action recognition model (100) can perform action recognition on a single skeleton graph using both the action classification branch (110) and the contrastive learning branch (120). To integrate the results of the two branches and compute the loss L, the following loss functions can be used.

Equation 1 below is the cross-entropy function that computes the loss for the action recognition probability y of the action classification branch (110). For the n-th data sample, the loss L_CE can be computed from the output y(n) of the action classification branch (110) and the GT (Ground Truth) label y^*(n).
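The body of Equation 1 is not reproduced in this text. A standard cross-entropy form consistent with the symbols defined above would be the following; the sum over N samples and the class index c over C action classes are assumptions added for illustration, not the patent's exact notation.

```latex
L_{CE} = -\sum_{n=1}^{N} \sum_{c=1}^{C} y^{*}_{c}(n) \, \log y_{c}(n)
```

With a one-hot label y^*(n), the inner sum reduces to the negative log-probability that the branch assigns to the correct action.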

Equation 2 performs similarity learning on the two latent vectors (z and its counterpart) output by the contrastive learning branch (120). The similarity term L_Con of the two vectors uses the cosine, and by increasing the similarity between different poses that can arise from the same action, high generalization can be achieved even with a small amount of data.
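A cosine-based similarity loss can be sketched as below. The text only states that the cosine is used; the specific form `1 - cos`, and the names `cosine_similarity` and `contrastive_loss`, are assumptions chosen so that minimizing the loss pulls the two latent vectors toward the same direction.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(z_orig, z_aug):
    """1 - cos: 0 when the vectors are aligned, 2 when they are opposite."""
    return 1.0 - cosine_similarity(z_orig, z_aug)
```

Because the cosine ignores vector length, the loss depends only on the direction of the latent vectors; both inputs must be non-zero.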

The two loss functions can be combined into the final loss function L according to Equation 3 below.
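The body of Equation 3 is not reproduced in this text. One common way to combine two such losses, given here only as an assumption, is a weighted sum; the weight \lambda is a hypothetical hyperparameter that does not appear in the source, and \lambda = 1 gives a plain sum.

```latex
L = L_{CE} + \lambda \, L_{Con}
```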

FIG. 8 is a diagram illustrating the flow of the skeleton graph-based action recognition method using data augmentation and contrastive learning according to an embodiment of the present invention. As shown in FIG. 8, the method is an action recognition method, each step of which is performed on a computer, that recognizes human actions using a skeleton graph, and may comprise: a learning step (S10) of training an action recognition model (100) comprising an action classification branch (110) and a contrastive learning branch (120); and an action recognition step (S20) of estimating action recognition probabilities from a skeleton graph using the action classification branch (110) of the action recognition model (100) trained in the learning step.

The details of each step have already been fully described above in connection with the skeleton graph-based action recognition system using data augmentation and contrastive learning according to an embodiment of the present invention, so a detailed description is omitted here.

Experiment

The Northwestern-UCLA dataset is a 3D action dataset captured simultaneously from different viewpoints by three Kinect cameras. It contains a total of 1494 video clips in 10 action categories, and each action is performed by 10 different subjects.

The experiment was run with the following settings: 65 epochs, a training batch size of 16, a test batch size of 64, a learning rate of 0.1, the SGD optimizer, and a momentum of 0.9.
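For reference, these settings can be collected in a small configuration object, together with the update they imply for a single parameter vector. This is only a sketch: the class `TrainConfig` and the function `sgd_momentum_step` are hypothetical names, and the update convention (velocity accumulates the raw gradient, with no dampening, weight decay, or Nesterov term) is an assumption, since the text lists only the hyperparameter values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported for the experiment.
    epochs: int = 65
    train_batch_size: int = 16
    test_batch_size: int = 64
    learning_rate: float = 0.1
    momentum: float = 0.9


def sgd_momentum_step(weights, grads, velocity, cfg):
    """One SGD-with-momentum update: v <- m * v + g, then w <- w - lr * v."""
    new_velocity = [cfg.momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - cfg.learning_rate * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

With a momentum of 0.9, a gradient that keeps pointing the same way is amplified over successive steps, which is why the second step in a constant-gradient run moves the weight further than the first.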

FIG. 9 is a diagram comparing the classification performance of the skeleton graph-based action recognition system and method using data augmentation and contrastive learning according to an embodiment of the present invention with that of SOTA models. In the experimental results of FIG. 9, the proposed system and method showed the same accuracy as DC-GCN-ADG and generally outperformed the other state-of-the-art methods compared. The action recognition model (100) of the present invention uses two modules during training but only one module during testing, which has the advantage of greatly reducing the number of parameters required by the model.

FIG. 10 is a diagram comparing the total number of parameters of the skeleton graph-based action recognition system and method using data augmentation and contrastive learning according to an embodiment of the present invention with that of SOTA models. The action recognition model (100) presented in the present invention uses about 85% fewer parameters than the DC-GCN-ADG method, which shows the same accuracy, and 28% and 66% fewer parameters than the Shift-GCN and CTR-GCN methods, respectively. The experimental results show that the proposed method can achieve high accuracy even with a relatively small number of parameters. This is because the data augmentation technique and the contrastive loss function allow the model to learn from more diverse data than conventional models.

As described above, the skeleton graph-based action recognition system and method using data augmentation and contrastive learning proposed in the present invention use an action recognition model (100) composed of an action classification branch (110) and a contrastive learning branch (120), generate a variety of modified skeleton graphs through the data augmentation technique, and learn the similarity of the modified skeleton graphs of the same action through contrastive learning, thereby achieving high generalization even with a small amount of data and few parameters.

Meanwhile, the present invention may include a computer-readable medium containing program instructions for performing operations implemented on various communication terminals. For example, the computer-readable medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Such a computer-readable medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured to implement the present invention, or may be known and available to those skilled in computer software. For example, they may include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The present invention described above may be modified or applied in various ways by those of ordinary skill in the art to which it pertains, and the scope of the technical idea of the present invention should be determined by the claims below.

10: Learning unit
20: Action recognition unit
100: Action recognition model
110: Action classification branch
120: Contrastive learning branch
S10: Learning step
S20: Action recognition step

Claims

An action recognition system that recognizes human actions using a skeleton graph, comprising:
a learning unit (10) that trains an action recognition model (100) comprising an action classification branch (110) and a contrastive learning branch (120); and
an action recognition unit (20) that estimates action recognition probabilities from a skeleton graph using the action classification branch (110) of the action recognition model (100) trained by the learning unit (10),
wherein the action recognition model (100) comprises:
the action classification branch (110), which comprises a plurality of graph learning modules that perform graph-based learning and which outputs action recognition probabilities from an input skeleton graph; and
the contrastive learning branch (120), which generates an augmented graph by applying a data augmentation technique to an original skeleton graph and performs contrastive learning using the original skeleton graph and the augmented graph.

The system of claim 1, wherein the action classification branch (110)
learns the pose of the skeleton graph from the distances between skeleton vertices (nodes) of the input skeleton graph, and learns changes in continuous motion based on the pose of the skeleton graph, which varies from frame to frame.

The system of claim 1, wherein the action classification branch (110)
comprises a plurality of graph-based CTR-GC (Channel-wise Topology Refinement Graph Convolution) modules.

The system of claim 1, wherein the contrastive learning branch (120)
generates the augmented graph by a data augmentation technique that randomly selects a predetermined proportion of vertices from the original skeleton graph of the first frame and moves the coordinates of the selected vertices by a predetermined magnitude in a random movement direction.

The system of claim 4, wherein the contrastive learning branch (120),
in the consecutive frames following the first frame, generates the augmented graph for those frames by moving the same vertices as those selected in the original skeleton graph of the first frame by the predetermined magnitude in the movement direction.

The system of claim 4, wherein the predetermined magnitude
is a predetermined proportion of the maximum distance between vertices in the original skeleton graph of the first frame.

The system of claim 1, wherein the contrastive learning branch (120)
learns the original skeleton graph and the augmented graph through GCN (Graph Convolutional Network) encoders, each composed of two layers.

The system of claim 1, wherein the learning unit (10)
trains the action recognition model (100) using a final loss function that combines a cross-entropy function, which is the loss function of the action classification branch (110), with a contrastive loss function (Contrastive Loss), which is the loss function that performs similarity learning between the original skeleton graph and the augmented graph in the contrastive learning branch (120).

An action recognition method, each step of which is performed on a computer, for recognizing human actions using a skeleton graph, comprising:
(1) a learning step of training an action recognition model (100) comprising an action classification branch (110) and a contrastive learning branch (120); and
(2) an action recognition step of estimating action recognition probabilities from a skeleton graph using the action classification branch (110) of the action recognition model (100) trained in the learning step,
wherein the action recognition model (100) comprises:
the action classification branch (110), which comprises a plurality of graph learning modules that perform graph-based learning and which outputs action recognition probabilities from an input skeleton graph; and
the contrastive learning branch (120), which generates an augmented graph by applying a data augmentation technique to an original skeleton graph and performs contrastive learning using the original skeleton graph and the augmented graph.

The method of claim 9, wherein the action classification branch (110)
learns the pose of the skeleton graph from the distances between skeleton vertices (nodes) of the input skeleton graph, and learns changes in continuous motion based on the pose of the skeleton graph, which varies from frame to frame.

The method of claim 9, wherein the action classification branch (110)
comprises a plurality of graph-based CTR-GC (Channel-wise Topology Refinement Graph Convolution) modules.

The method of claim 9, wherein the contrastive learning branch (120)
generates the augmented graph by a data augmentation technique that randomly selects a predetermined proportion of vertices from the original skeleton graph of the first frame and moves the coordinates of the selected vertices by a predetermined magnitude in a random movement direction.

The method of claim 12, wherein the contrastive learning branch (120),
in the consecutive frames following the first frame, generates the augmented graph for those frames by moving the same vertices as those selected in the original skeleton graph of the first frame by the predetermined magnitude in the movement direction.

The method of claim 12, wherein the predetermined magnitude
is a predetermined proportion of the maximum distance between vertices in the original skeleton graph of the first frame.

The method of claim 9, wherein the contrastive learning branch (120)
learns the original skeleton graph and the augmented graph through GCN (Graph Convolutional Network) encoders, each composed of two layers.

The method of claim 9, wherein, in the learning step,
the action recognition model (100) is trained using a final loss function that combines a cross-entropy function, which is the loss function of the action classification branch (110), with a contrastive loss function (Contrastive Loss), which is the loss function that performs similarity learning between the original skeleton graph and the augmented graph in the contrastive learning branch (120).