KR20170112857A

KR20170112857A - Method for recognizing subtle facial expression using deep learning based analysis of micro facial dynamics and apparatus therefor

Info

Publication number: KR20170112857A
Application number: KR1020160063555A
Authority: KR
Inventors: 노용만; 김대회
Original assignee: 한국과학기술원
Priority date: 2016-03-25
Filing date: 2016-05-24
Publication date: 2017-10-12
Also published as: KR102036955B1

Abstract

A method and apparatus for recognizing subtle facial expressions through deep learning analysis of micro facial dynamics are disclosed. A subtle facial expression learning method according to an embodiment of the present invention includes: extracting, from an input video, frames for predefined subtle facial expressions and learning spatial features of the extracted frames to generate a spatial learning model; and learning each of the subtle facial expressions by extracting spatial features of all frames of the input video using the generated spatial learning model and generating a temporal learning model using the spatial features extracted for all of the frames.

Description

Method for recognizing subtle facial expression using deep learning based analysis of micro facial dynamics and apparatus therefor

The present invention relates to subtle facial expression recognition technology and, more particularly, to a method and apparatus capable of recognizing subtle facial expressions through deep learning analysis of micro facial dynamics.

Facial analysis has attracted broad attention across a very wide range of fields, from biometrics, security, and human-computer interaction to, more recently, healthcare, smart home control, and human sensing, which understands and recognizes human emotions.

To date, most facial analysis techniques have been developed using the static information of still images. Research on facial motion analysis has likewise been limited to visible movements that are easily observed with the eye.

Recent studies, however, report that the subtle dynamic information of the face provides important discriminative power in facial analysis. Micro facial dynamic information is distributed over intervals of a few milliseconds, which are difficult to discern with the naked eye. These micro facial dynamics arise from intentional or unintentional movements of the facial muscles and carry important information for facial expression analysis, face recognition, facial state detection, and the like.

In particular, dynamics on a fine time scale that is invisible to the naked eye can provide essential information that the visible domain cannot, for example when extracting unique characteristics useful for identifying a person or when recognizing spontaneous facial emotion.

However, their importance has been overlooked in facial analysis, and existing facial analysis methods cannot be applied to the analysis of such fine intervals.

Accordingly, there is a need for a method capable of recognizing subtle facial expressions through analysis of micro facial dynamics.

Embodiments of the present invention provide a method and apparatus capable of recognizing subtle facial expressions through deep learning analysis of micro facial dynamics.

Specifically, embodiments of the present invention provide a method and apparatus that analyze micro facial dynamic features in a video containing a face using deep learning and recognize the facial expression from them.

A subtle facial expression learning method according to an embodiment of the present invention includes: extracting, from an input video, frames for predefined subtle facial expressions and learning spatial features of the extracted frames to generate a spatial learning model; and learning each of the subtle facial expressions by extracting spatial features of the frames of the input video using the generated spatial learning model and generating a temporal learning model using the spatial features extracted for the frames.

In the generating of the spatial learning model, the spatial learning model may be generated by learning the spatial features of the extracted frames using five objective functions: a classification error minimization function, an intra-class variation minimization function in the feature space, an expression state classification error minimization function, an intra-expression-state variation minimization function in the feature space, and an expression state continuity preservation function in the feature space.

In the generating of the spatial learning model, the spatial learning model may be generated by training a CNN (convolutional neural network) on the spatial features of the extracted frames.

In the learning of each of the subtle facial expressions, each of the subtle facial expressions may be learned by generating the temporal learning model with a recurrent neural network, using the spatial features extracted for the frames.

The recurrent neural network may include at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).

A subtle facial expression recognition method according to an embodiment of the present invention includes: extracting spatial features of the frames of a video using a spatial learning model that has learned spatial features of predefined subtle facial expressions; calculating a recognition value for each of the subtle facial expressions with a recurrent neural network, using the spatial features extracted for the frames and a pre-trained temporal learning model; and recognizing the subtle facial expression in the video based on the calculated recognition values.

A subtle facial expression learning apparatus according to an embodiment of the present invention includes: a spatial learning unit that extracts, from an input video, frames for predefined subtle facial expressions and learns spatial features of the extracted frames to generate a spatial learning model; and a temporal learning unit that learns each of the subtle facial expressions by extracting spatial features of the frames of the input video using the generated spatial learning model and generating a temporal learning model using the spatial features extracted for the frames.

The spatial learning unit may generate the spatial learning model by learning the spatial features of the extracted frames using five objective functions: a classification error minimization function, an intra-class variation minimization function in the feature space, an expression state classification error minimization function, an intra-expression-state variation minimization function in the feature space, and an expression state continuity preservation function in the feature space.

The spatial learning unit may generate the spatial learning model by training a CNN (convolutional neural network) on the spatial features of the extracted frames.

The temporal learning unit may learn each of the subtle facial expressions by generating the temporal learning model with a recurrent neural network, using the spatial features extracted for the frames.

A subtle facial expression recognition apparatus according to an embodiment of the present invention includes: an extraction unit that extracts spatial features of the frames of a video using a spatial learning model that has learned spatial features of predefined subtle facial expressions; a calculation unit that calculates a recognition value for each of the subtle facial expressions with a recurrent neural network, using the spatial features extracted for the frames and a pre-trained temporal learning model; and a recognition unit that recognizes the subtle facial expression in the video based on the calculated recognition values.

According to embodiments of the present invention, micro facial dynamic features in a video containing a face can be analyzed using deep learning, and the facial expression can be recognized from them.

According to embodiments of the present invention, an efficient framework for a subtle facial expression video recognition system can be constructed.

According to embodiments of the present invention, because the micro dynamic features of the face can be modeled and exploited, expression recognition that is effective in terms of performance can be carried out.

According to embodiments of the present invention, even subtle human movements are analyzed and captured, so the invention can be widely applied in various fields such as medicine, psychology, human-computer interaction, multimedia, entertainment, and human sensing.

FIG. 1 is an exemplary diagram for explaining a subtle facial expression learning method through deep learning analysis of micro facial dynamics according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram of the objective functions of the expression state emphasized learning in the method according to the present invention.
FIG. 3 is a conceptual diagram of the recurrent neural network based dynamic sequence analysis in the method according to the present invention.
FIG. 4 is a flowchart of a subtle facial expression recognition method according to an embodiment of the present invention.
FIG. 5 shows the configuration of a subtle facial expression learning apparatus according to an embodiment of the present invention.
FIG. 6 shows the configuration of a subtle facial expression recognition apparatus according to an embodiment of the present invention.

Hereinafter, embodiments according to the present invention are described in detail with reference to the accompanying drawings. The present invention is not, however, restricted or limited by the embodiments. Like reference numerals in the drawings denote like members.

The gist of the embodiments of the present invention is to analyze micro facial dynamic features in a video containing a face using deep learning and to use them to recognize facial expressions efficiently.

FIG. 1 is an exemplary diagram for explaining a subtle facial expression learning method through deep learning analysis of micro facial dynamics according to an embodiment of the present invention.

As shown in FIG. 1, the subtle facial expression learning method according to an embodiment of the present invention consists of an expression state emphasized learning process and a dynamic sequence analysis using recurrent neural network process, each of which is described below.

The expression state emphasized learning process may be carried out in a CNN (convolutional neural network). From each input video, for example a long video or a short video, only the images corresponding to five predefined expression states, for example onset, onset-to-apex, apex, apex-to-offset, and offset, are sampled for learning.

Here, the expression state emphasized learning process may generate the spatial learning model by extracting the frames for the five expression states from the input video and training the CNN on the spatial features of the extracted frames.

In the learning method according to the present invention, the spatial learning model may be generated by learning the spatial features of the extracted frames using five objective functions: a classification error minimization function, an intra-class variation minimization function in the feature space, an expression state classification error minimization function, an intra-expression-state variation minimization function in the feature space, and an expression state continuity preservation function in the feature space. These five objective functions are described with reference to FIG. 2.

The recurrent neural network based dynamic sequence analysis process extracts spatial features of all frames of the input video using the spatial learning model generated by the expression state emphasized learning process, and learns each of the subtle facial expressions by generating a temporal learning model from the spatial features extracted for all frames.

Here, the recurrent neural network based dynamic sequence analysis process may learn each of the subtle facial expressions by generating the temporal learning model with a recurrent neural network, using the spatial features extracted for all frames. The recurrent neural network may include at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).

In other words, the recurrent neural network based dynamic sequence analysis process takes the result of learning the spatial information and learns it along the time axis with a recurrent neural network.

The expression state emphasized learning process and the recurrent neural network based dynamic sequence analysis process are described in detail below.

1. Expression state emphasized learning

In the expression state emphasized learning stage, features that make changes in micro facial motion analyzable are learned from the data itself through deep learning.

To make the features discriminative for subtle changes in movement, the present invention may define five expression states for each expression, for example onset, onset-to-apex, apex, apex-to-offset, and offset.

When the first network is trained, only the images (or frames) corresponding to the five expression states are sampled from each expression video and used for learning. At test time, or when dynamic features are extracted for the second stage, the recurrent neural network (RNN) based dynamic sequence analysis, all frames of the input video may be used.
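The two uses of frames described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the procedure of the patent: the text does not say how the five expression-state frames are located, so the sketch assumes that the onset, apex, and offset frame indices are already annotated and approximates the two transitional states by midpoints. The names `state_frames` and `all_frames` are hypothetical.

```python
# Hypothetical helpers: choose frame indices for the two learning stages.
# Assumption: onset, apex and offset indices come from annotation, and the
# two transitional states are approximated by midpoints.
STATES = ("onset", "onset-to-apex", "apex", "apex-to-offset", "offset")


def state_frames(onset, apex, offset):
    """Return {state name: frame index} for the five expression states."""
    if not onset <= apex <= offset:
        raise ValueError("expected onset <= apex <= offset")
    indices = (onset, (onset + apex) // 2, apex, (apex + offset) // 2, offset)
    return dict(zip(STATES, indices))


def all_frames(onset, offset):
    """Return every frame index from onset to offset, inclusive."""
    return list(range(onset, offset + 1))


# Stage 1 (CNN training) uses only the five sampled frames;
# stage 2 (sequence analysis) and testing use every frame.
train_idx = state_frames(onset=0, apex=12, offset=30)
seq_idx = all_frames(onset=0, offset=30)
```

Sampling only five frames keeps the first-stage training set focused on clearly separated states, while the full index list preserves the temporal order that the second stage needs.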

In addition, to enlarge the differences between motion states in the feature space, embodiments of the present invention may use five objective functions, which are described with reference to FIG. 2.

FIG. 2 is a conceptual diagram of the objective functions of the expression state emphasized learning in the method according to the present invention. As shown in FIG. 2, the five objective functions may be minimizing the classification error (E1), minimizing intra-class variation in the feature space (E2), minimizing the expression state classification error (E3), minimizing intra-expression-state variation in the feature space (E4), and preserving expression state continuity in the feature space (E5).

In FIG. 2, each color denotes a type of expression and each shape denotes an expression state. The functions are described below.

The function for minimizing the expression classification error (E1) minimizes the error made when classifying each subtle facial expression, that is, each class, and can be expressed as Equation 1 below.

[Equation 1]

Here, c denotes the class index and i denotes the index of a training sample. The two remaining symbols denote, respectively, the true value of the sample (1 only when the class of sample i is c, and 0 otherwise) and the probability estimated for each class c.
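The definitions given for E1 (a one-hot true value and an estimated probability per class) match the standard cross-entropy loss, so the sketch below assumes that form. The function names and the sample scores are illustrative and do not come from the patent.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classification_error(labels, probs, eps=1e-12):
    """Cross-entropy summed over samples i and classes c (assumed form of E1).

    labels[i] is the true class index of sample i (the one-hot "true value"),
    probs[i][c] is the estimated probability of class c for sample i.
    """
    loss = 0.0
    for true_class, p in zip(labels, probs):
        # The one-hot indicator keeps only the term of the true class.
        loss -= math.log(p[true_class] + eps)
    return loss


# Two made-up samples with three expression classes each.
probs = [softmax([2.0, 0.5, 0.1]), softmax([0.2, 0.1, 3.0])]
e1 = classification_error(labels=[0, 2], probs=probs)
```

Under this assumption the same function could also serve for E3 by passing expression state indices instead of class indices.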

The function for minimizing intra-class variation in the feature space (E2) can be expressed as Equation 2 below.

[Equation 2]

Here, the remaining symbols denote, in order, the feature vector of a training sample, the mean vector of the feature vectors of the training samples belonging to class c, and half the distance from a class to the nearest of the other classes.
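One way to read the three quantities defined for E2 is as a hinge-style penalty: a sample costs nothing while it stays within half the distance to the nearest other class, and is penalized once it falls outside. The sketch below implements that reading in plain Python. The exact form of Equation 2 is not reproduced in this text, so the squared-distance hinge and the name `intra_class_variation` are assumptions.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def intra_class_variation(features, labels):
    """Hinge-style penalty on samples outside their class radius (assumed form).

    Requires at least two classes, because the radius of a class is defined
    relative to the nearest other class.
    """
    by_class = {}
    for f, c in zip(features, labels):
        by_class.setdefault(c, []).append(f)
    means = {c: mean_vector(fs) for c, fs in by_class.items()}

    # Radius of class c: half the distance to the nearest other class mean.
    radius = {
        c: 0.5 * min(math.dist(m, other) for k, other in means.items() if k != c)
        for c, m in means.items()
    }

    loss = 0.0
    for f, c in zip(features, labels):
        excess = math.dist(f, means[c]) ** 2 - radius[c] ** 2
        loss += max(0.0, excess)  # samples inside the radius cost nothing
    return loss


compact = intra_class_variation(
    [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]], [0, 0, 1, 1])
spread = intra_class_variation(
    [[0.0, 0.0], [0.0, 12.0], [10.0, 6.0], [10.0, 6.0]], [0, 0, 1, 1])
```

In the second call the two samples of class 0 lie 6 units from their mean while the allowed radius is 5, so only those two samples contribute to the penalty.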

The function for minimizing the expression state classification error (E3) can be expressed as Equation 3 below.

[Equation 3]

Here, p denotes the expression state index. The two remaining symbols denote, respectively, the true expression state value of the sample (1 only when the expression state index of sample i is p, and 0 otherwise) and the probability estimated for each expression state p.

The function for minimizing intra-expression-state variation in the feature space (E4) can be expressed as Equation 4 below.

[Equation 4]

Here, the remaining symbols denote, respectively, the mean vector of the feature vectors of the training samples belonging to expression state p of expression class c, and a parameter that determines the distribution range of an expression state.

Learning with E3 and E4 above emphasizes the differences between expression states, but it does not guarantee the continuity of features between adjacent frames. The function for preserving expression state continuity in the feature space (E5) ensures that expressions lying between two of the five expression states used in learning also lie between those two expression states in the feature space, which bears on the dynamic sequence analysis of the second stage. For example, for a frame that falls between apex-to-offset and offset, the E5 function can place it between the apex-to-offset and offset regions of the feature space.

The function for preserving expression state continuity in the feature space (E5) can be expressed as Equation 5 below.

[Equation 5]

The dynamic features, or spatial features, learned through the expression state emphasized learning process enlarge the differences between motion states in the feature space, which makes the dynamic sequence analysis by the recurrent neural network in the second stage easier. The dynamic sequence analysis is described next.

2. Recurrent neural network based dynamic sequence analysis

Because the facial features extracted in the first stage capture only the micro motion of each individual frame, the change of micro motion over time across the whole video still needs to be analyzed. To this end, the second stage performs recurrent neural network based facial dynamics modeling and analysis.

The recurrent neural network based dynamic sequence analysis uses a recurrent neural network to model the various feature changes that appear in micro motion across a series of sequential input frames.

Here, the recurrent neural network in the present invention may be a simple RNN, a GRU (gated recurrent unit), an LSTM (long short-term memory), or the like. The recurrent neural network based dynamic sequence analysis shown in FIG. 3 is an example that uses an LSTM.

For example, as shown in FIG. 3, the recurrent neural network based dynamic sequence analysis extracts spatial features of all frames (onset through offset) of the input video using the spatial learning model in which expression states are emphasized (expression state emphasized CNN model), and learns each of the subtle facial expressions by generating a temporal learning model from the spatial features extracted for all frames using a recurrent neural network, for example an LSTM.
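To make the second stage concrete, the following sketch runs a standard LSTM cell over a sequence of per-frame feature vectors in plain Python. It is an illustration only: the weights are random and untrained, the feature values stand in for CNN outputs, and the sizes and function names are assumptions rather than details from the patent.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """Return weights @ vector + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(params, x, h, c):
    """One LSTM step: gates read the current feature x and previous state h."""
    z = x + h  # list concatenation of input and previous hidden state
    i = [sigmoid(v) for v in affine(*params["input"], z)]
    f = [sigmoid(v) for v in affine(*params["forget"], z)]
    o = [sigmoid(v) for v in affine(*params["output"], z)]
    g = [math.tanh(v) for v in affine(*params["cell"], z)]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def init_params(input_size, hidden_size, seed=0):
    """Random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)

    def gate():
        w = [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
             for _ in range(hidden_size)]
        return w, [0.0] * hidden_size

    return {name: gate() for name in ("input", "forget", "output", "cell")}


def encode_sequence(params, frame_features, hidden_size):
    """Run the LSTM over per-frame features; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frame_features:
        h, c = lstm_step(params, x, h, c)
    return h


# Stand-in for CNN features of 6 frames, 4 values each (made-up numbers).
frames = [[0.1 * t, 0.0, 1.0 - 0.1 * t, 0.5] for t in range(6)]
params = init_params(input_size=4, hidden_size=3)
video_code = encode_sequence(params, frames, hidden_size=3)
```

The final hidden state summarizes the whole frame sequence; in a trained system a classification layer on top of it would produce one value per expression.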

The spatial learning model and the temporal learning model learned through the above process can be used to recognize the subtle facial expression of a video in which a subtle facial expression is to be recognized.

As described above, the subtle facial expression learning method according to an embodiment of the present invention extracts the features of all frames of a video through the spatial learning model learned in the first stage, in which expression states are emphasized, and learns the temporal change across all frames with a recurrent neural network. This generates the temporal learning model, through which the subtle facial expressions are learned.

The spatial learning model and the temporal learning model generated by this learning method can be used to recognize a subtle facial expression in a video, as described below with reference to FIG. 4.

FIG. 4 is a flowchart of a subtle facial expression recognition method according to an embodiment of the present invention, showing how a subtle facial expression is recognized using the learning models generated by the learning method described with reference to FIGS. 1 to 3.

Referring to FIG. 4, the subtle facial expression recognition method according to an embodiment of the present invention uses the pre-trained spatial learning model to extract spatial features for each of all frames, for example the onset frame through the offset frame, of the video in which a subtle facial expression is to be recognized (S410).

Once the spatial features of every frame have been extracted in step S410, a recognition value is calculated for each of the predefined subtle facial expressions with a recurrent neural network, using the spatial features extracted for all frames and the pre-trained temporal learning model (S420).

Here, the recognition value of each subtle facial expression may be a probability value for that expression.

Once the recognition value for each subtle facial expression has been calculated in step S420, the subtle facial expression of the video is recognized based on the calculated recognition values (S430).

Here, step S430 may recognize the subtle facial expression with the largest calculated recognition value as the subtle facial expression of the video.
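Step S430 amounts to taking the expression whose recognition value is largest. A minimal sketch follows; the expression labels and the recognition values are made up for illustration, since the patent does not list the predefined expressions, and `recognize` is a hypothetical name.

```python
EXPRESSIONS = ("happiness", "surprise", "disgust")  # illustrative labels only


def recognize(recognition_values, labels=EXPRESSIONS):
    """Return the label with the largest recognition value and that value."""
    if len(recognition_values) != len(labels):
        raise ValueError("one recognition value per expression is required")
    best = max(range(len(labels)), key=lambda k: recognition_values[k])
    return labels[best], recognition_values[best]


# Recognition values as they might come out of step S420 (made-up numbers).
label, value = recognize([0.15, 0.70, 0.15])
```

If two expressions tie, `max` returns the first of them, which is one possible convention rather than something the patent specifies.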

이러한 미세 표정 인식 과정에 대해 도 3을 참조하여 설명하면, 미세 표정을 인식하고자 하는 비디오의 모든 프레임들(onset 내지 offset)에 대한 공간적인 특징을 표정 상태가 강조되어 미리 학습된 공간적인 학습 모델(expression state emphasized CNN model)을 이용하여 추출하고, 추출된 공간적인 특징들에 대하여 미세 표정들 각각에 대해 미리 학습된 시간적인 학습 모델과 재귀 신경망 여기서는 LSTM을 이용하여 가장 높은 인식 값을 가지는 미세 표정(analysis result)을 해당 비디오의 미세 표정으로 인식한다.Referring to FIG. 3, the micro-facial recognition process will be described with reference to FIG. 3, wherein spatial features of all frames (onset to offset) of a video to be recognized as a micro- Expression state emphasized CNN model), and the learned temporal learning model and recurrent neural network for each of the extracted features. Here, analysis result as a micro facial expression of the corresponding video.

이와 같이, 본 발명에 따른 미세 표정 인식 방법은 육안으로 식별 못하는 미세 얼굴 움직임을 신경회로망을 구성함으로써, 사람 식별에 유용한 고유 특성 추출이나 자연스러운(spontaneous) 얼굴 감정 인지 등에서 미세 표정을 용이하게 추출할 수 있다.As described above, the fine facial expression recognition method according to the present invention can easily extract a fine facial expression by extracting intrinsic characteristics useful for human identification or spontaneous facial emotion by constructing a neural network that can not be recognized by the naked eye have.

특히, 본 발명에 따른 방법들은 표정 상태를 고려한 학습 방법과 미세 얼굴 다이나믹의 딥 러닝 분석을 통한 미세 표정 인식 프레임워크를 제공함으로써, 미세 표정을 인식할 수 있다.Particularly, the methods according to the present invention can recognize a micro-facial expression by providing a micro-facial expression recognition framework through a learning method considering a facial expression state and a deep-run analysis of micro-facial dynamics.

구체적으로, 본 발명은 표정의 공간정보가 학습된 결과를 시간축에 따른 재귀 신경망 기반의 학습 방법으로 제공하고, 이렇게 학습된 방법에 의해 생성된 학습 모델을 이용하여 비디오의 미세 표정을 인식할 수 있다.Specifically, the present invention provides a recursive neural network-based learning method based on a temporal axis of a learning result of spatial information of a facial expression, and can recognize a micro facial expression of a video using a learning model generated by the learned method .

That is, the subtle facial expression recognition method according to the present invention can recognize subtle facial expressions by fusing facial spatial information, temporal information, and motion information, and can thereby perform effective expression recognition that takes micro facial dynamics into account.

FIG. 5 shows the configuration of a subtle facial expression learning apparatus according to an embodiment of the present invention, namely an apparatus that performs the method of FIGS. 1 to 3.

Referring to FIG. 5, the subtle facial expression learning apparatus 500 according to an embodiment of the present invention includes a spatial learning unit 510 and a temporal learning unit 520.

The spatial learning unit 510 extracts frames for predefined subtle facial expressions from an input video and generates a spatial learning model by learning the spatial features of the extracted frames.

In this case, the spatial learning unit 510 may generate the spatial learning model by learning the spatial features of the extracted frames using five objective functions: a classification error minimization function, a function that minimizes intra-class variance in the feature space, an expression-state classification error minimization function, a function that minimizes variance within an expression state in the feature space, and a function that preserves the continuity of expression states in the feature space. The spatial learning model may be generated by training a CNN on the spatial features of the extracted frames.
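
The following is a minimal sketch of how five such terms might be combined into a single training objective. This section does not give the formulas, so every form here is an assumption: cross-entropy stands in for the two classification-error terms, mean squared distance to a group centroid stands in for the two variance terms, the continuity term is passed in as an already computed number, and the weights and the names `cross_entropy`, `intra_group_variance`, and `total_objective` are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability of the true label (assumed form of a classification-error term)."""
    return -math.log(probs[label])

def intra_group_variance(features, groups):
    """Mean squared distance of each feature vector to the centroid of its group
    (assumed form of a variance-minimization term)."""
    by_group = {}
    for f, g in zip(features, groups):
        by_group.setdefault(g, []).append(f)
    total = 0.0
    for members in by_group.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members)
    return total / len(features)

def total_objective(terms, weights):
    """Weighted sum of the named objective terms."""
    return sum(weights[name] * value for name, value in terms.items())

# Toy batch: three frame features with expression-class and expression-state labels.
features = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
class_labels = [0, 0, 1]
state_labels = [0, 1, 1]

terms = {
    "class_error": cross_entropy([0.7, 0.2, 0.1], 0),
    "class_variance": intra_group_variance(features, class_labels),
    "state_error": cross_entropy([0.6, 0.4], 1),
    "state_variance": intra_group_variance(features, state_labels),
    "state_continuity": 0.0,  # placeholder: its form is not given in this section
}
weights = {name: 1.0 for name in terms}  # hypothetical equal weighting
loss = total_objective(terms, weights)
```

In an actual CNN training loop the terms would be computed on network outputs and minimized by gradient descent; the sketch only shows the bookkeeping of combining them.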

The temporal learning unit 520 extracts spatial features for all frames of the input video using the generated spatial learning model, and learns each of the subtle facial expressions by generating a temporal learning model from the spatial features extracted for all the frames.

In this case, the temporal learning unit 520 may learn each of the subtle facial expressions by generating the temporal learning model with a recurrent neural network, using the spatial features extracted for all the frames. The recurrent neural network may include at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).
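
To illustrate how a recurrent network consumes the per-frame spatial features in order, here is a minimal sketch of the simplest of the listed options, a plain (Elman) RNN, written in pure Python. The names `rnn_step` and `encode_sequence`, the weight shapes, and the toy values are assumptions for illustration; a GRU or LSTM would replace the update rule with a gated one but would be driven by the same loop over frames.

```python
import math

def rnn_step(x, h, W, U, b):
    """One plain-RNN update: h_new = tanh(W x + U h + b)."""
    return [
        math.tanh(
            sum(W[i][j] * x[j] for j in range(len(x)))
            + sum(U[i][k] * h[k] for k in range(len(h)))
            + b[i]
        )
        for i in range(len(b))
    ]

def encode_sequence(frame_features, W, U, b):
    """Run the RNN over the frame features in temporal order and return the final hidden state."""
    h = [0.0] * len(b)
    for x in frame_features:
        h = rnn_step(x, h, W, U, b)
    return h

# Toy example: three frames, 2-dimensional spatial features, 2 hidden units.
W = [[0.5, -0.1], [0.2, 0.3]]   # input-to-hidden weights (hypothetical values)
U = [[0.1, 0.0], [0.0, 0.1]]    # hidden-to-hidden weights (hypothetical values)
b = [0.0, 0.0]
frames = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.4]]
final_state = encode_sequence(frames, W, U, b)
```

The final hidden state summarizes the whole onset-to-offset sequence and could then be mapped to one score per expression class.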

The subtle facial expression learning apparatus according to an embodiment of the present invention may include not only what is described with reference to FIG. 5 but also everything described above with reference to FIGS. 1 to 3.

FIG. 6 shows the configuration of a subtle facial expression recognition apparatus according to an embodiment of the present invention, namely an apparatus that performs the method of FIG. 4.

Referring to FIG. 6, the subtle facial expression recognition apparatus 600 according to an embodiment of the present invention includes an extraction unit 610, a calculation unit 620, and a recognition unit 630.

The extraction unit 610 extracts spatial features for all frames of the video in which a subtle facial expression is to be recognized, using a spatial learning model that has learned the spatial features of predefined subtle facial expressions.

The calculation unit 620 calculates a recognition value for each of the subtle facial expressions with a recurrent neural network, using the spatial features extracted for all the frames and a pre-trained temporal learning model.

Here, the calculation unit 620 may calculate a probability value for each of the subtle facial expressions.

The recognition unit 630 recognizes the subtle facial expression in the video based on the calculated recognition values.

Here, the recognition unit 630 may recognize, as the subtle facial expression of the video, the subtle facial expression having the largest of the recognition values calculated for the respective subtle facial expressions.
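
The last two steps (a probability value per expression, then selection of the largest) can be sketched as follows. The text only says that a probability value may be calculated; the use of a softmax over raw scores is an assumption, as are the function names and the example labels.

```python
import math

def softmax(scores):
    """Convert raw per-expression scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def recognize(scores, labels):
    """Return the label with the largest probability, together with all probabilities."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical expression classes and raw scores from the recurrent network.
labels = ["happiness", "surprise", "disgust"]
scores = [0.1, 2.0, -1.0]
predicted, probabilities = recognize(scores, labels)
```

Because softmax is monotonic, taking the largest probability selects the same expression as taking the largest raw score; the probabilities are mainly useful when a confidence value is also needed.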

The subtle facial expression recognition apparatus according to an embodiment of the present invention may include not only what is described with reference to FIG. 6 but also everything described above with reference to FIGS. 1 to 4.

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the systems, apparatuses, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as being used singly, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A subtle facial expression learning method comprising:
extracting frames for predefined subtle facial expressions from an input video, and generating a spatial learning model by learning spatial features of the extracted frames; and
extracting spatial features for frames of the input video using the generated spatial learning model, and learning each of the subtle facial expressions by generating a temporal learning model using the spatial features extracted for the frames.

The method of claim 1,
wherein generating the spatial learning model comprises
generating the spatial learning model by learning the spatial features of the extracted frames using at least one objective function among a classification error minimization function, a function that minimizes intra-class variance in the feature space, an expression-state classification error minimization function, a function that minimizes variance within an expression state in the feature space, and a function that preserves the continuity of expression states in the feature space.

The method of claim 1,
wherein generating the spatial learning model comprises
generating the spatial learning model by training a CNN (convolutional neural network) on the spatial features of the extracted frames.

The method of claim 1,
wherein learning each of the subtle facial expressions comprises
learning each of the subtle facial expressions by generating the temporal learning model with a recurrent neural network, using the spatial features extracted for the frames.

The method of claim 4,
wherein the recurrent neural network
includes at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).

A subtle facial expression recognition method comprising:
extracting spatial features for frames of a video using a spatial learning model that has learned spatial features of predefined subtle facial expressions;
calculating a recognition value for each of the subtle facial expressions with a recurrent neural network, using the spatial features extracted for the frames and a pre-trained temporal learning model; and
recognizing the subtle facial expression in the video based on the calculated recognition values.

The method of claim 6,
wherein the recurrent neural network
includes at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).

A subtle facial expression learning apparatus comprising:
a spatial learning unit that extracts frames for predefined subtle facial expressions from an input video and generates a spatial learning model by learning spatial features of the extracted frames; and
a temporal learning unit that extracts spatial features for frames of the input video using the generated spatial learning model and learns each of the subtle facial expressions by generating a temporal learning model using the spatial features extracted for the frames.

The apparatus of claim 8,
wherein the spatial learning unit
generates the spatial learning model by learning the spatial features of the extracted frames using at least one objective function among a classification error minimization function, a function that minimizes intra-class variance in the feature space, an expression-state classification error minimization function, a function that minimizes variance within an expression state in the feature space, and a function that preserves the continuity of expression states in the feature space.

The apparatus of claim 8,
wherein the spatial learning unit
generates the spatial learning model by training a CNN (convolutional neural network) on the spatial features of the extracted frames.

The apparatus of claim 8,
wherein the temporal learning unit
learns each of the subtle facial expressions by generating the temporal learning model with a recurrent neural network, using the spatial features extracted for the frames.

The apparatus of claim 11,
wherein the recurrent neural network
includes at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).

A subtle facial expression recognition apparatus comprising:
an extraction unit that extracts spatial features for frames of a video using a spatial learning model that has learned spatial features of predefined subtle facial expressions;
a calculation unit that calculates a recognition value for each of the subtle facial expressions with a recurrent neural network, using the spatial features extracted for the frames and a pre-trained temporal learning model; and
a recognition unit that recognizes the subtle facial expression in the video based on the calculated recognition values.

The apparatus of claim 13,
wherein the recurrent neural network
includes at least one of an RNN (recurrent neural network), a GRU (gated recurrent unit), and an LSTM (long short-term memory).