KR20200119410A

KR20200119410A - System and Method for Recognizing Emotions from Korean Dialogues based on Global and Local Contextual Information

Info

Publication number: KR20200119410A
Application number: KR1020190036344A
Authority: KR
Inventors: 박종철 (Park Jong-cheol); 박한철 (Park Han-cheol)
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2019-03-28
Filing date: 2019-03-28
Publication date: 2020-10-20

Abstract

The present invention provides a system and method for recognizing emotions in Korean dialogue text based on global and local context, implemented as a program in a computer-executable language recorded on a recording medium. The method comprises a dialogue text preprocessing step, an emotion recognition step using a neural network model based on global and local attention mechanisms, and a step of outputting the emotion recognition results for the dialogue utterances.

Description

System and Method for Recognizing Emotions from Korean Dialogues based on Global and Local Contextual Information

The present invention relates to a system and method for recognizing the emotion of each utterance in a dialogue using a deep neural network model based on global and local attention mechanisms. More specifically, the global attention mechanism selectively summarizes, from among the various sub-topic situations present across all utterances of the dialogue, only the contextual information of the sub-topics related to the current utterance, while the local attention mechanism captures the context of the utterance that directly causes the emotion and lies near the current utterance. The invention thus classifies the emotion of each utterance based on the current utterance, the direct emotion-cause utterance (local context), and the context in which they unfold (global context).

Technology for recognizing the emotions expressed in natural-language dialogue utterances is regarded as essential for developing artificial intelligence chatbots that can understand human emotions and respond accordingly. Prior research on recognizing emotions in dialogue utterances has mainly either engineered, by hand, features that indicate emotion in a single utterance, or relied on an emotion lexicon, in both cases without considering the dialogue context. However, manual feature extraction and emotion lexicons are limited in how fully they can define the characteristics of each emotion category, and because they ignore context, they cannot adapt flexibly to situations in which the same utterance can express different emotion categories.

There have also been attempts to take context information into account. However, approaches that capture context as a black box make it hard to tell which utterance provides the causal context, and approaches that do capture the direct emotion cause as context fail to capture, at the same time, the situational context in which the current utterance and the emotion-cause utterance unfold.

The present invention has been devised to solve the above problems. Its purpose is to provide a system and method that use a deep neural network to learn the emotion-related features of utterances automatically from data, without an emotion lexicon or manual feature extraction, and that recognize the emotion of dialogue utterances by extracting the current utterance, the direct emotion-cause utterance, and the context in which they unfold.

However, the problems to be solved by the present invention are not limited to the above purpose, and the invention may be extended in various ways without departing from its spirit and scope.

To solve the above problems, a global and local context-based Korean dialogue emotion recognition system and method according to an embodiment of the present invention, implemented as a program in a computer-executable language recorded on a recording medium, includes a dialogue preprocessing step, an emotion recognition step using a neural network model based on global and local attention mechanisms, and a step of outputting the emotion recognition results for the dialogue utterances.

In some embodiments of the present invention, the dialogue preprocessing step may include decomposing each utterance into morphemes and then attaching morpheme type information to each morpheme.

In some embodiments of the present invention, the emotion recognition step using the neural network model based on global and local attention mechanisms may include using the global attention mechanism to selectively summarize, from among the various sub-topic situations present across all utterances of the dialogue, only the contextual information of the sub-topics related to the current utterance.

In some embodiments of the present invention, the emotion recognition step using the neural network model based on global and local attention mechanisms may include using the local attention mechanism to capture the context of the direct emotion-cause utterance located near the current utterance.

In some embodiments of the present invention, the emotion recognition step using the neural network model based on global and local attention mechanisms may include recognizing the emotion of the current utterance by combining the global and local context information with the current utterance.

Other specific details of the present invention are included in the detailed description and drawings.

According to an embodiment of the present invention, the emotion-related features of utterances can be learned automatically from data through a deep neural network, without manual feature extraction or an emotion lexicon.

In addition, according to an embodiment of the present invention, the utterance that directly causes the emotion of the current utterance can be extracted automatically.

In addition, according to an embodiment of the present invention, only the contextual information of the sub-topics in the dialogue that relate to the current utterance and to its direct emotion-cause utterance can be selectively summarized and encoded as a vector.

FIG. 1 is a flowchart of a global and local context-based method for recognizing emotions in Korean dialogue text according to an embodiment of the present invention.
FIG. 2 is a flowchart of the global and local attention mechanism model.
FIG. 3 shows the structure of the global and local attention mechanism model.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully inform those of ordinary skill in the art of the scope of the invention, which is defined solely by the scope of the claims.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same elements in the drawings, and duplicate descriptions of the same elements are omitted.

FIG. 1 is a flowchart of a global and local context-based method for recognizing emotions in Korean dialogue text according to an embodiment of the present invention.

Referring to FIG. 1, the global and local context-based Korean dialogue emotion recognition method according to an embodiment of the present invention includes receiving dialogue text (S100), preprocessing the dialogue utterances (S200), recognizing emotions using a deep neural network model based on global and local attention mechanisms (S300), and outputting the emotion recognition results for the dialogue utterances (S400).

Receiving dialogue text (S100)

In step S100, dialogue text is received from the user, and the received text is passed to the step of preprocessing the dialogue utterances (S200).

Preprocessing dialogue utterances (S200)

In step S200, morphological analysis is performed on each utterance of the dialogue, and each word of the utterance is transformed into the form "morpheme/morpheme category", as in Example 1.

(Example 1)

난 가족과 함께 캠핑갈거야. ("I'm going camping with my family.") -> 나/VV ㄴ/ETM 가족/NNG 과/JC 함께/MAG 캠핑/NNP 갈/NNG 거/NNB 야/JKV ./SF

Performing this preprocessing mitigates the homonym problem that can arise in natural language processing (for example, the morpheme category distinguishes the subject particle "은" from the noun "은" meaning silver). In addition, Korean is an agglutinative language in which morphemes combine to form a single eojeol (a space-delimited word unit). If analysis is done at the eojeol level, a new input that contains a combination of morphemes never seen in the training data cannot be processed. Analyzing at the morpheme level addresses these problems.
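A minimal sketch of the "morpheme/category" transformation and of why morpheme-level units help, assuming the morphological analyzer has already produced (morpheme, tag) pairs. The analyses, the Sejong-style tags, and the function names below are illustrative assumptions, not output of any specific analyzer or the patent's implementation.

```python
# Hand-written, illustrative analyses: eojeol -> [(morpheme, tag), ...].
# Assumption: a real system would get these from a Korean morphological analyzer;
# the Sejong-style tags (NNG noun, JX/JC particles) are for illustration only.
TRAIN_ANALYSES = {
    "가족은": [("가족", "NNG"), ("은", "JX")],  # 'family' + particle 은
    "집과": [("집", "NNG"), ("과", "JC")],      # 'house' + particle 과
}


def to_tagged_tokens(analysis):
    """Turn [(morpheme, tag), ...] into 'morpheme/tag' tokens, as in Example 1."""
    return [f"{morph}/{tag}" for morph, tag in analysis]


def build_vocabularies(analyses):
    """Return the eojeol-level vocabulary and the morpheme/tag-level vocabulary."""
    eojeol_vocab = set(analyses)
    morph_vocab = {tok for a in analyses.values() for tok in to_tagged_tokens(a)}
    return eojeol_vocab, morph_vocab


def is_covered(analysis, morph_vocab):
    """True if every morpheme/tag token of an eojeol was seen during training."""
    return all(tok in morph_vocab for tok in to_tagged_tokens(analysis))


if __name__ == "__main__":
    eojeol_vocab, morph_vocab = build_vocabularies(TRAIN_ANALYSES)
    new_analysis = [("집", "NNG"), ("은", "JX")]    # unseen eojeol 집은
    print("집은" in eojeol_vocab)                     # False: unknown as a whole eojeol
    print(is_covered(new_analysis, morph_vocab))      # True: both morphemes were seen
    print("은/NNG" in morph_vocab)                    # False: noun 은 'silver' stays distinct from particle 은/JX
```

The unseen eojeol 집은 would be out-of-vocabulary at the eojeol level, yet every one of its morpheme/tag tokens is known, and attaching the tag keeps the particle and the noun 은 apart.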

Recognizing emotions using a deep neural network model based on global and local attention mechanisms (S300)

In step S300, the dialogue utterances preprocessed in the previous step are taken as input, and an emotion category is assigned to each utterance.

FIG. 2 is a flowchart detailing the emotion recognition step (S300) of FIG. 1, which uses a deep neural network model based on global and local attention mechanisms, according to an embodiment of the present invention.

According to an embodiment of the present invention, step S300 consists of receiving the preprocessed dialogue text (S310), the global attention mechanism (S320), the local attention mechanism (S330), and utterance emotion recognition that takes context information into account (S340).

In the present invention, the global attention mechanism selectively summarizes only the contextual information of the sub-topics related to the current utterance, from among the various sub-topic situations present across all utterances of the dialogue, and the local attention mechanism captures the context of the direct emotion-cause utterance located near the current utterance.

Examples 2, 3, and 4 illustrate why both global and local context analysis matter in dialogue emotion analysis. In Examples 2 and 3, the target utterance "Ah, I really want to go home" and the emotion-cause utterance "I called, and they said they've made something delicious at home" are identical, yet because the contexts in which they unfold differ, the target utterance is recognized as expressing different emotions. The global attention mechanism identifies and summarizes, across the whole dialogue, the sub-topic situation that corresponds to the emotion cause and the current utterance, which allows different emotion analysis results to be derived for Examples 2 and 3.

(Example 2)

A: Brother, I'll call home and ask what there is to eat.

B: I'm so hungry.

A: I called, and they said they've made something delicious at home.

B: Ah, I really want to go home. (Emotion category: happiness)

(Example 3)

A: Are you eating well?

B: They keep serving nothing but bulgogi every day because it's tasty, and now I'm sick of it.

A: I called, and they said they've made something delicious at home.

B: Ah, I really want to go home. (Emotion category: sadness)

In Example 4, by contrast, "Can't we just finish this and then go home?" is recognized as the direct emotion-cause utterance for the target utterance "Ah, I really want to go home". This allows the same target utterance to receive an emotion analysis different from those in Examples 2 and 3.

(Example 4)

A: What time is it?

B: It's almost time to leave work.

A: Can't we just finish this and then go home?

Receiving the preprocessed dialogue text (S310)

In step S310, the input dialogue transformed in the previous step is passed to the deep neural network based on global and local attention mechanisms. In this step, each utterance passes through a word embedding layer that maps each word to a d-dimensional vector, producing a d × n matrix for each utterance, where n is the number of words. Each utterance matrix is then passed through a convolutional neural network (CNN) to produce an encoded utterance vector. The elements of each vector represent the meaning and emotional features of the given utterance as continuous values, and these features are extracted automatically through neural network training.
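The utterance encoder of S310 can be sketched as follows, assuming a single convolution layer with ReLU and max-over-time pooling. The source only states that a CNN turns the d × n matrix into an utterance vector, so the filter width k, the pooling choice, and the names `encode_utterance`, `filters`, and `bias` are illustrative assumptions.

```python
import numpy as np


def encode_utterance(token_ids, embedding, filters, bias):
    """Encode one utterance into a fixed-size vector with a 1-D CNN.

    token_ids: sequence of n word indices
    embedding: (V, d) word-embedding table
    filters:   (m, k, d) m convolution filters of width k
    bias:      (m,) filter biases
    returns:   (m,) encoded utterance vector
    Assumption: one conv layer + ReLU + max-over-time pooling; the source does
    not specify the CNN design.
    """
    X = embedding[np.asarray(token_ids)].T           # (d, n) matrix, as in the text
    m, k, _ = filters.shape
    if X.shape[1] < k:                               # zero-pad utterances shorter than k
        X = np.pad(X, ((0, 0), (0, k - X.shape[1])))
    n = X.shape[1]
    # Every window of k consecutive word vectors: (n-k+1, k, d).
    windows = np.stack([X[:, i:i + k].T for i in range(n - k + 1)])
    # Apply all m filters to every window: (n-k+1, m).
    feature_maps = np.einsum("wkd,mkd->wm", windows, filters) + bias
    # ReLU, then keep the strongest response of each filter over the utterance.
    return np.maximum(feature_maps, 0.0).max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, d, m, k = 50, 8, 16, 3
    emb = rng.normal(size=(V, d))
    vec = encode_utterance([4, 17, 9, 23], emb, rng.normal(size=(m, k, d)), np.zeros(m))
    print(vec.shape)  # (16,)
```

Max-over-time pooling is one common way to turn a variable-length utterance into a fixed-size vector; other pooling or stacked convolutions would fit the description equally well.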

Global attention mechanism (S320)

In step S320, the global attention mechanism selectively summarizes only the contextual information of the sub-topics related to the current utterance, from among the various sub-topic situations present across all utterances of the dialogue, and expresses it as an encoded vector.

Consider the CNN-encoded vector of the current utterance and the encoded vectors of the other utterances, which serve as its context. According to Equation (1), each of these vectors is linearly transformed to produce a corresponding state vector.

[Equation (1)]

To select only the contextual information of the sub-topics related to the current utterance, an attention vector is computed as in Equation (2). Each element of this vector represents the importance of one context utterance with respect to the current utterance, so the length of the attention vector equals the number of utterances in the entire dialogue excluding the current utterance.

[Equation (2)]

Using the attention vector, which indicates the importance of each context utterance, and the context utterance vectors, a global context vector is computed as a weighted average according to Equation (3). This vector summarizes the characteristics of the sub-topic situations to which the current utterance belongs.

[Equation (3)]
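Equations (1) to (3) survive only as placeholders, so the following is a hedged sketch of one common way to realize S320, not the patent's exact formulas: linear projections with hypothetical matrices `W_q` and `W_c` for Equation (1), dot-product scores normalized with softmax for Equation (2), and an attention-weighted average for Equation (3). As described in the text, the attention vector has one entry per utterance other than the current one.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def global_attention(U, t, W_q, W_c):
    """Global attention over every other utterance in the dialogue.

    U:   (N, m) encoded utterance vectors for the whole dialogue (N >= 2)
    t:   index of the current utterance
    W_q: (h, m) projection of the current utterance (assumed form of Eq. (1))
    W_c: (h, m) projection of context utterances   (assumed form of Eq. (1))
    returns (global context vector (h,), attention weights (N-1,), context indices)
    """
    ctx_idx = [j for j in range(U.shape[0]) if j != t]  # whole dialogue except t
    q = W_q @ U[t]                     # state vector of the current utterance
    C = U[ctx_idx] @ W_c.T             # (N-1, h) state vectors of context utterances
    alpha = softmax(C @ q)             # assumed dot-product scoring (Eq. (2))
    g = alpha @ C                      # attention-weighted average (Eq. (3))
    return g, alpha, ctx_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, m, h = 6, 16, 8
    U = rng.normal(size=(N, m))
    g, alpha, ctx = global_attention(U, 3, rng.normal(size=(h, m)), rng.normal(size=(h, m)))
    print(g.shape, alpha.shape, ctx)  # (8,) (5,) [0, 1, 2, 4, 5]
```

Additive (MLP-based) scoring would be an equally plausible reading of Equation (2); only the scoring line would change.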

Local attention mechanism (S330)

The local attention mechanism captures the context of the direct emotion-cause utterance located near the current utterance. It works in the same way as the global attention mechanism, except that, as in Equation (4), attention is applied only to the context utterances within a fixed number of positions preceding the current utterance position. The utterance with the highest score in the resulting local attention vector is taken as the direct emotion-cause utterance of the current utterance, which makes it possible to identify explicitly which utterance caused the emotion.

[Equation (4)]
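Equation (4) is likewise not reproduced, so this sketch of S330 assumes dot-product attention with softmax, restricted to the `w` utterances preceding position `t`. The window parameter `w`, the name `local_attention`, and the handling of the first utterance (which has no preceding context) are illustrative assumptions. As the text describes, the index with the largest weight is returned as the predicted direct emotion-cause utterance.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def local_attention(U, t, w, W_q, W_c):
    """Attention over the w utterances preceding position t (assumed form of Eq. (4)).

    U: (N, m) encoded utterances; W_q, W_c: (h, m) hypothetical projections.
    Returns (local context vector (h,), weights, index of the most-attended
    utterance, i.e. the predicted direct emotion-cause utterance).
    """
    ctx_idx = list(range(max(0, t - w), t))  # only the preceding window
    h = W_q.shape[0]
    if not ctx_idx:                          # first utterance: nothing to attend to
        return np.zeros(h), np.zeros(0), None
    q = W_q @ U[t]
    C = U[ctx_idx] @ W_c.T
    beta = softmax(C @ q)
    local_ctx = beta @ C
    cause = ctx_idx[int(np.argmax(beta))]    # highest score = direct emotion cause
    return local_ctx, beta, cause
```

Because the weights are computed only over a short window, the arg-max index is directly interpretable as the cause utterance, which the unrestricted global attention does not aim to provide.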

Utterance emotion recognition considering context information (S340)

In step S340, the context vectors generated in the previous steps and the state vector of the current utterance are used to generate a state vector for emotion classification, as in Equation (5).

[Equation (5)]

The global and local context vectors and the current utterance are concatenated (denoted by ";") and passed to a fully connected layer. Finally, a softmax layer outputs the emotion distribution, giving a probability for each emotion category, and the current utterance is assigned the category with the highest probability.
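A hedged sketch of S340: the global context vector, the local context vector, and the current-utterance state vector are concatenated, passed through a fully connected layer, and normalized with softmax, and the predicted emotion is the arg-max. The tanh hidden layer, the weight names, and the ordering of the seven labels (the six Ekman categories plus neutral used in the experiment) are assumptions, since Equation (5) is not reproduced.

```python
import numpy as np

# Assumption: illustrative label order (six Ekman categories + neutral).
EMOTIONS = ["happiness", "sadness", "disgust", "fear", "surprise", "anger", "neutral"]


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def classify(g, loc, s, W_h, b_h, W_o, b_o):
    """Predict an emotion from global context g, local context loc, and current state s.

    Assumption: a tanh fully connected layer stands in for Equation (5).
    W_h: (H, len(g)+len(loc)+len(s)), W_o: (7, H).
    """
    z = np.concatenate([g, loc, s])        # [g; loc; s], the ";" concatenation in the text
    hidden = np.tanh(W_h @ z + b_h)        # state vector for emotion classification
    probs = softmax(W_o @ hidden + b_o)    # probability per emotion category
    return EMOTIONS[int(np.argmax(probs))], probs
```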

Outputting emotion recognition results for dialogue utterances (S400)

In step S400, the emotion recognition results for all dialogue utterances recognized in the previous step are output together with their probability distributions.
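Step S400 only needs to pair each utterance with its predicted emotion and its full probability distribution. A minimal sketch, with a hypothetical function name and dummy probabilities used purely for illustration:

```python
def format_results(utterances, prob_rows, labels):
    """Pair every utterance with its top emotion and its full probability distribution."""
    results = []
    for text, probs in zip(utterances, prob_rows):
        dist = dict(zip(labels, probs))               # category -> probability
        top = max(dist, key=dist.get)                 # category with highest probability
        results.append({"utterance": text, "emotion": top, "distribution": dist})
    return results


if __name__ == "__main__":
    # Dummy values, not model output.
    labels = ["happiness", "sadness", "neutral"]
    for row in format_results(["u1", "u2"], [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]], labels):
        print(row["utterance"], row["emotion"], row["distribution"])
```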

[Experimental Example]

Emotion classification performance on dialogue utterances

Experiments were conducted on everyday Korean conversations collected from the web, in which each utterance was annotated with the emotion category it expresses. Seven emotion categories were used: Ekman's six basic emotions (happiness, sadness, disgust, fear, surprise, anger) plus neutral.

To demonstrate the advantages of the present invention, it was compared with three different neural network models. The convolutional neural network (CNN) baseline performs emotion recognition on a single utterance, without context information. The CNN + long short-term memory (LSTM) recurrent neural network baseline encodes each utterance with a CNN and feeds the encodings into an LSTM, so that context information can be exchanged between utterances; however, the LSTM does not indicate which utterance causes the emotion of the current utterance. Finally, a deep neural network with only the global attention mechanism was used to show that combining the global and local attention mechanisms, as proposed in the present invention, is necessary.

Performance was measured as the average accuracy across emotion categories, and the results are shown in Table 1.

Table 1
Model | Average accuracy by emotion category (%)
Convolutional neural network (CNN) | 38.00
CNN + long short-term memory (LSTM) recurrent neural network | 38.23
Deep neural network with global attention only | 34.52
Deep neural network with global and local attention (present invention) | 42.55
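The text does not define "average accuracy by emotion category" further. One common reading, assumed here, is the per-category accuracy (the fraction of utterances with a given gold label that are classified correctly) averaged over categories, which keeps a frequent class such as neutral from dominating the score. A minimal sketch with a hypothetical function name:

```python
from collections import defaultdict


def average_accuracy_per_category(gold, pred):
    """Mean over gold categories of per-category accuracy, as a percentage (assumed reading)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += int(g == p)
    return 100.0 * sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    gold = ["neutral", "neutral", "neutral", "anger"]
    pred = ["neutral", "neutral", "neutral", "neutral"]   # always predicts neutral
    print(average_accuracy_per_category(gold, pred))      # 50.0 (plain accuracy would be 75%)
```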

FIG. 3 shows the structure of the global and local attention mechanism model.

As shown in FIG. 3, the attention mechanism model of the present invention may be built on the representation vectors of utterances obtained through a CNN. Global attention summarizes the situation related to the current utterance among all utterances in the dialogue, and local attention detects, near the current utterance, the utterance context that directly causes that utterance's emotion.

Claims

A dialogue emotion recognition method performed by a program in a computer-executable language recorded on a recording medium, the method comprising:
a dialogue preprocessing step;
an emotion recognition step using a neural network model based on global and local attention mechanisms; and
a step of outputting emotion recognition results for the dialogue utterances.

The method of claim 1,
wherein the dialogue preprocessing step includes transforming each utterance by combining each morpheme with its morpheme category information, in order to mitigate the homonym problem and the problems of eojeol-level (word-unit) analysis.

The method of claim 1,
wherein the emotion recognition step using the neural network model based on global and local attention mechanisms includes using the global attention mechanism to selectively summarize, from among the various sub-topic situations present across all utterances of the dialogue, only the contextual information of the sub-topics related to the current utterance.

The method of claim 1,
wherein the emotion recognition step includes using the local attention mechanism to capture the context of the direct emotion-cause utterance located near the current utterance.

The method of claim 1,
wherein the emotion recognition step using the neural network model based on global and local attention mechanisms includes classifying the emotion of each utterance based on the current utterance, the direct emotion-cause utterance information (local context), and the context in which they unfold (global context).