KR20090063202A

KR20090063202A - Method for apparatus for providing emotion speech recognition

Info

Publication number: KR20090063202A
Application number: KR1020090047608A
Authority: KR
Inventors: 정홍; 정경중; 송재윤
Original assignee: 포항공과대학교 산학협력단
Priority date: 2009-05-29
Filing date: 2009-05-29
Publication date: 2009-06-17

Abstract

An emotion recognition apparatus and method are provided that raise the emotion recognition rate by analyzing a speaker's emotion in a speaker-independent service, thereby improving service satisfaction. A spectrogram converter (102) converts an input voice signal into a spectrogram. A zero-crossing detector (104) extracts vowel components from the converted spectrogram. A multivariate data separator (106) separates the extracted vowel components into time-axis information and frequency-axis information and stores the vector component of the frequency-axis information in a training database (108). A matching unit performs a matching test on the stored vector component and outputs emotion recognition result data.

Description

Emotion Recognition Apparatus and Method {METHOD FOR APPARATUS FOR PROVIDING EMOTION SPEECH RECOGNITION}

The present invention relates to emotion recognition technology, and more particularly to an emotion recognition apparatus and method suited to analyzing an input voice signal and, based on the result, recognizing the speaker's emotion conveyed in that signal.

Voice is the most natural means of human communication and information transfer. As information technology has advanced, techniques for applying speech recognition to everyday life have been developed, and voice information application technologies for quantifying and efficiently processing speech in particular are advancing rapidly.

Representative examples of such voice information applications include emotion analysis that uses a mobile phone to estimate the true feelings of the other party on a call, and robots equipped with an emotion recognition function that estimate the operator's emotion.

Conventional speech-based emotion analysis has analyzed the human voice using numerous techniques, such as loudness, speaking rate, formant positions obtained by frequency analysis, LPC (Linear Predictive Coding), and MFCC (Mel-Frequency Cepstral Coefficients).

The biggest problem, however, is that loudness and frequency range vary from speaker to speaker. Speaker-dependent services (for example, mobile phone voice dialing) achieve high recognition rates, whereas speaker-independent services (for example, ARS speech-recognition directory assistance and reservation services, or robot services) still have low recognition rates.

In other words, one of the key factors in a human-centered service industry is quickly recognizing and responding to the emotions of customers in a speaker-independent service setting, yet with existing speech recognition technology, satisfaction with such speaker-independent services is in practice very low.

Accordingly, the present invention aims to provide an emotion recognition technique that achieves a high recognition rate in the speaker-independent setting by converting the recognized voice information into a spectrogram, separating the frequency-axis information vector from the converted spectrogram, and outputting it as emotion recognition information.

According to one aspect of the present invention, there is provided an emotion recognition apparatus comprising: a spectrogram converter that converts an input voice signal into a spectrogram; a zero-crossing detector that extracts vowel components from the converted spectrogram; a multivariate data separator that separates the extracted vowel components into time-axis information and frequency-axis information and then stores the vector component of the frequency-axis information in a training database; and a matching unit that performs a matching test on the stored vector component of the frequency-axis information and outputs emotion recognition result data according to the result of the matching test.

According to one embodiment of another aspect of the present invention, there is provided an emotion recognition method comprising: converting an input voice signal into a spectrogram; grouping the converted spectrograms by emotion and applying non-negative matrix factorization (NMF) to the spectrograms of each emotion to extract a basis vector for each; and performing a matching test on the extracted basis vectors to output emotion recognition result data.

According to another embodiment of the other aspect, there is provided an emotion recognition method comprising: converting an input voice signal into a spectrogram; grouping the converted spectrograms by emotion and, within each emotion, applying a first non-negative matrix factorization to each spectrogram file to extract a per-file basis vector; combining the extracted per-file basis vectors into one group and applying a second non-negative matrix factorization to extract a basis group vector for each emotion; and collecting the extracted per-emotion basis group vectors and performing a matching test to output emotion recognition result data.

According to yet another embodiment of the other aspect, there is provided an emotion recognition method comprising: converting an input voice signal into a spectrogram; grouping the converted spectrograms by speaker and, within each speaker, applying a first non-negative matrix factorization to each spectrogram file to extract a per-file basis vector for that speaker; combining the extracted per-file basis vectors of each speaker into one group and applying a second non-negative matrix factorization to extract a basis group vector for each speaker; and collecting the extracted per-speaker basis group vectors and performing a matching test to output emotion recognition result data.

According to the present invention, analyzing the speaker's emotion in speaker-independent services, such as ARS speech-recognition directory assistance, ARS speech-recognition ticket reservation, and robot emotion recognition services, raises the emotion recognition rate, which can improve service satisfaction and help stimulate the speech recognition service market.

First, the present invention is characterized by exploiting the correlation between the frequency distribution of the voice and the characteristics of emotions.

FIG. 1 illustrates, as spectrograms, several emotions of a speaker, for example neutral, angry, happy, and sad.

In each panel of FIG. 1, the horizontal axis is time and the vertical axis is frequency; the frequency axis may be drawn on, for example, a log scale to make changes easier to analyze.

In general, emotion strongly affects the shape of the vocal tract. For example, in a speaker's ordinary emotional state, in which the voice is steady, the cross-sectional area of the vocal tract stays wide; when the speaker becomes angry, the throat tenses and the vocal tract area shrinks. This change in the vocal tract increases the losses at the vocal tract walls.

This change in the vocal tract affects the frequency domain. As FIG. 1 shows, the frequency components shift upward when the speaker is angry or sad.

Exploiting the fact that the frequency distribution differs from emotion to emotion, the present invention converts the input voice signal into a spectrogram; performs a zero-crossing step that extracts only the vowel components from the converted spectrogram; separates the vowel components so extracted into time-axis information and frequency-axis information; trains on only the vector component of the frequency-axis information; and matches against the training result to output emotion recognition result data.

That is, the spectrogram obtained by the zero-crossing step is separated into time-axis information and frequency-axis information using a multivariate data separation technique, and only the frequency-axis information is stored in the training DB (database). For testing, a new input is received and preprocessed, and the matching part then performs a matching test to obtain a similar spectral distribution, for example emotion recognition result data.

When a continuous database is used, a database storing continuous database parameters is employed, and in this case a Hidden Markov Model (HMM) may be added. Using a continuous model constrains how the emotion can change, and this constraint can increase the recognition rate.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of the speech recognition apparatus according to this embodiment, which includes a voice signal acquisition unit (100), a spectrogram converter (102), a zero-crossing detector (104), a multivariate data separator (106), a training DB (108), a matching unit (110), an HMM (Hidden Markov Model) unit (112), and a continuous-variable DB (114).

As shown in FIG. 2, the voice signal acquisition unit (100) acquires a voice signal input through a microphone or the like (not shown), for example real-time information or a voice signal in file form.

The spectrogram converter (102) converts the voice signal acquired by the voice signal acquisition unit (100) into a spectrogram. That is, once a voice signal has been acquired, it needs to be converted into a spectrogram, which serves as the feature vector; the spectrogram is well suited to expressing the speaker's characteristic emotion as a frequency distribution.

Specifically, the spectrogram converter (102) cuts the voice signal from the voice signal acquisition unit (100) into frames of a fixed duration, for example 40 ms frames with 30 ms of overlap, and applies a window to each frame. The window may be, for example, a Hamming window. Finally, the spectrogram converter (102) applies a transform, for example an FFT (Fast Fourier Transform), to the windowed voice signal to convert it into a spectrogram.
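The framing described in this paragraph can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes NumPy, a 16,000 Hz sampling rate, and a magnitude (not power) spectrogram, and the function name `spectrogram` and its parameters are hypothetical.

```python
import numpy as np

def spectrogram(signal, fs=16000, frame_ms=40, overlap_ms=30):
    """Magnitude spectrogram: rows are frequency bins, columns are frames."""
    frame_len = int(fs * frame_ms / 1000)           # 640 samples at 16 kHz
    hop = frame_len - int(fs * overlap_ms / 1000)   # 160 samples, a 10 ms step
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 16000
t = np.arange(fs) / fs                   # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)      # 1 kHz test tone (toy input)
S = spectrogram(tone, fs)
peak_bin = int(np.argmax(S[:, 0]))       # strongest frequency bin of frame 0
```

With 640-sample frames each bin spans 25 Hz, so the 1 kHz tone peaks at bin 40. The sketch expects an input at least one frame long.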

The zero-crossing detector (104) extracts only the vowel components from the spectrogram produced by the spectrogram converter (102); that is, it extracts only the spectrogram from which noise, silence, consonants, and the like have been removed.

The amount of data in a spectrogram is the product of the amount of data along the time axis (the number of frames) and the amount along the frequency axis, which is too much for the speech recognition apparatus to process as is. The unnecessary parts of the spectrogram therefore need to be removed and the data compressed. By removing the unneeded silence components and the consonant components, which are treated as noise, from the voice signal, the zero-crossing detector (104) yields more reliable results.

[Equation 1] illustrates extracting only the vowel components by computing the zero crossings and the energy average from the original spectrogram.

Here, fs is the sampling frequency, for example 16,000 Hz. N is the frame length in samples, and Sn is the sample amplitude. num_cross is the total number of zero crossings within a frame, and Range is the difference between the maximum and minimum values of the input signal. MaxRange is the maximum representable range; with 16-bit samples, for example, its value is 65536.
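[Equation 1] itself is not reproduced in this text, so the following is only a sketch of the quantities the paragraph names (`num_cross`, `Range`, `MaxRange`). The decision rule in `looks_voiced` and both of its thresholds are hypothetical stand-ins, not the patent's formula; NumPy is assumed.

```python
import numpy as np

MAX_RANGE = 65536  # full scale of 16-bit samples, as stated in the text

def frame_features(frame):
    """Return (num_cross, relative_range) for one frame of integer samples."""
    negative = frame < 0
    num_cross = int(np.count_nonzero(negative[1:] != negative[:-1]))
    relative_range = (int(frame.max()) - int(frame.min())) / MAX_RANGE
    return num_cross, relative_range

def looks_voiced(frame, max_cross_ratio=0.1, min_range=0.05):
    """Hypothetical vowel test: few zero crossings and a wide amplitude range."""
    num_cross, relative_range = frame_features(frame)
    return num_cross / len(frame) < max_cross_ratio and relative_range > min_range

fs, N = 16000, 640                        # 40 ms frame at 16 kHz
n = np.arange(N)
vowel_like = np.round(10000 * np.sin(2 * np.pi * 200 * n / fs)).astype(np.int16)
rng = np.random.default_rng(0)
noise_like = rng.integers(-300, 300, N).astype(np.int16)
silence = np.zeros(N, dtype=np.int16)

results = [bool(looks_voiced(f)) for f in (vowel_like, noise_like, silence)]
```

The toy frames show the intent: a loud low-frequency tone crosses zero rarely and passes, while low-level noise crosses zero often and silence has no amplitude range, so both are rejected.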

Thus, when a voice signal is input, the spectrogram converter (102) converts it into a spectrogram, and the zero-crossing test of the zero-crossing detector (104) removes the consonant and noise components, leaving only the vowels. Consonant and noise components would enlarge the variation in the multivariate data separator (106) used afterward and could thereby lower the recognition rate.

The multivariate data separator (106) separates multivariate data and may include, for example, a non-negative matrix factorization (NMF) function. It separates the vowel components of the spectrogram extracted by the zero-crossing detector (104) into time-axis information and frequency-axis information, and then stores only the vector component of the frequency-axis information in the training DB (108).

The multivariate data separator (106) presupposes that every value in the matrix is positive, that is, greater than zero (0). Given a non-negative matrix V as in [Equation 2], the matrix factors W (Width) and H (Height) can be obtained.

When the non-negative matrix V holding the given multivariate data has dimensions n*m, the factor W has dimensions n*r and the factor H has dimensions r*m. Here r is chosen to be smaller than both n and m, so that W and H form a compressed version of V, that is, they are smaller than V. The value of r may be chosen experimentally, for example r = 1, because each sentence is assumed to contain a single emotion.
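The image for [Equation 2] is not reproduced in this text. Assuming it is the usual factorization form, the dimensions stated in this paragraph correspond to:

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{\,n \times m},\quad
W \in \mathbb{R}_{\ge 0}^{\,n \times r},\quad
H \in \mathbb{R}_{\ge 0}^{\,r \times m},\quad
r < \min(n, m)
```

The factors store r(n + m) values instead of nm; with r = 1 that is one frequency profile of length n and one time profile of length m.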

Meanwhile, the cost function may be expressed in terms of the squared Euclidean distance between the given non-negative matrix V and the extracted factors W and H, as illustrated in [Equation 3].
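The image for [Equation 3] is likewise missing here. A squared-Euclidean-distance cost consistent with the description would read as follows; the symbol E is chosen for illustration.

```latex
E(W, H) = \lVert V - WH \rVert^{2}
        = \sum_{i=1}^{n} \sum_{j=1}^{m} \bigl( V_{ij} - (WH)_{ij} \bigr)^{2}
```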

When [Equation 3] is coded using a particular program, for example an EM (Expectation Maximization) program, the computation is iterative. The cost function therefore needs to be evaluated at every iteration and compared with its previous value to decide when to stop. For example, the EM program may be terminated once the error relative to the previous value, which decreases over the iterations, falls within a range set by the user.

Computing [Equation 3] with the squared distance yields an update rule such as [Equation 4].
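[Equation 4] is not reproduced in this text. The standard multiplicative update rules for the squared-distance cost, which is presumably what the equation shows, are:

```latex
H_{aj} \leftarrow H_{aj}\, \frac{(W^{\top} V)_{aj}}{(W^{\top} W H)_{aj}},
\qquad
W_{ia} \leftarrow W_{ia}\, \frac{(V H^{\top})_{ia}}{(W H H^{\top})_{ia}}
```

Because each update multiplies by a ratio of non-negative quantities, entries that start non-negative stay non-negative, and the squared-distance cost does not increase from one iteration to the next.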

When the EM program first starts, random values are assigned to W and H; running the EM program according to the update rule of [Equation 4] then updates them automatically. The EM program is terminated using the cost function described above.
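The iteration described here (random initialization, repeated updates, stopping when the cost stops improving) might look like the sketch below. It assumes NumPy and the standard multiplicative updates for the squared-distance cost; the function name `nmf`, the tolerance, and the small `eps` guard against division by zero are illustrative choices, not taken from the patent.

```python
import numpy as np

def nmf(V, r=1, tol=1e-6, max_iter=500, seed=0):
    """Factor a non-negative n x m matrix V into W (n x r) and H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 1e-3        # random positive starting values
    H = rng.random((r, m)) + 1e-3
    eps = 1e-12                          # avoids division by zero
    prev_cost = np.inf
    for _ in range(max_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        cost = float(np.sum((V - W @ H) ** 2))   # squared Euclidean distance
        if prev_cost - cost < tol:       # improvement is within tolerance: stop
            break
        prev_cost = cost
    return W, H, cost

# Toy rank-1 "spectrogram": one frequency profile scaled over five frames.
freq_profile = np.array([1.0, 4.0, 2.0, 0.5])
time_profile = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
V = np.outer(freq_profile, time_profile)
W, H, cost = nmf(V, r=1)
```

For this toy input, W recovers the frequency profile up to a scale factor and H the time profile, which mirrors the separation into frequency-axis and time-axis information that the text relies on.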

FIG. 3 illustrates the non-negative matrix factorization process of the multivariate data separator (106) implemented with the EM program.

As illustrated in FIG. 3, V is the input matrix, a non-negative matrix that may contain the spectrogram. The input spectrogram must hold real (linear) values rather than log values, because non-negative matrix factorization cannot accept values smaller than zero.

Then, applying the preprocessed spectrogram and updating according to the update rule of [Equation 4] yields the vector-valued factors W and H for r = 1.

Here, W is the basis information (basis vector) along the frequency axis, and H is the encoding information (encoding vector) along the time axis. As noted above, non-negative matrix factorization separates multivariate data; in this embodiment it separates the frequency-axis information from the time-axis information in the spectrogram that has passed through the zero-crossing detector (104).

It can be seen that the encoding information H closely follows the magnitude of the input matrix V along the time axis, while the basis information W closely follows the variation of V along the frequency axis.

Thus, this embodiment is characterized by using non-negative matrix factorization to separate the time-axis information (time-axis vector component) from the frequency-axis information (frequency-axis vector component), then discarding the time-axis information H and applying only the frequency-axis information W to emotion recognition.

Meanwhile, the training DB (108) stores the frequency-axis information separated by the multivariate data separator (106), that is, the factor W, which is the frequency-axis vector component.

The matching unit (110) performs a matching test, for example a Euclidean distance matching test, on the factor W, the frequency-axis vector component of the voice signal stored in the training DB (108), and outputs emotion recognition result data according to the matching test.
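A Euclidean distance matching test of this kind could be sketched as a nearest-neighbor lookup. NumPy is assumed, and the basis vectors in `training_db` are made-up toy values, not data from the patent. The unit-norm scaling is also an added assumption: a factorization fixes W only up to a scale factor, so some normalization is needed before distances are comparable, and the text does not say which one is used.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit Euclidean norm (an assumption of this sketch)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def match_emotion(test_vector, training_db):
    """Return the label of the closest stored basis vector and all distances."""
    distances = {label: float(np.linalg.norm(unit(test_vector) - unit(w)))
                 for label, w in training_db.items()}
    return min(distances, key=distances.get), distances

# Toy frequency-axis basis vectors, one per emotion (hypothetical values).
training_db = {
    "neutral": [0.9, 0.4, 0.1, 0.0],
    "angry":   [0.2, 0.5, 0.7, 0.5],
    "happy":   [0.3, 0.8, 0.4, 0.1],
    "sad":     [0.1, 0.3, 0.6, 0.8],
}
label, distances = match_emotion([0.25, 0.45, 0.75, 0.45], training_db)
```

Here the test vector lies closest to the stored "angry" vector, so that label is returned as the recognition result.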

The HMM (Hidden Markov Model) unit (112) is a hidden Markov model that constrains changes of emotion when the continuous-variable DB (114), which stores continuous-variable parameters, is used; it may be included as an additional component for raising the recognition rate of the voice signal.

Hereinafter, together with the configuration described above, embodiments of an emotion recognition method that uses the multivariate data separator (106) of this embodiment to achieve a high recognition rate in the speaker-independent setting are described in detail with reference to FIGS. 4 to 6.

First, FIG. 4 shows one embodiment of the emotion recognition method according to the present invention. It includes multivariate data separators (F1 to F4) that apply non-negative matrix factorization to the spectrograms (S1 to S4) grouped by emotion, and the matching unit (110), which performs a matching test on the basis vectors (v1 to v4) extracted by the multivariate data separators (F1 to F4).

The multivariate data separators (F1 to F4) may include, for example, a first multivariate data separator (F1) that applies non-negative matrix factorization to the 'neutral' emotion spectrogram (S1), a second multivariate data separator (F2) that applies it to the 'angry' emotion spectrogram (S2), a third multivariate data separator (F3) that applies it to the 'happy' emotion spectrogram (S3), and a fourth multivariate data separator (F4) that applies it to the 'sad' emotion spectrogram (S4).

Each of the spectrograms (S1, S2, S3, S4) may comprise N spectrogram files (sp1 to spN). For example, the N spectrogram files (sp1 to spN) of the 'neutral' emotion spectrogram (S1) are combined into a single matrix, and the first multivariate data separator (F1) applies non-negative matrix factorization to it to extract the first basis vector (v1), which is the factor W.

Likewise, the N spectrogram files (sp1 to spN) of each of the 'angry', 'happy', and 'sad' emotion spectrograms (S2) (S3) (S4) are combined into a single matrix, and the second multivariate data separator (F2), third multivariate data separator (F3), and fourth multivariate data separator (F4) each apply non-negative matrix factorization to extract the second basis vector (v2), third basis vector (v3), and fourth basis vector (v4), each of which is a factor W.

The matching unit (110) performs a matching test, for example a Euclidean distance matching test, on the basis vectors (v1 to v4) extracted by the first multivariate data separator (F1), second multivariate data separator (F2), third multivariate data separator (F3), and fourth multivariate data separator (F4), and outputs emotion recognition result data according to the matching test. In addition, for a new input to be tested, that is, the test information, the matching unit (110) extracts a basis vector (vt) through a test multivariate data separator (Ft) and performs the Euclidean distance matching test on the extracted basis vector (vt) to output emotion recognition result data.

The embodiment of FIG. 4 requires the entire training to be rerun each time a spectrogram is added, and because a relatively large amount of data is compressed into a single basis vector, the variability can be large.

FIG. 5 shows another embodiment of the emotion recognition method according to the present invention. It includes per-file multivariate data separators (F1-1) that, for a spectrogram grouped by emotion, for example the 'neutral' spectrogram (S1), apply non-negative matrix factorization to each spectrogram file (sp1 to spN) to extract per-file basis vectors (fv1 to fv5); a group multivariate data separator (F1-2) that combines the per-file basis vectors (fv1 to fv5) extracted by the per-file multivariate data separators (F1-1) into one group and applies non-negative matrix factorization to extract a basis group vector (V1); and the matching unit (110), which performs a matching test on the basis group vector (V1) extracted by the group multivariate data separator (F1-2).

Likewise, for each of the spectrograms corresponding to 'angry', 'happy', and 'sad', the same arrangement may be applied: per-file multivariate data separators that apply non-negative matrix factorization to each file of that spectrogram to extract per-file basis vectors, and a group multivariate data separator that combines the extracted per-file basis vectors into one group and applies non-negative matrix factorization to extract a basis group vector.

The matching unit (110) performs a matching test, for example a Euclidean distance matching test, on the basis group vectors (V1) to (Vn) extracted by the group multivariate data separation units (F1-2) to (Fn-2), and outputs emotion recognition result data according to the test. The matching unit (110) can also extract a basis vector (vt) for a new test input, that is, for test information, through a test multivariate data separation unit (Ft), perform the Euclidean distance matching test on the extracted basis vector (vt), and output emotion recognition result data.
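The Euclidean distance matching described above can be sketched in Python as a nearest-template lookup. This is only an illustration: the function name match_emotion, the four-element vectors, and their values are hypothetical and do not come from the specification, which does not state how the distance is turned into the final result data.

```python
import math

def match_emotion(test_vector, group_vectors):
    """Return (label, distance) of the basis group vector nearest to test_vector."""
    label = min(group_vectors,
                key=lambda k: math.dist(test_vector, group_vectors[k]))
    return label, math.dist(test_vector, group_vectors[label])

# Hypothetical basis group vectors V1..Vn, one per emotion (toy values).
group_vectors = {
    "calm":    [0.40, 0.30, 0.20, 0.10],
    "anger":   [0.10, 0.20, 0.30, 0.40],
    "joy":     [0.25, 0.25, 0.25, 0.25],
    "sadness": [0.55, 0.25, 0.15, 0.05],
}

# Hypothetical basis vector vt extracted from a test input.
vt = [0.12, 0.18, 0.32, 0.38]
label, dist = match_emotion(vt, group_vectors)
```

With these toy values the test vector lies closest to the 'anger' template, so that label would be reported as the recognition result.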

The embodiment of FIG. 5 is characterized in that non-negative matrix factorization (NMF) is first applied to each spectrogram file (sp1 to spN) to extract a basis vector per file (fv1 to fv5), and the extracted per-file basis vectors (fv1 to fv5) are then combined into one group to which NMF is applied once more.
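As an illustration only, the two-stage procedure of FIG. 5 can be sketched in Python as follows. The sketch assumes a rank-1 factorization per file and uses the standard multiplicative update rules for the squared Euclidean cost rather than the EM-based implementation the specification refers to. The function names (nmf, basis_vector, group_basis), the scaling of each basis vector to unit sum, and the toy data are assumptions that do not appear in the specification.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (freq x time) as W (freq x rank) @ H (rank x time)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update of H, then of W (squared Euclidean cost).
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def basis_vector(V):
    """Rank-1 NMF; return the frequency-axis basis column scaled to sum to 1."""
    W, _ = nmf(V, rank=1)
    col = [row[0] for row in W]
    total = sum(col) or 1.0
    return [x / total for x in col]

def group_basis(per_file_vectors):
    """Stage 2: stack the per-file vectors as columns and factor them once more."""
    return basis_vector(transpose(per_file_vectors))

# Toy data: two "files" sharing one frequency profile with different time activations.
profile = [1.0, 2.0, 3.0, 4.0]
activations = ([1.0, 0.5, 2.0], [3.0, 1.0, 0.2])
files = [[[p * t for t in acts] for p in profile] for acts in activations]

fv = [basis_vector(S) for S in files]   # stage 1: per-file vectors fv1, fv2
v1 = group_basis(fv)                    # stage 2: basis group vector V1
```

Under these assumptions, a file added later only needs its own basis_vector call; group_basis is then rerun over the stored per-file vectors instead of over the raw spectrograms.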

In this case, when additional files appear later, there is no need to retrain the entire system; non-negative matrix factorization (NMF) is applied only to the additional files, and the grouping step is then repeated. This saves time because the large body of existing data does not have to be retrained. In addition, because NMF is applied to each file in advance rather than once to all files together, the error rate can be reduced considerably, and because only the representative basis vector derived from the per-file vectors, that is, the basis group vector, is used, the recognition rate is expected to be higher.

FIG. 6 shows yet another embodiment of the emotion recognition method according to the present invention. It includes: per-file multivariate data separation units (F1-1) for speaker 1, which extract per-file basis vectors (fv1 to fv5) from the spectrogram (S1) of an arbitrary speaker, for example speaker 1; a group multivariate data separation unit (F1-2) for speaker 1, which combines the per-file basis vectors (fv1 to fv5) extracted by the units (F1-1) into one group and applies non-negative matrix factorization (NMF) to extract speaker 1's basis group vector (V1); per-file multivariate data separation units (F2-1) for speaker 2, which extract per-file basis vectors (fv1 to fv5) from speaker 2's spectrogram (S2); a group multivariate data separation unit (F2-2) for speaker 2, which combines the per-file basis vectors (fv1 to fv5) extracted by the units (F2-1) into one group and applies NMF to extract speaker 2's basis group vector (V2); and a matching unit (110), which collects speaker 1's basis group vector (V1) from the unit (F1-2) and speaker 2's basis group vector (V2) from the unit (F2-2) and performs a matching test.

Specifically, the per-file multivariate data separation units (F1-1) for speaker 1 apply non-negative matrix factorization (NMF) to each spectrogram file (sp1 to spN) of the spectrogram (S1) corresponding to an arbitrary emotion of speaker 1, for example 'calm', and extract the per-file basis vectors (fv1 to fv5).

Likewise, the per-file multivariate data separation units (F2-1) for speaker 2 apply non-negative matrix factorization (NMF) to each spectrogram file (sp1 to spN) of the spectrogram (S2) corresponding to an arbitrary emotion of speaker 2, for example 'calm', and extract the per-file basis vectors (fv1 to fv5).

The matching unit (110) performs a matching test, for example a Euclidean distance matching test, on speaker 1's basis group vector (V1) extracted by speaker 1's group multivariate data separation unit (F1-2) and on speaker 2's basis group vector (V2) extracted by speaker 2's group multivariate data separation unit (F2-2), and outputs emotion recognition result data according to the test. The matching unit (110) can also extract a basis vector (vt) for a new test input, that is, for test information, through a test multivariate data separation unit (Ft), perform the Euclidean distance matching test on the extracted basis vector (vt), and output emotion recognition result data.

In general, each speaker's vocal cords have slightly different characteristics; in particular, female vocal cords are about 30% shorter than male vocal cords. Because of these structural differences, each speaker has a different frequency distribution. If this is ignored and the representative basis vectors obtained from the individual spectrogram files are grouped together regardless of speaker, the error rate of the emotion recognition system can increase.

Accordingly, the embodiment of FIG. 6 is characterized in that the basis vectors obtained from each emotion spectrogram file are grouped per speaker to extract a final basis vector for each individual, which greatly reduces the error rate. The matching time may increase slightly, but the overall recognition rate is expected to increase, and the time needed to repeat training when spectrogram files are added is expected to decrease.
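One way to picture the per-speaker grouping is a template table keyed by speaker and emotion, searched in full for each test vector. The Python sketch below is an assumption about how such a lookup could be organized, not the specification's implementation; the names speaker_templates and match_across_speakers and all vector values are hypothetical.

```python
import math

def match_across_speakers(test_vector, speaker_templates):
    """Search every (speaker, emotion) template; return (emotion, speaker, distance)."""
    dist, speaker, emotion = min(
        (math.dist(test_vector, vec), speaker, emotion)
        for speaker, emotions in speaker_templates.items()
        for emotion, vec in emotions.items()
    )
    return emotion, speaker, dist

# Hypothetical per-speaker basis group vectors (toy values).
speaker_templates = {
    "speaker1": {"calm":  [0.40, 0.30, 0.20, 0.10],
                 "anger": [0.10, 0.20, 0.30, 0.40]},
    "speaker2": {"calm":  [0.30, 0.40, 0.20, 0.10],
                 "anger": [0.05, 0.15, 0.35, 0.45]},
}

vt = [0.06, 0.16, 0.34, 0.44]
emotion, speaker, dist = match_across_speakers(vt, speaker_templates)
```

Because every speaker contributes its own set of templates, the number of distance computations grows with the number of speakers, which is consistent with the slight increase in matching time noted above.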

FIG. 7 shows an example of an emotion recognition result from an implementation written in a particular programming environment, for example Visual C++.

The input part can be divided into a part that receives input in real time and a part that receives input from a file. The training part can be configured so that the user can train conveniently by entering the emotion type and the number of files; the parameter part shows the internally computed matching probabilities of the non-negative matrix factorization (NMF); and the expression part can present the recognized emotion as an image.

The embodiments described above illustrate rather than limit the present invention, and it should be noted that those skilled in the art can design many other embodiments without departing from the scope of the invention as defined by the appended claims. In the claims, any reference sign placed in parentheses shall not be construed as limiting the invention. Expressions such as "comprising" and "comprises" do not exclude the presence of elements or steps other than those listed in any claim or in the specification as a whole. A singular reference to an element does not exclude plural references to such elements, and vice versa. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used.

FIG. 1 illustrates a speaker's emotions as respective spectrograms;

FIG. 2 is a block diagram of the emotion recognition apparatus according to the present embodiment;

FIG. 3 illustrates the non-negative matrix factorization (NMF) process of the multivariate data separation unit, implemented using an EM (Expectation Maximization) program;

FIG. 4 is a configuration diagram of one embodiment of an emotion recognition method using the multivariate data separation unit that achieves a high recognition rate under speaker independence;

FIG. 5 is a configuration diagram of another embodiment of an emotion recognition method using the multivariate data separation unit that achieves a high recognition rate under speaker independence;

FIG. 6 is a configuration diagram of yet another embodiment of an emotion recognition method using the multivariate data separation unit that achieves a high recognition rate under speaker independence;

FIG. 7 illustrates an example of an emotion recognition result from an implementation written in a particular programming environment.

<Description of reference numerals for the main parts of the drawings>

100: voice signal acquisition unit 102: spectrogram conversion unit

104: zero-crossing detection unit 106: multivariate data separation unit

108: training DB 110: matching unit

112: hidden Markov model 114: continuous-variable DB

Claims

An emotion recognition apparatus comprising:

a spectrogram conversion unit that converts an input voice signal into a spectrogram;

a zero-crossing detection unit that extracts a vowel component from the converted spectrogram;

a multivariate data separation unit that separates the extracted vowel component into time axis information and frequency axis information and then stores a vector component of the frequency axis information in a training database; and

a matching unit that performs a matching test on the stored vector component of the frequency axis information and outputs emotion recognition result data according to the result of the matching test.

The emotion recognition apparatus of claim 1,

wherein the vector component of the frequency axis information

is basis information of the matrix element W (Width).

The emotion recognition apparatus of claim 1,

wherein the multivariate data separation unit

includes a non-negative matrix factorization (NMF) function.

The emotion recognition apparatus of claim 1,

wherein the matching test is a Euclidean distance matching test.

The emotion recognition apparatus of claim 1,

wherein the apparatus

further comprises a hidden Markov model for limiting changes in emotion when a continuous-variable database storing continuous-variable parameters is used.

The emotion recognition apparatus of claim 1,

wherein the multivariate data separation unit

calculates a cost function from a relation between a given non-negative matrix element and the square of the Euclidean distance of the vector component of the frequency axis information and the vector component of the time axis information.

An emotion recognition method comprising:

converting an input voice signal into a spectrogram;

classifying the converted spectrogram by emotion and applying non-negative matrix factorization (NMF) to each emotion-specific spectrogram to extract respective basis vectors; and

performing a matching test on each of the extracted basis vectors to output emotion recognition result data.

The emotion recognition method of claim 7, further comprising:

combining the N spectrogram files contained in each emotion-specific spectrogram into a single matrix; and

applying the non-negative matrix factorization (NMF) to the combined matrix.

The emotion recognition method of claim 7,

wherein the basis vector

is a frequency-axis vector component of the converted spectrogram.

An emotion recognition method comprising:

classifying the converted spectrogram by emotion and applying a first non-negative matrix factorization (NMF) to each spectrogram file of each emotion-specific spectrogram to extract a basis vector for each file;

combining the extracted per-file basis vectors into one group and then applying a second NMF to extract a basis group vector for each emotion; and

collecting the extracted per-emotion basis group vectors and then performing a matching test to output emotion recognition result data.

The emotion recognition method of claim 10,

wherein the per-file basis vectors and the per-emotion basis group vectors are

An emotion recognition method comprising:

classifying the converted spectrogram by speaker and applying a first non-negative matrix factorization (NMF) to each spectrogram file of each speaker-specific spectrogram to extract a basis vector for each file of each speaker;

combining the extracted per-file basis vectors of each speaker into one group and then applying a second NMF to extract a basis group vector for each speaker; and

collecting the extracted per-speaker basis group vectors and then performing a matching test to output emotion recognition result data.

The emotion recognition method of claim 12,

wherein the per-file basis vectors and the per-speaker basis group vectors are