KR101987605B1

KR101987605B1 - Method and apparatus of music emotion recognition

Info

Publication number: KR101987605B1
Application number: KR1020180172014A
Authority: KR
Inventors: 김은이
Original assignee: 건국대학교 산학협력단
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2019-06-10
Also published as: WO2020138618A1

Abstract

Disclosed are a method and an apparatus for recognizing the emotion of music. According to one embodiment, the music emotion recognition method comprises the following steps: receiving data related to the lyrics of music; generating a vocabulary dictionary based on the data; extracting a feature vector based on the vocabulary dictionary and weights corresponding to words included in the data; and determining the emotion of the music by using the feature vector as an input to an artificial neural network.

Description

METHOD AND APPARATUS OF MUSIC EMOTION RECOGNITION

The following embodiments relate to a method and apparatus for recognizing music emotion.

Music emotion recognition is attracting attention in the field of music information retrieval. Emotion is one of the main criteria users rely on when searching for music. Because real-world music databases grow every day, keeping them up to date requires a great deal of manual work.

However, manually annotating music with emotion tags is somewhat subjective, costly, and time-consuming. An automatic recognition system can address these problems.

Early music emotion recognition systems were mostly based on audio content analysis. Later, bi-modal music emotion systems were developed that improved accuracy by combining audio and lyrics.

The relative importance of audio and lyrics depends on the style of the music. For example, audio is more relevant in dance music, whereas lyrics are central in poetic music. Various psychological studies have confirmed the importance of lyrics in conveying semantic information. Despite this importance, however, conventional research on lyrics-based music emotion recognition has limitations.

Embodiments can provide a technique for recognizing music emotion.

A music emotion recognition method according to an embodiment includes: receiving data related to the lyrics of music; generating a vocabulary dictionary based on the data; extracting a feature vector based on the vocabulary dictionary and weights corresponding to words included in the data; and determining the emotion of the music by using the feature vector as an input to an artificial neural network.

The generating may include: filtering the data based on the type of language; filtering the language-filtered data based on the parts of speech of words; and generating the vocabulary dictionary by removing meaningless words from the data filtered by part of speech.

Generating the vocabulary dictionary by removing meaningless words from the data filtered by part of speech may include generating the vocabulary dictionary by removing at least one of numbers, interjections, alphabetic characters, and relative pronouns from the data filtered by part of speech.

The extracting may include: generating a set of words based on the data; generating an occurrence vector based on the vocabulary dictionary and the set of words; and extracting the feature vector by transforming the components of the occurrence vector based on the weights.

Generating the set of words may include: splitting the data into its words; and generating the set of words by restoring each word to its base form.

Extracting the feature vector by transforming the components of the occurrence vector based on the weights may include: calculating a first weight based on the number of words included in the set of words; calculating a second weight based on a nonlinear function governed by a predetermined constant; and extracting the feature vector by transforming the components of the occurrence vector based on the product of the first weight and the second weight.

Calculating the first weight may include calculating the first weight using TF-IDF (Term Frequency - Inverse Document Frequency).

Calculating the second weight may include calculating the second weight based on a sigmoid function.

The determining may include: calculating probability values corresponding to a plurality of emotion groups using the artificial neural network; and determining the emotion of the music based on the probability values.

The artificial neural network is a DBN (Deep Belief Network), and the DBN can be trained using transfer learning.

A music emotion recognition apparatus according to an embodiment includes: a receiver that receives data related to the lyrics of music; and a processor that generates a vocabulary dictionary based on the data, extracts a feature vector based on the vocabulary dictionary and weights corresponding to words included in the data, and determines the emotion of the music by using the feature vector as an input to an artificial neural network.

The processor may filter the data based on the type of language, filter the language-filtered data based on the parts of speech of words, and generate the vocabulary dictionary by removing meaningless words from the data filtered by part of speech.

The processor may generate the vocabulary dictionary by removing at least one of numbers, interjections, alphabetic characters, and relative pronouns from the data filtered by part of speech.

The processor may generate a set of words based on the data, generate an occurrence vector based on the vocabulary dictionary and the set of words, and extract the feature vector by transforming the components of the occurrence vector based on the weights.

The processor may split the data into its words and generate the set of words by restoring each word to its base form.

The processor may calculate a first weight based on the number of words included in the set of words, calculate a second weight based on a nonlinear function governed by a predetermined constant, and extract the feature vector by transforming the components of the occurrence vector based on the product of the first weight and the second weight.

The processor may calculate the first weight using TF-IDF (Term Frequency - Inverse Document Frequency).

The processor may calculate the second weight based on a sigmoid function.

The processor may calculate probability values corresponding to a plurality of emotion groups using the artificial neural network, and determine the emotion of the music based on the probability values.

FIG. 1 is a schematic block diagram of a music emotion recognition apparatus according to an embodiment.
FIG. 2 shows the overall operation of the music emotion recognition apparatus shown in FIG. 1.
FIG. 3 shows an operation in which the music emotion recognition apparatus shown in FIG. 1 generates a vocabulary dictionary.
FIG. 4A shows the distribution of feature vectors under TF-IDF weighting.
FIG. 4B shows the distribution of feature vectors under the weighting used by the music emotion recognition apparatus shown in FIG. 1.
FIG. 5 shows a comparison of recognition accuracy between the prior art and the music emotion recognition apparatus shown in FIG. 1.
FIG. 6 is a flowchart of the operation of the music emotion recognition apparatus shown in FIG. 1.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various modifications may be made to the embodiments, so the scope of rights of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are for descriptive purposes only and should not be construed as limiting. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms serve only to distinguish one component from another; for example, without departing from the scope of rights according to the concept of the embodiments, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

In addition, in the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the figure number, and duplicate descriptions are omitted. In describing the embodiments, detailed descriptions of related known art are omitted where they could unnecessarily obscure the gist of the embodiments.

FIG. 1 is a schematic block diagram of a music emotion recognition apparatus according to an embodiment.

Referring to FIG. 1, the music emotion recognition apparatus 10 can recognize the emotion of music. The music emotion recognition apparatus 10 can recognize the emotion of music based on its lyrics. The music emotion recognition apparatus 10 can recognize the emotion of music by analyzing the lyrics using an artificial neural network.

The emotion of music may include the feelings of a person listening to the music. A feeling may refer to a state of mind or a mood that arises in response to some phenomenon or event. The emotion of music may refer to an emotion included in Russell's emotion groups. For example, the emotion of music may include happy, tense, sad, and relax.

The music emotion recognition apparatus 10 may be implemented as a printed circuit board (PCB) such as a motherboard, an integrated circuit (IC), or a system on chip (SoC). For example, the music emotion recognition apparatus 10 may be implemented as an application processor.

The music emotion recognition apparatus 10 may be implemented in a personal computer (PC), a data server, or a portable device.

The portable device may be implemented as a laptop computer, a mobile phone, a smartphone, a tablet PC, a mobile internet device (MID), a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal navigation device or portable navigation device (PND), a handheld game console, an e-book, or a smart device. The smart device may be implemented as a smart watch, a smart band, or a smart ring.

Unlike existing music emotion recognition systems that use a simple BoW (Bag of Words) representation, the music emotion recognition apparatus 10 can provide a higher recognition rate by using a new semantics-based feature extraction technique that assigns higher weights to words related to emotion words.

The music emotion recognition apparatus 10 includes a receiver 100 and a processor 200. The music emotion recognition apparatus 10 may further include a memory 300.

The receiver 100 can receive data related to the lyrics of music. The receiver 100 can receive this data from an external source or from the memory 300. The receiver 100 can output the data related to the lyrics of music to the processor 200.

The data related to the lyrics of music may include the target music to be recognized, the lyrics of the target music, lyrics of music used to train the artificial neural network, and a lyrics dataset used to generate the vocabulary dictionary.

The processor 200 can generate a vocabulary dictionary based on the received data related to the lyrics of music. The processor 200 can generate the vocabulary dictionary by filtering that data.

For example, the processor 200 can filter the data related to the lyrics of music based on the type of language. The processor 200 can then filter the language-filtered data based on the parts of speech of words.

The processor 200 can remove meaningless words from the data filtered by part of speech. Specifically, the processor 200 can remove at least one of numbers, interjections, alphabetic characters, and relative pronouns from the data filtered by part of speech. The order and types of filtering may be changed as needed. The filtering operation is described in detail with reference to FIG. 3.

The processor 200 can extract a feature vector based on the generated vocabulary dictionary and weights corresponding to words included in the data. The processor 200 can generate a set of words based on the data related to the lyrics of music.

Specifically, the processor 200 can split the data into its words. The processor 200 can generate the set of words by restoring each split word to its base form.

The processor 200 can generate an occurrence vector based on the vocabulary dictionary and the set of words. The processor 200 can extract the feature vector by transforming the components of the occurrence vector based on the weights.

Specifically, the processor 200 can calculate a first weight based on the number of words included in the set of words. The processor 200 can calculate the first weight using TF-IDF (Term Frequency - Inverse Document Frequency).

The processor 200 can calculate a second weight based on a nonlinear function governed by a predetermined constant. For example, the processor 200 can calculate the second weight based on a sigmoid function.

The processor 200 can extract the feature vector by transforming the components of the occurrence vector based on the product of the first weight and the second weight.

The processor 200 can determine the emotion of the music by using the feature vector as an input to the artificial neural network. The processor 200 can calculate probability values corresponding to a plurality of emotion groups using the artificial neural network. The processor 200 can then determine the emotion of the music based on the probability values.
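As a minimal sketch of the final decision step, the following assumes that the emotion is chosen as the group with the highest probability. The text only says the decision is "based on" the probability values, so the argmax rule, the function name `decide_emotion`, and the label order are assumptions made for illustration.

```python
# Hypothetical label order; the text names these four emotions but gives no ordering.
EMOTIONS = ["happy", "tense", "sad", "relax"]


def decide_emotion(probabilities):
    """Return the emotion group with the highest probability (assumed argmax rule)."""
    if len(probabilities) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion group")
    # Index of the largest probability; ties resolve to the first such index.
    best = max(range(len(EMOTIONS)), key=lambda k: probabilities[k])
    return EMOTIONS[best]
```

For example, `decide_emotion([0.1, 0.2, 0.6, 0.1])` selects the third group in the assumed ordering.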

The processor 200 can train the artificial neural network. The artificial neural network may include an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), and a DBN (Deep Belief Network). For example, the artificial neural network may include a DBN.

The processor 200 can train the artificial neural network using transfer learning. For example, the processor 200 can train the DBN using transfer learning.

The memory 300 can store the learned parameters of the artificial neural network, probability values according to the emotion model, data on the received music, lyrics information, and the like.

The memory 300 may be implemented as a volatile memory device or a non-volatile memory device.

The volatile memory device may be implemented as DRAM (dynamic random access memory), SRAM (static random access memory), T-RAM (thyristor RAM), Z-RAM (zero capacitor RAM), or TTRAM (twin transistor RAM).

The non-volatile memory device may be implemented as EEPROM (electrically erasable programmable read-only memory), flash memory, MRAM (magnetic RAM), spin-transfer torque MRAM (STT-MRAM), conductive bridging RAM (CBRAM), FeRAM (ferroelectric RAM), PRAM (phase change RAM), resistive RAM (RRAM), nanotube RRAM, polymer RAM (PoRAM), nano floating gate memory (NFGM), holographic memory, a molecular electronic memory device, or insulator resistance change memory.

FIG. 2 shows the overall operation of the music emotion recognition apparatus shown in FIG. 1, and FIG. 3 shows an operation in which the music emotion recognition apparatus shown in FIG. 1 generates a vocabulary dictionary.

Referring to FIGS. 2 and 3, the music emotion recognition apparatus 10 can recognize the human emotion embedded in the lyrics of music, and can classify and recommend music on the basis of emotion. The music emotion recognition apparatus 10 can build the vocabulary dictionary of lyrics needed for lyrics analysis.

Based on the vocabulary dictionary, the music emotion recognition apparatus 10 can extract a 1082-dimensional feature vector from the input lyrics by applying vector quantization and a weighting technique that emphasizes emotional vocabulary.

The music emotion recognition apparatus 10 can generate probability values over Russell's emotion groups for each piece of music through a DBN (Deep Belief Network) developed using transfer learning.

The music emotion recognition apparatus 10 can generate the vocabulary dictionary based on the received data related to the lyrics of music.

The music emotion recognition apparatus 10 can build the vocabulary dictionary from the most frequently used words in the MSD (Million Song Dataset), the most widely used dataset in the field of music emotion recognition (MER).

The music emotion recognition apparatus 10 can generate the vocabulary dictionary by filtering the data related to the lyrics of music.

The music emotion recognition apparatus 10 can remove words unrelated to emotion and words that amount to noise, selecting only the vocabulary to be used in the feature vector.

Specifically, the music emotion recognition apparatus 10 can filter the data based on the type of language. For example, the music emotion recognition apparatus 10 can use an available language detection API (Application Programming Interface) to remove words written in languages other than English.

The music emotion recognition apparatus 10 can filter the data based on parts of speech. For example, the music emotion recognition apparatus 10 can use part-of-speech information to keep only the parts of speech that can influence the emotion words of music (for example, adjectives, nouns, and verbs).

The music emotion recognition apparatus 10 can remove meaningless words. For example, the music emotion recognition apparatus 10 can filter out numbers and simple interjections (for example, "yeah").

In this way, the music emotion recognition apparatus 10 can generate a vocabulary dictionary containing a plurality of words. For example, the music emotion recognition apparatus 10 can generate a vocabulary dictionary consisting of 1082 English words.
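The filtering steps above can be sketched as follows. This is an illustrative sketch only: it assumes the lyrics have already been restricted to English and tagged with parts of speech by some upstream tool, and the tag names, the interjection list, and the function name `build_vocabulary` are hypothetical rather than taken from the patent.

```python
from collections import Counter

# Hypothetical tag names for the parts of speech that are kept.
KEEP_POS = {"ADJ", "NOUN", "VERB"}
# Illustrative interjection list only; a real list would be much longer.
INTERJECTIONS = {"yeah", "oh", "ah"}


def build_vocabulary(tagged_words, size):
    """Build a vocabulary from (word, pos) pairs, keeping the `size` most frequent words."""
    counts = Counter()
    for word, pos in tagged_words:
        w = word.lower()
        # Part-of-speech filter: keep only adjectives, nouns, and verbs.
        if pos not in KEEP_POS:
            continue
        # Meaningless-word filter: numbers, single letters, simple interjections.
        if not w.isalpha() or len(w) == 1 or w in INTERJECTIONS:
            continue
        counts[w] += 1
    return [w for w, _ in counts.most_common(size)]
```

With `size=1082` and a corpus such as the MSD, this would yield a dictionary of the size mentioned in the text, although the exact selection rule used in the patent is not stated.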

The music emotion recognition apparatus 10 can extract a feature vector based on the generated vocabulary dictionary and weights corresponding to words included in the data. The music emotion recognition apparatus 10 can represent lyrics using a new BoW that improves on the existing BoW representation by assigning larger weights to emotional vocabulary.

The music emotion recognition apparatus 10 can extract the feature vector in three steps. First, the music emotion recognition apparatus 10 can preprocess the lyrics of the received music.

Specifically, the music emotion recognition apparatus 10 can split the data into its words. The music emotion recognition apparatus 10 can generate a set of words by restoring each split word to its base form. For example, the music emotion recognition apparatus 10 can perform tokenization, which splits the lyrics of each piece of music into a set of words, and stemming, which restores each split word to its base form.
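A rough sketch of this preprocessing is shown below. The regular-expression tokenizer and the toy suffix-stripping stemmer are assumptions chosen to keep the example self-contained; the patent does not say which tokenizer or stemming algorithm is used, and a production system would more likely use an established stemmer.

```python
import re


def tokenize(lyrics):
    """Split lyrics into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", lyrics.lower())


def toy_stem(word):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(lyrics):
    """Tokenize the lyrics and stem each token."""
    return [toy_stem(token) for token in tokenize(lyrics)]
```

Calling `preprocess("Dreaming of you")` gives the stemmed tokens for that line; the toy stemmer will over-strip some words, which is the usual trade-off of stemming.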

Second, the music emotion recognition apparatus 10 can perform vector quantization. The music emotion recognition apparatus 10 can quantize the lyrics of each piece of music against the generated vocabulary dictionary and represent them as a 1082-dimensional occurrence vector.
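The quantization step can be illustrated with the short sketch below. It assumes each component of the occurrence vector counts how often the corresponding dictionary word appears in the lyrics; the text does not say whether components are counts or binary indicators, so that choice and the function name `occurrence_vector` are assumptions.

```python
def occurrence_vector(vocabulary, words):
    """Count occurrences of each vocabulary word in `words`; other words are ignored."""
    index = {w: k for k, w in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for w in words:
        k = index.get(w)
        if k is not None:
            vec[k] += 1
    return vec
```

With a 1082-word dictionary the result has 1082 components, one per dictionary entry, matching the dimensionality given in the text.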

Third, the music emotion recognition apparatus 10 can calculate weights based on emotional vocabulary. The music emotion recognition apparatus 10 can calculate a first weight based on the number of words included in the set of words. The music emotion recognition apparatus 10 can calculate the first weight using TF-IDF (Term Frequency - Inverse Document Frequency). For example, the music emotion recognition apparatus 10 can calculate the first weight ω_{a,i} using Equation 1.

Here, N is the number of times the word appears in the lyrics of all music in the database, and N_i is the number of times the word appears in the lyrics of the i-th piece of music.
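Equation 1 itself is not reproduced in this text, so the following is only a sketch of the textbook TF-IDF formulation, which may differ from the patent's equation. The parameter names (`term_count`, `doc_length`, `num_docs`, `docs_with_term`) are assumptions and do not correspond directly to N and N_i as defined above.

```python
import math


def tf_idf(term_count, doc_length, num_docs, docs_with_term):
    """Textbook TF-IDF: (term_count / doc_length) * log(num_docs / docs_with_term)."""
    if doc_length == 0 or docs_with_term == 0:
        return 0.0
    tf = term_count / doc_length               # how prominent the word is in this song
    idf = math.log(num_docs / docs_with_term)  # how rare the word is across all songs
    return tf * idf
```

A word that appears in every song gets an inverse document frequency of zero, so it contributes nothing regardless of how often it occurs in a given song.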

The music emotion recognition apparatus 10 can calculate a second weight based on a nonlinear function governed by a predetermined constant. Specifically, it can calculate the second weight based on a sigmoid function. For example, the music emotion recognition apparatus 10 can calculate the second weight ω_s,i using Equation 2. Hereinafter, the second weight is referred to as the sentiment score.

Here, α is a constant that determines the slope, and S_i is a constant determined by a sentiment lexicon. For example, S_i may be a value provided by a sentiment lexicon that scores emotion-related words from -3 to +3.

The sentiment lexicons used by the music emotion recognition apparatus 10 may include SentiStrength and SentiWordNet, which are used in data mining. The slope constant α can be determined through various experiments.
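
Equation 2 is not reproduced in this text. The sketch below therefore assumes the standard logistic form 1 / (1 + exp(-α·S_i)), which may differ from the patent's exact expression; the function name `sentiment_weight` and the default α = 1.0 are also assumptions.

```python
import math


def sentiment_weight(score, alpha=1.0):
    """Logistic sigmoid of a lexicon score (assumed form, not the source's Equation 2).

    score: lexicon value S_i, described in the source as ranging from -3 to +3
    alpha: slope constant; the source says it is chosen experimentally
    """
    return 1.0 / (1.0 + math.exp(-alpha * score))


neutral = sentiment_weight(0)
strong_positive = sentiment_weight(3)
strong_negative = sentiment_weight(-3)
```

A larger α makes the curve steeper around zero, so small differences in lexicon score translate into larger differences in weight.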

The music emotion recognition apparatus 10 can extract the feature vector by transforming the components of the occurrence vector based on the product of the first weight and the second weight. For example, the music emotion recognition apparatus 10 can calculate ω_i, the weight value of each element of the occurrence vector, using Equation 3.
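
The combination of the two weights can be sketched as below. The source states only that the components are transformed based on the product of the first and second weights; scaling each occurrence component by that product is an assumption, as are the function name and the sample values.

```python
def feature_vector(occurrence, first_weights, second_weights):
    """Scale each occurrence component by the product of its two weights.

    Applying the combined weight by multiplication is an assumption made
    for this sketch.
    """
    if not (len(occurrence) == len(first_weights) == len(second_weights)):
        raise ValueError("all inputs must have the same length")
    return [
        count * w_a * w_s
        for count, w_a, w_s in zip(occurrence, first_weights, second_weights)
    ]


# Hypothetical values for a three-word vocabulary.
features = feature_vector([2, 0, 1], [0.5, 1.0, 2.0], [0.5, 0.9, 0.25])
```

Words that do not occur in a song keep a zero component, while occurring words are emphasized or suppressed according to both their frequency-based weight and their sentiment score.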

The music emotion recognition apparatus 10 can recognize the emotion of music using an artificial neural network. It can also train the artificial neural network.

For example, the music emotion recognition apparatus 10 can perform emotion recognition using a DBN. The emotion classes it uses may include happy, relaxed, sad, and tense, which are the representative emotions of the four quadrants of Russell's emotion model, defined by the arousal and valence axes.
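
To make the four classes concrete, the sketch below maps a (valence, arousal) pair to a quadrant label. The source names only the four representative emotions; the assignment of each emotion to a quadrant follows the common reading of Russell's model, and centering both axes at zero is an assumption.

```python
def russell_quadrant(valence, arousal):
    """Return the representative emotion for a point on the valence-arousal plane.

    Assumes both axes are centered at zero.
    """
    if valence >= 0:
        return "happy" if arousal >= 0 else "relaxed"
    return "tense" if arousal >= 0 else "sad"


labels = [
    russell_quadrant(0.7, 0.6),    # positive valence, high arousal
    russell_quadrant(0.7, -0.6),   # positive valence, low arousal
    russell_quadrant(-0.7, -0.6),  # negative valence, low arousal
    russell_quadrant(-0.7, 0.6),   # negative valence, high arousal
]
```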

In general, defining the structure of a deep neural network and optimizing its parameters can require large amounts of data, high-performance computing, and long training time. The music emotion recognition apparatus 10 can use transfer learning to reduce these costs.

The DBN used by the music emotion recognition apparatus 10 may include an input layer, two hidden layers, and an output layer. The input layer may have 1082 nodes, and the first and second hidden layers may have 1000 and 500 nodes, respectively.

Finally, the output layer may have four nodes, one for each quadrant of Russell's model. The music emotion recognition apparatus 10 can then learn the parameter values of the DBN by fine-tuning on the collected lyrics data of 3000 songs.
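
To give a sense of the model size implied by these layer widths, the sketch below counts trainable parameters. It assumes fully connected layers with one bias per node, which the source does not state explicitly; `LAYER_SIZES` and `count_parameters` are names introduced for this example.

```python
# Layer widths taken from the description: input, two hidden layers, output.
LAYER_SIZES = [1082, 1000, 500, 4]


def count_parameters(layer_sizes):
    """Count weights and biases, assuming dense layers with one bias per node."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


total_parameters = count_parameters(LAYER_SIZES)
```

Under these assumptions the network has roughly 1.6 million parameters, which helps explain why the description turns to transfer learning and fine-tuning when only 3000 songs are available.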

FIG. 4A shows the distribution of feature vectors obtained with TF-IDF weights, and FIG. 4B shows the distribution of feature vectors obtained with the weights of the music emotion recognition apparatus shown in FIG. 1.

Referring to FIGS. 4A and 4B, the music emotion recognition apparatus 10 can improve the discriminability of individual song lyrics by taking into account sentiment-lexicon-based weights that use the sentiment score. As FIG. 4B shows, the feature vector extraction method described above represents the lyrics of individual songs in a more distinguishable form.

Comparing FIGS. 4A and 4B, unlike the case where only TF-IDF is used, the feature vectors generated by the music emotion recognition apparatus 10 show different distributions for different emotion groups. This allows the music emotion recognition apparatus 10 to further improve music classification performance.

FIG. 5 shows a comparison of recognition accuracy between the prior art and the music emotion recognition apparatus shown in FIG. 1.

Referring to FIG. 5, the music emotion recognition apparatus 10 achieves a higher recognition rate than the prior art. "Music emotion recognition apparatus 1" denotes the performance when the SentiWordNet lexicon is used, and "music emotion recognition apparatus 2" denotes the performance when the SentiStrength lexicon is used.

Ultimately, the highest recognition performance is obtained when SentiStrength is used.

FIG. 6 is a flowchart of the operation of the music emotion recognition apparatus shown in FIG. 1.

Referring to FIG. 6, the receiver 100 can receive data related to the lyrics of music (610). The receiver 100 can receive the data from an external source or from the memory 300. The receiver 100 can output the data to the processor 200.

The processor 200 can generate a vocabulary dictionary based on the received data related to the lyrics of the music (630). The processor 200 can generate the vocabulary dictionary by filtering the data.

The processor 200 can remove meaningless words from the data that has been filtered based on parts of speech. Specifically, the processor 200 can remove at least one of numbers, interjections, alphabet letters, and relative pronouns from the data filtered based on parts of speech.
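
The removal of meaningless words can be sketched as follows, assuming the tokens have already been tagged with parts of speech. The tag names `NUM`, `INTJ`, and `RELPRON` are hypothetical, and treating "alphabet" as single stray letters is an interpretation rather than something the source specifies.

```python
import re

# Hypothetical part-of-speech tags for numbers, interjections, and relative pronouns.
REMOVED_TAGS = {"NUM", "INTJ", "RELPRON"}


def filter_tokens(tagged_tokens):
    """Keep only words whose tag is not excluded and that are not single letters."""
    kept = []
    for word, tag in tagged_tokens:
        if tag in REMOVED_TAGS:
            continue
        if re.fullmatch(r"[A-Za-z]", word):  # single alphabet letter
            continue
        kept.append(word)
    return kept


tagged = [
    ("love", "NOUN"),
    ("3", "NUM"),
    ("oh", "INTJ"),
    ("a", "DET"),
    ("which", "RELPRON"),
    ("cry", "VERB"),
]
vocabulary_candidates = filter_tokens(tagged)
```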

The processor 200 can extract a feature vector based on the generated vocabulary dictionary and a weight corresponding to each word included in the data (650). The processor 200 can generate a set of words based on the data related to the lyrics of the music.

The processor 200 can determine the emotion of the music by using the feature vector as the input of an artificial neural network (670). The processor 200 can use the artificial neural network to calculate probability values corresponding to a plurality of emotion groups. The processor 200 can then determine the emotion of the music based on the probability values.
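
The final decision step can be sketched as below. The source says only that probability values are calculated for the emotion groups and that the emotion is determined from them; using a softmax over the four output scores and picking the most probable class is an assumption, as is the ordering of `EMOTIONS`.

```python
import math

# Assumed ordering of the four output nodes.
EMOTIONS = ["happy", "relaxed", "sad", "tense"]


def softmax(scores):
    """Convert raw output scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decide_emotion(scores):
    """Return the most probable emotion together with all class probabilities."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return EMOTIONS[best], probabilities


# Hypothetical output scores for one song.
emotion, probabilities = decide_emotion([0.1, 2.0, -1.0, 0.5])
```

Returning the full probability list alongside the label keeps open the option of reporting a confidence value or ranking the remaining emotions.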

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or command the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

1. A music emotion recognition method comprising:
receiving, by a music emotion recognition apparatus, data related to lyrics of music;
generating, by the music emotion recognition apparatus, a vocabulary dictionary based on the data;
extracting, by the music emotion recognition apparatus, a feature vector based on the vocabulary dictionary and a weight corresponding to a word included in the data; and
determining an emotion of the music by using the feature vector as an input of an artificial neural network,
wherein the extracting comprises:
generating a set of words based on the data;
generating an occurrence vector based on the vocabulary dictionary and the set of words; and
extracting the feature vector by transforming components of the occurrence vector based on the weight.

2. The method of claim 1,
wherein the generating of the vocabulary dictionary comprises:
filtering the data based on a type of language;
filtering the data filtered based on the type of language, based on parts of speech of words; and
generating the vocabulary dictionary by removing meaningless words from the data filtered based on the parts of speech.

3. The method of claim 2,
wherein the generating of the vocabulary dictionary by removing meaningless words from the data filtered based on the parts of speech comprises:
generating the vocabulary dictionary by removing at least one of numbers, interjections, alphabet letters, and relative pronouns from the data filtered based on the parts of speech.

4. (Deleted)

5. The method of claim 1,
wherein the generating of the set of words comprises:
splitting the data into words; and
generating the set of words by restoring each word to its base form.

6. The method of claim 1,
wherein the extracting of the feature vector by transforming the components of the occurrence vector based on the weight comprises:
calculating a first weight based on the number of words included in the set of words;
calculating a second weight based on a nonlinear function according to a predetermined constant; and
extracting the feature vector by transforming the components of the occurrence vector based on a product of the first weight and the second weight.

7. The method of claim 6,
wherein the calculating of the first weight comprises:
calculating the first weight using TF-IDF (Term Frequency - Inverse Document Frequency).

8. The method of claim 6,
wherein the calculating of the second weight comprises:
calculating the second weight based on a sigmoid function.

9. The method of claim 1,
wherein the determining comprises:
calculating probability values corresponding to a plurality of emotion groups using the artificial neural network; and
determining the emotion of the music based on the probability values.

10. The method of claim 1,
wherein the artificial neural network is a DBN (Deep Belief Network), and
the DBN is trained using transfer learning.

11. A music emotion recognition apparatus comprising:
a receiver configured to receive data related to lyrics of music; and
a processor configured to generate a vocabulary dictionary based on the data, extract a feature vector based on the vocabulary dictionary and a weight corresponding to a word included in the data, and determine an emotion of the music by using the feature vector as an input of an artificial neural network,
wherein the processor is configured to generate a set of words based on the data, generate an occurrence vector based on the vocabulary dictionary and the set of words, and extract the feature vector by transforming components of the occurrence vector based on the weight.

12. The apparatus of claim 11,
wherein the processor is configured to filter the data based on a type of language, filter the data filtered based on the type of language based on parts of speech of words, and generate the vocabulary dictionary by removing meaningless words from the data filtered based on the parts of speech.

13. The apparatus of claim 12,
wherein the processor is configured to generate the vocabulary dictionary by removing at least one of numbers, interjections, alphabet letters, and relative pronouns from the data filtered based on the parts of speech.

14. (Deleted)

15. The apparatus of claim 11,
wherein the processor is configured to split the data into words and generate the set of words by restoring each word to its base form.

16. The apparatus of claim 11,
wherein the processor is configured to calculate a first weight based on the number of words included in the set of words, calculate a second weight based on a nonlinear function according to a predetermined constant, and extract the feature vector by transforming the components of the occurrence vector based on a product of the first weight and the second weight.

17. The apparatus of claim 16,
wherein the processor is configured to calculate the first weight using TF-IDF (Term Frequency - Inverse Document Frequency).

18. The apparatus of claim 16,
wherein the processor is configured to calculate the second weight based on a sigmoid function.

19. The apparatus of claim 11,
wherein the processor is configured to calculate probability values corresponding to a plurality of emotion groups using the artificial neural network, and determine the emotion of the music based on the probability values.

20. The apparatus of claim 11,
wherein the artificial neural network is a DBN (Deep Belief Network), and
the DBN is trained using transfer learning.