KR20070061626A

KR20070061626A - Method for music mood classification and system thereof

Info

Publication number: KR20070061626A
Application number: KR1020050121252A
Authority: KR
Inventors: 박근한; 박상용
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2005-12-10
Filing date: 2005-12-10
Publication date: 2007-06-14
Also published as: KR100772386B1; CN1979491A; US20070131095A1

Abstract

A method and system for classifying music files are provided that select and extract audio feature values capable of improving classification speed and accuracy, and classify music using the extracted feature values. A preprocessing unit (210) preprocesses at least part of an input music file. A feature extracting unit (220) extracts one or more feature values from the preprocessed data. A mood determining unit (240) determines the mood of the input music file using the extracted feature values. A storing unit (230) stores the extracted feature values and the determined mood.

Description

Method for music mood classification and system thereof

FIG. 1 is a flowchart of a music mood classification method according to the present invention;

FIG. 2 is a block diagram of a music mood classification system according to the present invention;

FIG. 3 is a flowchart of a preprocessing method according to the present invention;

FIG. 4 illustrates a texture window movement method for feature value extraction according to the present invention;

FIG. 5 is a flowchart of a feature value extraction method according to the present invention; and

FIG. 6 shows a data format for storing feature values according to the present invention.

The present invention relates to content analysis of music files and, more particularly, to a method and system for classifying the mood of music by analyzing the characteristics of a music file in a multimedia device such as a computer, an MP3 player, or a portable multimedia player (PMP).

With the development of multimedia technology, interest in techniques for classifying audio data is growing. However, conventional methods that classify and search music files using text-based audio information have several problems. Text search technology has advanced remarkably and performs very well, but because it is practically impossible to describe every piece of music in text by hand, applying this technology to large volumes of audio data is limited. Moreover, even if all the text data were written, different authors would describe the same music in different ways, making it difficult to keep the information consistent.

Accordingly, automatic music classification by computer is being studied. Music classification is a very difficult task for people and computers alike, because the mood of music is highly subjective and inevitably depends on many factors such as culture, education, and experience. Despite this ambiguity, automatic classification has the advantage of classifying music faster and more consistently than people can; that is, a computer can eliminate the personal preferences and prejudices that could affect the results. For this reason, automatic music mood classification based on mood modeling has been actively studied. Existing work on automatic music classification largely applies techniques from speech recognition, and the methodologies it uses fall into three broad types: spectral, temporal, and cepstral methods. Spectral methods use features such as the spectral centroid and spectral flux; temporal methods use features such as the zero-crossing rate; and cepstral methods use features such as MFCCs (Mel-Frequency Cepstral Coefficients), LPC (Linear Prediction Coding), and the cepstrum. However, no automatic music mood classification method offering both satisfactory speed and satisfactory accuracy has yet been developed.

To solve the above problems, an object of the present invention is to provide a method and system that select and extract audio feature values capable of improving speed and accuracy, and classify music using the extracted feature values.

To achieve this object, the present invention provides a music mood classification method and system that drastically shorten feature extraction time while maintaining satisfactory performance by analyzing only part of the music rather than computing statistics over the whole piece, extract feature values that perform better than those previously used for music classification, and improve classification accuracy by using a Support Vector Machine (SVM), a kernel-based machine learning method.

According to one aspect of the present invention, a music mood classification method includes: a preprocessing step of decoding and normalizing at least part of an input music file; extracting one or more feature values from the preprocessed data; and determining the mood of the input music file using the one or more feature values.

Preferably, the preprocessing step includes preprocessing a 10-second portion of the input music file starting from a predetermined position.

Preferably, the preprocessing step includes preprocessing the 10-second portion that begins 30 seconds after the start of the input music file.

Preferably, extracting the one or more feature values includes extracting, from the preprocessed data, at least one of the spectral centroid, the spectral rolloff, the spectral flux, the BFCC (Bark-scale Frequency Cepstral Coefficients) coefficients, and the differences between the BFCC coefficients, and determining the extracted values as the feature values.

Preferably, determining the feature values further includes: dividing the preprocessed data into a plurality of analysis windows; sliding a texture window, which contains a predetermined number of analysis windows, forward one analysis window at a time and obtaining, for each texture window position, the mean and variance of the spectral centroid, the spectral rolloff, the spectral flux, and the BFCC coefficients; and, over the entire preprocessed data, averaging the per-texture-window means and the per-texture-window variances, respectively, and determining the results as the feature values.

Preferably, determining the mood of the music file includes determining the mood using a Support Vector Machine (SVM) classifier.

According to another aspect of the present invention, a music mood classification system includes: a preprocessing unit that preprocesses at least part of an input music file; a feature extracting unit that extracts one or more feature values from the preprocessed data; a mood determining unit that determines the mood of the input music file using the extracted feature values; and a storage unit that stores the extracted feature values and the determined mood.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a flowchart of a music mood classification method according to the present invention.

First, part or all of the input music file is preprocessed (S102). Preprocessing decodes and normalizes a music file encoded in a format such as MP3 or OGG. In one embodiment, features are extracted from only part of the music file; according to the present invention, analyzing only part of a file yields accuracy similar to analyzing the whole file. Experiments also identified the interval that gives the highest accuracy: from 30 seconds to 40 seconds after the start of the music file. Classifying the mood from feature values extracted from this 10-second segment greatly reduces feature extraction and classification time.
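A minimal Python sketch of selecting the 30 to 40 s analysis segment from already-decoded mono samples, assuming 22,050 Hz audio. The function name `select_segment` and the fallback for tracks shorter than 40 s (use the last 10 s, or the whole track if it is shorter than that) are illustrative assumptions; the patent does not say how short files are handled.

```python
from typing import List, Sequence


def select_segment(samples: Sequence[float], sample_rate: int = 22050,
                   start_s: float = 30.0, duration_s: float = 10.0) -> List[float]:
    """Return duration_s seconds of audio starting start_s seconds into the track."""
    start = int(start_s * sample_rate)    # 30 s -> sample 661,500 at 22,050 Hz
    length = int(duration_s * sample_rate)  # 10 s -> 220,500 samples
    if start + length > len(samples):
        # Hypothetical fallback: take the last duration_s seconds,
        # or everything if the track is shorter than that.
        start = max(0, len(samples) - length)
    return list(samples[start:start + length])
```

Only 10 s of a typical multi-minute track reaches the feature extractor, which is where the speed-up described above comes from.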

Next, one or more feature values are extracted from the preprocessed data (S104). From among the features that can be extracted from audio data, the present invention selects those that are effective for classifying the mood of music. Specifically, five kinds of features are used: the spectral centroid, the spectral rolloff, the spectral flux, the BFCC (Bark-scale Frequency Cepstral Coefficients) coefficients, and the differences (deltas) between BFCC coefficients.

Finally, the mood of the music file is determined using the extracted feature values (S106). In the present invention, the mood is determined using a Support Vector Machine (SVM) classifier.

FIG. 2 is a block diagram of a music mood classification system according to the present invention. The system includes a preprocessing unit 210 that preprocesses an input music file 201; a feature extracting unit 220 that extracts one or more feature values from the preprocessed data 211; a mood determining unit 240 that determines the mood of the input music file using training data 242 and the extracted feature values 221; and a storage unit 230 that stores the extracted feature values 221 and the determined mood 241.

In the illustrated embodiment, the input file 201 is in MP3, OGG, or WMA format, but audio files in other formats can also be processed. Through a series of preprocessing steps described below, the input file is converted into 22,050 Hz mono PCM (Pulse Code Modulation) data 211, although conversion to other forms is clearly possible. The feature extracting unit 220 analyzes the preprocessed data 211 and outputs feature values 221. In this embodiment, 21 feature values are extracted: the mean and variance of the spectral centroid, the mean and variance of the spectral rolloff, the mean and variance of the spectral flux, the means and variances of the first five BFCC coefficients, and five BFCC deltas (2 + 2 + 2 + 10 + 5 = 21). These features were selected through extensive experiments as being effective for music classification while maximizing performance. The extracted feature values 221 are stored in the storage unit 230 and used for mood classification. In this embodiment, the mood determining unit 240 uses an SVM classifier, which assigns the music with the input feature values to one of the moods 241, such as "exciting," "passionate," "sweet," or "calm."
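The mood decision can be sketched in Python as one-versus-one voting among binary SVMs, each evaluating the kernel expansion f(x) = sum_i alpha_i * y_i * K(x_i, x) + b. Everything here is a hypothetical illustration: the RBF kernel, the `gamma` value, the function names, and the toy support vectors used to exercise it are assumptions, since the patent says only that a multi-class classifier is built from several binary ones.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
SupportVector = Tuple[Vector, int, float]  # (x_i, y_i in {+1, -1}, alpha_i)


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    # Example kernel K(a, b) = exp(-gamma * ||a - b||^2); an assumption, not the patent's.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def decision(x: Vector, support: List[SupportVector], bias: float) -> float:
    # Kernel expansion f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
    return sum(alpha * y * rbf_kernel(sv, x) for sv, y, alpha in support) + bias


def classify_mood(x: Vector,
                  models: Dict[Tuple[str, str], Tuple[List[SupportVector], float]]) -> str:
    # One-versus-one voting: model (pos, neg) votes for pos when f(x) >= 0, else for neg.
    votes: Counter = Counter()
    for (pos, neg), (support, bias) in models.items():
        votes[pos if decision(x, support, bias) >= 0 else neg] += 1
    return votes.most_common(1)[0][0]
```

With k moods this needs k(k-1)/2 binary models; in a real system their support vectors, multipliers, and biases would come from training on labeled 21-dimensional feature vectors.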

An SVM is a kernel-based supervised machine learning method. It rests on a clear theoretical foundation that allows complex pattern recognition problems to be solved with relatively simple equations. To classify patterns with complex structure in practical applications, the SVM maps input feature vectors that are not linearly separable into a higher-dimensional space in which they can be separated linearly, and finds the optimal boundary between the classes (the maximum-margin hyperplane).

An SVM is implemented as follows. The method described here is for a binary (one-versus-one) classifier; a multi-class classifier can be built by combining several such binary classifiers. First, training data belonging to two classes, positive and negative, is defined as in Equation 1.

Here, x_i denotes the n-dimensional feature vector of the i-th sample; in the present invention, the spectral centroid, spectral rolloff, spectral flux, BFCC, and BFCC delta values described above are used as x_i. y_i denotes the class label of the i-th sample. In the basic SVM framework, positive and negative data are separated by a hyperplane as in Equation 2.

The SVM finds the optimal hyperplane that correctly divides the training data into these two classes. Finding this hyperplane is equivalent to solving the optimization problem in Equation 3.

Applying the method of Lagrange multipliers yields the equivalent optimization problem in Equation 4.

Finding the coefficients that satisfy this equation determines the hyperplane sought by the SVM; the result is called the classifier model. The classifier obtained from the training data is then used to classify actual data. The SVM can replace the inner product (x_i · x_j) of the linear model above with a kernel function K(x_i, x_j), and depending on the kernel chosen, either a linear or a nonlinear model is obtained.
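Equations 1 to 4 did not survive in this text. Assuming they follow the standard hard-margin SVM formulation (the patent may use a variant, for example a soft margin), they would read as below, where l is the number of training samples and alpha_i are the Lagrange multipliers; the last line is the kernelized decision function described above, obtained by substituting w = sum_i alpha_i y_i x_i and replacing the inner product with K.

```latex
\begin{aligned}
&\text{(1)}\quad \{(\mathbf{x}_i, y_i)\}_{i=1}^{l},\qquad \mathbf{x}_i \in \mathbb{R}^n,\; y_i \in \{+1, -1\}\\
&\text{(2)}\quad \mathbf{w}\cdot\mathbf{x} + b = 0\\
&\text{(3)}\quad \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^2
  \quad\text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,l\\
&\text{(4)}\quad \max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{l}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j)
  \quad\text{s.t.}\quad \alpha_i \ge 0,\ \sum_{i=1}^{l}\alpha_i y_i = 0\\
&\phantom{\text{(4)}}\quad f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{l}\alpha_i y_i\,K(\mathbf{x}_i, \mathbf{x}) + b\Big)
\end{aligned}
```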

FIG. 3 is a flowchart of a preprocessing method according to the present invention. Before feature values are extracted, several preprocessing steps are needed to remove the effects of differing compression formats and sampling characteristics.

First, when an encoded music file is input (S302), it is decoded and decompressed (S304). Next, the music is converted to a specific sampling rate (S306). There are two main reasons for this conversion: first, the sampling rate affects the feature values; second, most of the useful information in a music file lies in the low-frequency band, so downsampling shortens the time needed to compute the features. Channel merging then converts music recorded in stereo (or multichannel) into mono (S308); computing features on a mono signal yields consistent feature values and reduces computation time. Normalizing the sampled values is a very important step for minimizing the effect of factors such as loudness (S310). Finally, windowing is performed (S312); that is, analysis windows, the smallest unit intervals for feature analysis, are set.
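Steps S306 to S312 can be sketched in plain Python on already-decoded samples (decoding itself needs a codec library and is omitted). The resampler (linear interpolation without an anti-aliasing filter), peak normalization, and non-overlapping 512-sample windows are assumptions chosen for brevity; the patent does not name the specific methods. Downmixing is applied before resampling here, which gives the same result as the patent's order because both steps are linear.

```python
from typing import List, Sequence, Tuple


def to_mono(frames: Sequence[Tuple[float, ...]]) -> List[float]:
    # Channel merging (S308): average all channels of each sample frame.
    return [sum(f) / len(f) for f in frames]


def resample_linear(x: Sequence[float], src_rate: int, dst_rate: int = 22050) -> List[float]:
    # Sampling-rate conversion (S306) by linear interpolation; a production
    # resampler would low-pass filter first to avoid aliasing.
    if src_rate == dst_rate or not x:
        return list(x)
    n_out = int(len(x) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append(x[i] * (1 - frac) + nxt * frac)
    return out


def peak_normalize(x: Sequence[float]) -> List[float]:
    # Normalization (S310): scale so the largest absolute sample is 1.0.
    peak = max((abs(v) for v in x), default=0.0)
    return [v / peak for v in x] if peak > 0 else list(x)


def split_windows(x: Sequence[float], size: int = 512) -> List[List[float]]:
    # Windowing (S312): split into non-overlapping analysis windows, dropping any remainder.
    return [list(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]
```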

FIG. 4 illustrates a texture window movement method for feature value extraction according to the present invention. Feature extraction is basically performed per analysis window 410, the basic unit. In the illustrated example, each analysis window 410 contains 512 samples; for data normalized to 22,050 Hz, this corresponds to about 23 ms (512 / 22,050 ≈ 23.2 ms). The feature values of the music file are computed for these units using the Short-Time Fourier Transform. In the illustrated example, 40 analysis windows form one texture window 420 for computing feature values. After the first texture window 420 is processed, the second texture window 430, shifted by one analysis window, is processed. While the texture window is moved one analysis window at a time in this way, the mean and variance of the feature values extracted from the analysis windows in each texture window are obtained; the means and the variances computed for all texture windows in the analyzed music segment are then each averaged again to give the final feature values.
The sizes of the analysis and texture windows affect both the amount of computation and the performance; the values in the illustrated example were determined through various experiments and may be changed depending on the application.
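The sliding texture-window aggregation for one feature track (one value per analysis window) can be sketched as follows. The name `texture_stats` is hypothetical, and using the population variance rather than the sample variance is an assumption the patent does not settle.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def texture_stats(track: Sequence[float], texture_len: int = 40) -> Tuple[float, float]:
    """Aggregate one per-analysis-window feature track into (mean, variance).

    A texture window of texture_len analysis windows slides forward one analysis
    window at a time; for each position we take the mean and the population
    variance, then average those means and variances over all positions.
    """
    if len(track) < texture_len:
        raise ValueError("need at least one full texture window")
    means: List[float] = []
    variances: List[float] = []
    for start in range(len(track) - texture_len + 1):
        window = track[start:start + texture_len]
        means.append(fmean(window))
        variances.append(pvariance(window))
    return fmean(means), fmean(variances)
```

For a 10-second segment at 22,050 Hz there are 430 full 512-sample analysis windows, so 391 texture-window positions contribute to each final mean and variance.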

As described above, the feature values extracted according to the present invention are the means and variances of the spectral centroid, spectral rolloff, spectral flux, and BFCC coefficients, together with the differences between BFCC coefficients. FIG. 5 shows the process of obtaining these feature values.

First, the memory and tables used for feature extraction are initialized (S502), and Hamming windowing is applied to the PCM samples in the analysis window to reduce noise caused by spectral leakage (S504). The windowed data is transformed into the frequency domain by an FFT (Fast Fourier Transform), and its magnitude is obtained (S506). The spectral features are computed from this magnitude spectrum, and the same magnitudes are also passed through a Bark-scale filter bank.
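Steps S504 to S506 can be sketched without external libraries: a Hamming window, a recursive radix-2 FFT (valid because the 512-sample window is a power of two), and the magnitudes of the non-negative frequency bins. The function names are illustrative; a real implementation would normally call an optimized FFT library instead.

```python
import cmath
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    # w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def fft(x: Sequence[complex]) -> List[complex]:
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def magnitude_spectrum(window: Sequence[float]) -> List[float]:
    # Window the samples, transform them, and keep |X[k]| for k = 0..N/2.
    w = hamming(len(window))
    spec = fft([s * wk for s, wk in zip(window, w)])
    return [abs(c) for c in spec[: len(window) // 2 + 1]]
```

Bin k of the 257 returned magnitudes corresponds to k * 22050 / 512 ≈ 43 Hz per bin at the patent's sampling rate.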

To extract the first feature, the spectral centroid is calculated (S508). The spectral centroid is the center of mass of the energy distribution over frequency. It serves as a perceptual measure of pitch, that is, a criterion for judging how high or low the frequency content of a sound is. The spectral centroid identifies the frequency region in which most of the signal energy is concentrated and is calculated by Equation 5.

Here, M_t[n] denotes the magnitude of the Fourier transform at frame t and frequency bin n.
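Equation 5 is missing from this text. Assuming it is the commonly used definition C_t = sum_n n * M_t[n] / sum_n M_t[n], a sketch on one magnitude spectrum is below; bins are indexed from 0 here, and the hypothetical helper returns a bin index rather than Hz.

```python
from typing import Sequence


def spectral_centroid(mag: Sequence[float]) -> float:
    # Magnitude-weighted mean bin index; multiply by sample_rate / frame_len for Hz.
    total = sum(mag)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(n * m for n, m in enumerate(mag)) / total
```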

To extract the second feature, the spectral rolloff is calculated (S510). The spectral rolloff point is the frequency below which 85% of the spectral energy lies. This feature measures the shape of the spectrum and, because it reflects how widely the pitches are distributed, is useful for distinguishing different pieces of music: depending on its character, a piece may have its energy spread across the whole frequency range or concentrated in part of it, and the rolloff captures this difference. The spectral rolloff point is calculated by Equation 6.

The spectral rolloff frequency R_t is defined as the frequency at which the cumulative magnitude reaches 85% of the total magnitude distribution.
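Assuming Equation 6 is the usual form, R_t is the smallest bin for which sum_{n <= R_t} M_t[n] >= 0.85 * sum_n M_t[n]. A sketch with a hypothetical function name, returning a bin index:

```python
from typing import Sequence


def spectral_rolloff(mag: Sequence[float], fraction: float = 0.85) -> int:
    # Smallest bin R such that the cumulative magnitude up to R reaches fraction * total.
    if not mag:
        raise ValueError("empty spectrum")
    threshold = fraction * sum(mag)
    cumulative = 0.0
    for n, m in enumerate(mag):
        cumulative += m
        if cumulative >= threshold:
            return n
    return len(mag) - 1
```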

To extract the third feature, the spectral flux is calculated (S512). The spectral flux represents the change in energy distribution between the spectra of two consecutive frames. Because this change can be large or small depending on the character of the music, it is used as a feature to distinguish pieces of music. The spectral flux is defined as the squared difference between the normalized magnitudes of consecutive spectral distributions and is calculated by Equation 7.

Here, N_t[n] denotes the normalized magnitude of the Fourier transform at frame t.
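Assuming Equation 7 is F_t = sum_n (N_t[n] - N_{t-1}[n])^2, a sketch follows. How the magnitudes are normalized is not stated; dividing each spectrum by its total magnitude is an assumption here, and it makes the flux insensitive to overall level.

```python
from typing import List, Sequence


def normalize(mag: Sequence[float]) -> List[float]:
    # Assumed normalization: divide by the total magnitude of the frame.
    total = sum(mag)
    return [m / total for m in mag] if total > 0 else [0.0] * len(mag)


def spectral_flux(prev_mag: Sequence[float], cur_mag: Sequence[float]) -> float:
    # Sum over bins of the squared change in normalized magnitude between frames.
    p, c = normalize(prev_mag), normalize(cur_mag)
    return sum((a - b) ** 2 for a, b in zip(c, p))
```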

To extract the fourth feature, the BFCC coefficients are calculated. BFCC is a cepstral feature. Among non-uniform filter banks, it uses a critical-band-scale filter bank, which divides the spectrum into bands that contribute equally to speech articulation, and in particular it maps frequency according to tone perception. Because the Bark-scale filter is based on tone in this way, it is better suited to music analysis than filters on other scales, such as those used for subjective pitch discrimination. This is because tone essentially represents timbre, an important attribute of sound that allows voices and instruments to be told apart. The Bark-scale filter bank divides the human audible range into about 24 bands; the band positions increase linearly with frequency below a certain point (e.g., 1000 Hz) and logarithmically above it.
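A sketch of the frequency-to-Bark mapping, using the widely cited Zwicker-Terhardt approximation. The patent does not say which Bark formula it uses, so this choice and the helper that assigns an FFT bin to an integer Bark band are assumptions.

```python
import math


def hz_to_bark(f: float) -> float:
    # Zwicker-Terhardt approximation of the critical-band rate in Bark.
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def bark_band_of_bin(k: int, frame_len: int = 512, sample_rate: int = 22050) -> int:
    # Map FFT bin k to the integer Bark band containing its centre frequency.
    return int(hz_to_bark(k * sample_rate / frame_len))
```

At 22,050 Hz the highest bin (11,025 Hz) falls in band 22, so the spectrum up to Nyquist spans 23 integer Bark bands, in line with the "about 24 bands" above.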

To calculate the BFCCs, the Bark-scale filter bank response is first computed (S514). The logarithm of the response is taken (S516), and the DCT (Discrete Cosine Transform) of the log values is computed to obtain the BFCC coefficients (S518). In addition, the differences (deltas) between BFCC coefficients are calculated and used as feature values (S520).
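Starting from the Bark filter-bank energies of one frame (S514), steps S516 to S520 can be sketched as below. The unnormalized DCT-II, the small floor inside the log, and reading "difference between BFCC coefficients" as the frame-to-frame difference of each coefficient are all assumptions; the patent does not pin these down.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float]) -> List[float]:
    # Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)
    n_pts = len(x)
    return [sum(v * math.cos(math.pi / n_pts * (n + 0.5) * k) for n, v in enumerate(x))
            for k in range(n_pts)]


def bfcc(band_energies: Sequence[float], n_coeffs: int = 5, floor: float = 1e-10) -> List[float]:
    # S516-S518: log of each Bark band output (floored to avoid log(0)), then DCT;
    # keep the first n_coeffs coefficients.
    logs = [math.log(max(e, floor)) for e in band_energies]
    return dct_ii(logs)[:n_coeffs]


def bfcc_delta(prev: Sequence[float], cur: Sequence[float]) -> List[float]:
    # S520 under an assumed interpretation: change of each coefficient between frames.
    return [c - p for p, c in zip(prev, cur)]
```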

In this way, the means and variances of the spectral centroid, spectral rolloff, spectral flux, and BFCC coefficients computed over a given segment of music are calculated and determined as feature values (S522). For the BFCC coefficients, this is preferably done for the first five coefficients, so that 21 feature values are extracted in total. The extracted feature values are stored for later music classification or search (S524).
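The 21-value feature vector can be assembled as sketched below. The ordering of the values is a hypothetical choice; the patent lists the features but does not fix their order in the vector.

```python
from typing import List, Sequence, Tuple

MeanVar = Tuple[float, float]


def assemble_features(centroid: MeanVar, rolloff: MeanVar, flux: MeanVar,
                      bfcc_stats: Sequence[MeanVar], bfcc_deltas: Sequence[float]) -> List[float]:
    # Assumed order: 3 spectral (mean, var) pairs, 5 BFCC (mean, var) pairs, 5 deltas.
    if len(bfcc_stats) != 5 or len(bfcc_deltas) != 5:
        raise ValueError("expected 5 BFCC (mean, var) pairs and 5 deltas")
    vec = [*centroid, *rolloff, *flux]
    for mean, var in bfcc_stats:
        vec.extend((mean, var))
    vec.extend(bfcc_deltas)
    return vec  # 6 + 10 + 5 = 21 values
```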

FIG. 6 shows an example data format for storing feature values according to the present invention. The data format, named "MuSE," is 200 bytes in total. A 4-byte header 610 describes the name of the data format. It is followed by a 10-bit version field 620, a 6-bit genre field 630, a 2-bit flag 640 for speech/music discrimination, a 6-bit mood field 650, 84 bytes 660 for the 21 four-byte feature values, 2 bytes 670 indicating whether the data format is extended, and 107 bytes of reserved data (4 + 2 + 1 + 84 + 2 + 107 = 200). The version field 620 indicates future upgrades of the format. The extension field 670 allows several basic data format blocks to be chained together.
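The 200-byte layout can be packed with Python's `struct` module. The field widths come from the description above; the big-endian byte order, the ASCII header `MuSE`, 32-bit IEEE 754 floats for the features, and placing the version and flag in the high-order bits of their words are assumptions, and the function names are hypothetical.

```python
import struct
from typing import Dict, Sequence

# Big-endian, no padding: 4s header, H = version(10)+genre(6), B = flag(2)+mood(6),
# 21 float32 features, H extension, 107 reserved bytes -> 4+2+1+84+2+107 = 200 bytes.
MUSE_FORMAT = ">4sHB21fH107x"


def pack_muse(version: int, genre: int, speech_music_flag: int, mood: int,
              features: Sequence[float], extension: int = 0, magic: bytes = b"MuSE") -> bytes:
    if not (0 <= version < 1 << 10 and 0 <= genre < 1 << 6
            and 0 <= speech_music_flag < 1 << 2 and 0 <= mood < 1 << 6):
        raise ValueError("field out of range")
    if len(features) != 21:
        raise ValueError("expected 21 feature values")
    version_genre = (version << 6) | genre      # 10 + 6 bits
    flag_mood = (speech_music_flag << 6) | mood  # 2 + 6 bits
    return struct.pack(MUSE_FORMAT, magic, version_genre, flag_mood, *features, extension)


def unpack_muse(blob: bytes) -> Dict[str, object]:
    magic, vg, fm, *rest = struct.unpack(MUSE_FORMAT, blob)
    return {"magic": magic, "version": vg >> 6, "genre": vg & 0x3F,
            "flag": fm >> 6, "mood": fm & 0x3F,
            "features": list(rest[:21]), "extension": rest[21]}
```

Feature values round-trip only to float32 precision, which matches the four bytes per value allotted in the format.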

The method according to the present invention described above can be embodied as computer-readable code on a computer-readable recording medium.

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art will understand that the invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered descriptive rather than limiting. The scope of the invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents should be construed as being included in the invention.

With the configuration of the present invention described above, the mood classification of music files is performed automatically, so users can conveniently select music that matches their mood.

In particular, because only part of the music is analyzed, feature extraction is on average at least 24 times faster than with methods that analyze the whole song. In addition, performance was improved by removing redundant spectral features that do not contribute to performance and by using the Bark-frequency method, which captures timbre information and is simpler to compute, instead of the Mel-frequency method. Classification accuracy was further improved by using the differences between BFCC coefficients.

Claims

1. A music mood classification method comprising: a preprocessing step of decoding and normalizing at least part of an input music file;

extracting one or more feature values from the preprocessed data; and

determining the mood of the input music file using the one or more feature values.

The method of claim 1, wherein

the preprocessing step comprises

preprocessing a 10-second portion of the input music file starting from a predetermined position.

The method of claim 2, wherein

the preprocessing step comprises

preprocessing a 10-second portion of the input music file starting at the point 30 seconds after the beginning of the file.
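Claims 2 and 3 fix the analyzed excerpt at 10 seconds starting 30 seconds into the track, and claim 1 says the excerpt is decoded and normalized. A minimal sketch of that step, assuming decoding to PCM has already been done by some audio decoder, that stereo is mixed down to mono, and that "normalizing" means peak normalization (the source does not define it). The fallback for tracks shorter than 40 seconds is a hypothetical choice, not part of the claims.

```python
import numpy as np

def preprocess_excerpt(samples, sample_rate, start_s=30.0, duration_s=10.0):
    """Cut a fixed excerpt from decoded PCM audio and peak-normalize it.

    Assumptions (not from the source): input is already decoded PCM, shape
    (n_samples,) or (n_samples, n_channels); channels are averaged to mono;
    "normalize" means scaling so the largest absolute sample is 1.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 2:
        x = x.mean(axis=1)  # mix down to mono
    start = int(round(start_s * sample_rate))
    length = int(round(duration_s * sample_rate))
    if start + length > len(x):
        # Hypothetical fallback for short tracks: take the last `length` samples
        # (or the whole track if it is shorter than the excerpt).
        start = max(0, len(x) - length)
    excerpt = x[start:start + length]
    peak = np.max(np.abs(excerpt)) if excerpt.size else 0.0
    return excerpt / peak if peak > 0 else excerpt
```

Only 10 seconds of audio reach the feature extractor regardless of song length, which is where the speed advantage described earlier comes from.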

The method of claim 1, wherein

the extracting of the one or more feature values comprises

extracting, from the preprocessed data, at least one of a spectral centroid, a spectral rolloff, a spectral flux, Bark-scale Frequency Cepstral Coefficients (BFCCs), and differences between the BFCCs, and determining the extracted values as the feature values.
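Three of the per-window features listed in claim 4 have standard definitions in the music-information-retrieval literature. The sketch below assumes 512-sample non-overlapping analysis windows, a periodic Hann window, an 85% rolloff threshold, and flux as the squared difference between successive normalized magnitude spectra; none of these parameters is given in the source.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=512):
    """Split a 1-D signal into analysis windows of frame_len samples, hop samples apart."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape (n_frames, frame_len)

def spectral_features(x, sample_rate, frame_len=512, hop=512, rolloff_pct=0.85):
    """Per-window spectral centroid (Hz), rolloff (Hz) and flux."""
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann
    mag = np.abs(np.fft.rfft(frame_signal(x, frame_len, hop) * window, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    eps = 1e-12  # guards against division by zero on silent windows

    # Centroid: magnitude-weighted mean frequency.
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps)

    # Rolloff: frequency at or below which rolloff_pct of the magnitude lies.
    cumsum = np.cumsum(mag, axis=1)
    first_bin = np.argmax(cumsum >= rolloff_pct * cumsum[:, -1:], axis=1)
    rolloff = freqs[first_bin]

    # Flux: squared change of the normalized spectrum between successive windows
    # (the first window has no predecessor, so its flux is set to 0).
    norm = mag / (mag.sum(axis=1, keepdims=True) + eps)
    flux = np.concatenate([[0.0], np.sum(np.diff(norm, axis=0) ** 2, axis=1)])
    return centroid, rolloff, flux
```

For a 1 kHz sine sampled at 8 kHz, the centroid comes out at 1000 Hz and the flux is essentially zero, since every window sees the same spectrum.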

The method of claim 4, wherein

the determining of the feature values comprises:

dividing the preprocessed data into a plurality of analysis windows;

moving a texture window containing a predetermined number of the analysis windows one analysis window at a time, and obtaining mean and variance values of the spectral centroid, the spectral rolloff, the spectral flux, and the BFCCs for each texture window; and

calculating, over the entire preprocessed data, the average of the mean values and the average of the variance values obtained for the texture windows, and determining these averages as the feature values.
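Claim 5 summarizes the per-window features in two stages: mean and variance inside a texture window that slides by one analysis window, then an average of those statistics over the whole excerpt. A sketch of that aggregation follows; the texture-window length `win=40` is a placeholder, since the source only says "a predetermined number".

```python
import numpy as np

def texture_window_features(frame_feats, win=40):
    """Aggregate per-analysis-window features as described in claim 5.

    frame_feats: array of shape (n_frames,) or (n_frames, n_dims).
    win: analysis windows per texture window (placeholder value, not from the source).
    Returns [average of texture-window means..., average of texture-window variances...].
    """
    f = np.asarray(frame_feats, dtype=float)
    if f.ndim == 1:
        f = f[:, None]
    n = f.shape[0]
    if n < win:
        raise ValueError("need at least `win` analysis windows")
    positions = range(n - win + 1)  # slide one analysis window at a time
    means = np.stack([f[i:i + win].mean(axis=0) for i in positions])
    variances = np.stack([f[i:i + win].var(axis=0) for i in positions])
    return np.concatenate([means.mean(axis=0), variances.mean(axis=0)])
```

For example, with per-window values 0, 1, ..., 9 and `win=4`, every texture window has variance 1.25 and the window means average to 4.5, so the function returns `[4.5, 1.25]`.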

The method of claim 1, wherein

the determining of the mood of the music file comprises

determining the mood of the music file using a Support Vector Machine (SVM) classifier.
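Claim 6 names an SVM classifier but not its kernel, multi-class strategy, training method, or mood labels. Purely as an illustration, the sketch below swaps in a linear one-vs-rest SVM trained by full-batch subgradient descent on the L2-regularized hinge loss, using only NumPy; a real system would more likely use an established SVM library. The mood names in the usage example are hypothetical.

```python
import numpy as np


class LinearOvRSVM:
    """One-vs-rest linear SVM trained by full-batch subgradient descent on the
    L2-regularized hinge loss. A simplified stand-in, not the patent's classifier."""

    def __init__(self, lam=1e-3, lr=0.1, epochs=500):
        self.lam, self.lr, self.epochs = lam, lr, epochs

    def _standardize(self, X):
        return (np.asarray(X, dtype=float) - self.mu_) / self.sd_

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.mu_, self.sd_ = X.mean(axis=0), X.std(axis=0) + 1e-12
        Z = self._standardize(X)
        self.classes_ = np.unique(y)
        n, d = Z.shape
        self.W_ = np.zeros((len(self.classes_), d))
        self.b_ = np.zeros(len(self.classes_))
        for k, mood in enumerate(self.classes_):
            t = np.where(y == mood, 1.0, -1.0)  # +1 for this mood, -1 for all others
            w, b = np.zeros(d), 0.0
            for _ in range(self.epochs):
                active = t * (Z @ w + b) < 1.0  # samples with nonzero hinge loss
                grad_w = self.lam * w - (t[active][:, None] * Z[active]).sum(axis=0) / n
                grad_b = -t[active].sum() / n
                w = w - self.lr * grad_w
                b = b - self.lr * grad_b
            self.W_[k], self.b_[k] = w, b
        return self

    def decision_function(self, X):
        return self._standardize(X) @ self.W_.T + self.b_

    def predict(self, X):
        # The mood whose one-vs-rest score is highest wins.
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```

In use, each row of `X` would be the aggregated feature vector of one music file, and `predict` returns one mood label per file.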

A music mood classification system, comprising: a preprocessor that preprocesses at least a portion of an input music file;

a feature extractor that extracts one or more feature values from the preprocessed data;

a mood determination unit that determines a mood of the input music file using the extracted one or more feature values; and

a storage unit that stores the extracted one or more feature values and the determined mood.
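Claim 7 describes four cooperating units, mirroring the preprocessing unit (210), feature extraction unit (220), storage unit (230), and mood determination unit (240) named in the abstract. A minimal wiring sketch under the assumption that each processing unit is a plain callable and that storage is an in-memory dictionary keyed by a hypothetical track identifier; the source does not specify the storage format beyond holding the features and the mood.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

import numpy as np


@dataclass
class MoodRecord:
    """What the storage unit keeps per track: feature vector plus decided mood."""
    features: np.ndarray
    mood: str


@dataclass
class MoodClassificationSystem:
    preprocess: Callable[[np.ndarray, int], np.ndarray]  # preprocessing unit (210)
    extract: Callable[[np.ndarray, int], np.ndarray]     # feature extraction unit (220)
    decide: Callable[[np.ndarray], str]                  # mood determination unit (240)
    store: Dict[str, MoodRecord] = field(default_factory=dict)  # storage unit (230)

    def classify(self, track_id: str, samples, sample_rate: int) -> str:
        x = self.preprocess(np.asarray(samples, dtype=float), sample_rate)
        features = self.extract(x, sample_rate)
        mood = self.decide(features)
        # Keep both the features and the decision, as claim 7 requires.
        self.store[track_id] = MoodRecord(features=features, mood=mood)
        return mood
```

Injecting the units as callables keeps the pipeline independent of any particular excerpt length, feature set, or classifier.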

The system of claim 7, wherein

the portion of the music file

is a 10-second portion starting from a predetermined position in the music file.

The system of claim 8, wherein

the predetermined position

is the point 30 seconds after the beginning of the music file.

The system of claim 7, wherein

the feature extractor

extracts, from the preprocessed data, at least one of a spectral centroid, a spectral rolloff, a spectral flux, Bark-scale Frequency Cepstral Coefficients (BFCCs), and differences between the BFCCs, and determines the extracted values as the feature values.
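The BFCCs and their differences listed in claim 10 can be sketched in the same way as MFCCs, with Bark-spaced rather than Mel-spaced triangular filters. Everything beyond the claim wording is an assumption here: the Traunmüller Bark formula, 24 filters, 13 coefficients, an unnormalized DCT-II, and reading "differences between the BFCCs" as frame-to-frame differences (the source does not say whether the differences are taken across frames or across coefficient indices).

```python
import numpy as np

def hz_to_bark(f):
    # Traunmueller approximation (assumed; the source names no Bark formula).
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.28 - z)

def bark_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the Bark axis."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    z = np.linspace(hz_to_bark(0.0), hz_to_bark(sample_rate / 2.0), n_filters + 2)
    edges = bark_to_hz(z)
    fb = np.zeros((n_filters, freqs.size))
    for m in range(n_filters):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rise = (freqs - lo) / (center - lo)
        fall = (hi - freqs) / (hi - center)
        fb[m] = np.clip(np.minimum(rise, fall), 0.0, None)
    return fb  # shape (n_filters, n_fft // 2 + 1)

def bfcc(frames, sample_rate, n_filters=24, n_coeffs=13):
    """BFCCs per analysis window: power spectrum -> Bark filterbank -> log -> DCT-II."""
    n_fft = frames.shape[1]
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    log_energy = np.log(power @ bark_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))  # unnormalized DCT-II basis
    return log_energy @ dct.T  # shape (n_frames, n_coeffs)

def bfcc_deltas(coeffs):
    """Frame-to-frame BFCC differences; the first frame is padded with zeros."""
    return np.vstack([np.zeros((1, coeffs.shape[1])), np.diff(coeffs, axis=0)])
```

The per-frame BFCCs and deltas would then feed the same texture-window aggregation as the other spectral features.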

The system of claim 10, wherein

the feature extractor

divides the preprocessed data into a plurality of analysis windows; moves a texture window containing a predetermined number of the analysis windows one analysis window at a time, obtaining mean and variance values of the spectral centroid, the spectral rolloff, the spectral flux, and the BFCCs for each texture window; and calculates, over the entire preprocessed data, the average of the mean values and the average of the variance values obtained for the texture windows, determining these averages as the feature values.

The system of claim 7, wherein

the mood determination unit

determines the mood of the music file using a Support Vector Machine (SVM) classifier.