CN111159463A - Music emotion recognition method and system - Google Patents

Music emotion recognition method and system

Info

Publication number
CN111159463A
Authority
CN
China
Prior art keywords
emotion
song
analyzed
music
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911174633.3A
Other languages
Chinese (zh)
Inventor
杨辞源
孟泽
任续超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heihezi Technology Beijing Co ltd
Original Assignee
Heihezi Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heihezi Technology Beijing Co ltd filed Critical Heihezi Technology Beijing Co ltd
Priority to CN201911174633.3A priority Critical patent/CN111159463A/en
Publication of CN111159463A publication Critical patent/CN111159463A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/65: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/685: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to a music emotion recognition method and system. The method comprises the following steps: constructing a music emotion classification model based on the Thayer theoretical model; acquiring a lyric sample set, constructing an emotion dictionary from the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used to identify a first position of the song to be analyzed on the abscissa of the music emotion classification model; detecting a BPM value of the song to be analyzed, correcting the detected BPM value, and identifying a second position of the song to be analyzed on the ordinate of the music emotion classification model through the corrected BPM value; and determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position. The technical scheme provided by the application can improve the accuracy of music emotion recognition.

Description

Music emotion recognition method and system
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a music emotion recognition method and system.
Background
Classifying the emotion of music with a computer, or making a computer understand the emotion of music, is a difficult problem, but a highly desirable technology. If it were realized, it could change the form of many internet products while saving a great deal of labor and material cost.
Existing music classification methods usually train a neural network on existing data to obtain a classification model for classifying music. Such trained models achieve only modest results on music style classification and perform poorly on emotion classification. Because these methods depend entirely on data quality, and current music tag data is often of low quality, emotion classification accuracy is not high.
Disclosure of Invention
The aim of the present application is to provide a music emotion recognition method and system that can improve the accuracy of emotion recognition.
In order to achieve the above object, the present application provides a music emotion recognition method, including:
constructing a music emotion classification model based on a Thayer theoretical model, wherein the music emotion classification model comprises a horizontal coordinate and a vertical coordinate; wherein the abscissa is used to characterize the positive/negative degree of the music and the ordinate is used to characterize the energy of the music;
acquiring a lyric sample set, constructing an emotion dictionary according to the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used for identifying a first position of the song to be analyzed on an abscissa of the music classification model;
detecting a BPM value of the song to be analyzed, correcting the detected BPM value, and identifying a second position of the song to be analyzed on a vertical coordinate of the music classification model through the corrected BPM value;
and determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position, and taking the target emotion as the music emotion identified by the song to be analyzed.
Further, constructing an emotion dictionary from the lyric sample set comprises:
identifying high-frequency words of each lyric in the lyric sample set, and performing emotion marking on the identified high-frequency words to set respective emotion numerical values for the high-frequency words;
and constructing an emotion dictionary based on the high-frequency vocabulary with the emotion numerical values.
Further, calculating the emotion score of the song to be analyzed includes:
identifying emotion units in the lyrics of the song to be analyzed; the emotion unit comprises an emotion vocabulary and a modified vocabulary of the emotion vocabulary;
calculating the emotion value of each emotion unit according to the emotion dictionary, and determining the emotion score of each sentence in the lyric according to the calculated emotion value;
counting the total score of the sentences representing the positive emotion, and determining a positive parameter according to the total score of the sentences representing the positive emotion and the total number of the sentences representing the positive emotion; counting the total score of the sentences representing the negative emotion, and determining a negative parameter according to the total score of the sentences representing the negative emotion and the total number of the sentences representing the negative emotion;
and taking the sum of the positive parameters and the negative parameters as the emotion score of the song to be analyzed.
Further, determining an emotion score for each sentence in the lyrics comprises:
and taking the sum of the emotion values of all emotion units in the current sentence as the emotion score of the current sentence.
Further, the correcting the detected BPM value includes:
acquiring a song sample set, detecting the BPM value of each song in the song sample set, and determining the conventional interval of BPM according to the counted BPM value;
judging whether the BPM value of the song to be analyzed is within the conventional interval; if so, not correcting the BPM value of the song to be analyzed; and if not, doubling or halving the BPM value of the song to be analyzed.
Further, the correcting the detected BPM value includes:
counting the average speed at which words appear in the lyrics of the song to be analyzed, and judging whether the average speed matches the detected BPM value; if so, not correcting the BPM value of the song to be analyzed; and if not, doubling or halving the BPM value of the song to be analyzed.
Further, when the average speed of each word appearing in the lyrics of the song to be analyzed is counted, the method further comprises the following steps:
and identifying the start and end time of each lyric line, taking the sum of the identified line durations as the total duration of the lyrics, and using the standard deviation and quartiles when counting the average speed.
To achieve the above object, the present application further provides a music emotion recognition system, including:
the model building unit is used for building a music emotion classification model based on a Thayer theoretical model, and the music emotion classification model comprises horizontal coordinates and vertical coordinates; wherein the abscissa is used to characterize the positive/negative degree of the music and the ordinate is used to characterize the energy of the music;
the abscissa analysis unit is used for acquiring a lyric sample set, constructing an emotion dictionary according to the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used for identifying a first position of the song to be analyzed on the abscissa of the music classification model;
the longitudinal coordinate analysis unit is used for detecting the BPM value of the song to be analyzed, correcting the detected BPM value and identifying a second position of the song to be analyzed on the longitudinal coordinate of the music classification model through the corrected BPM value;
and the emotion determining unit is used for determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position, and taking the target emotion as the music emotion identified by the song to be analyzed.
Further, the abscissa analyzing unit includes:
the word recognition module is used for recognizing high-frequency words of each lyric in the lyric sample set and carrying out emotion marking on the recognized high-frequency words so as to set respective emotion numerical values for the high-frequency words;
and the construction module is used for constructing an emotion dictionary based on the high-frequency vocabulary with the emotion numerical values.
Further, the abscissa analyzing unit further includes:
the emotion unit identification module is used for identifying emotion units in the lyrics of the song to be analyzed; the emotion unit comprises an emotion vocabulary and a modified vocabulary of the emotion vocabulary;
the emotion score calculation module is used for calculating the emotion value of each emotion unit according to the emotion dictionary and determining the emotion score of each sentence in the lyric according to the calculated emotion value;
the parameter determining module is used for counting the total score of the sentences representing the positive emotion and determining positive parameters according to the total score of the sentences representing the positive emotion and the total number of the sentences representing the positive emotion; counting the total score of the sentences representing the negative emotion, and determining a negative parameter according to the total score of the sentences representing the negative emotion and the total number of the sentences representing the negative emotion;
and the emotion score determining module is used for taking the sum of the positive parameters and the negative parameters as the emotion score of the song to be analyzed.
Therefore, according to the technical scheme provided by the invention, the Thayer theoretical model from psychology is adapted to construct a model suitable for music emotion classification. The abscissa of the model characterizes the positive/negative degree of the music, and the ordinate characterizes the energy of the music. In practical applications, the values of the abscissa and the ordinate can be determined for a large number of music samples. When the emotion of a song to be analyzed needs to be classified, a first position of the song on the abscissa of the model can be determined by calculating its emotion score, and a second position on the ordinate can be determined from the song's BPM value. The emotion type at the intersection of the first position and the second position can then be taken as the music emotion of the song to be analyzed. In this way, the music emotion of a song can be identified quickly and accurately.
Drawings
FIG. 1 is a schematic diagram of a music emotion recognition method;
FIG. 2 is a schematic diagram of ten emotion types;
FIG. 3 is a schematic diagram of six simplified emotion types.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application shall fall within the scope of protection of the present application.
The present application provides a music emotion recognition method, please refer to fig. 1, the method includes:
s1: constructing a music emotion classification model based on a Thayer theoretical model, wherein the music emotion classification model comprises a horizontal coordinate and a vertical coordinate; wherein the abscissa is used to characterize the positive/negative degree of the music and the ordinate is used to characterize the energy of the music;
s2: acquiring a lyric sample set, constructing an emotion dictionary according to the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used for identifying a first position of the song to be analyzed on an abscissa of the music classification model;
s3: detecting a BPM value of the song to be analyzed, correcting the detected BPM value, and identifying a second position of the song to be analyzed on a vertical coordinate of the music classification model through the corrected BPM value;
s4: and determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position, and taking the target emotion as the music emotion identified by the song to be analyzed.
In one embodiment, constructing an emotion dictionary from the set of lyric samples comprises:
identifying high-frequency words of each lyric in the lyric sample set, and performing emotion marking on the identified high-frequency words to set respective emotion numerical values for the high-frequency words;
and constructing an emotion dictionary based on the high-frequency vocabulary with the emotion numerical values.
In one embodiment, calculating the sentiment score for the song to be analyzed comprises:
identifying emotion units in the lyrics of the song to be analyzed; the emotion unit comprises an emotion vocabulary and a modified vocabulary of the emotion vocabulary;
calculating the emotion value of each emotion unit according to the emotion dictionary, and determining the emotion score of each sentence in the lyric according to the calculated emotion value;
counting the total score of the sentences representing the positive emotion, and determining a positive parameter according to the total score of the sentences representing the positive emotion and the total number of the sentences representing the positive emotion; counting the total score of the sentences representing the negative emotion, and determining a negative parameter according to the total score of the sentences representing the negative emotion and the total number of the sentences representing the negative emotion;
and taking the sum of the positive parameters and the negative parameters as the emotion score of the song to be analyzed.
In one embodiment, determining the sentiment score for each sentence in the lyrics comprises:
and taking the sum of the emotion values of all emotion units in the current sentence as the emotion score of the current sentence.
In one embodiment, correcting the detected BPM value includes:
acquiring a song sample set, detecting the BPM value of each song in the song sample set, and determining the conventional interval of BPM according to the counted BPM value;
judging whether the BPM value of the song to be analyzed is within the conventional interval; if so, not correcting the BPM value of the song to be analyzed; and if not, doubling or halving the BPM value of the song to be analyzed.
In one embodiment, correcting the detected BPM value includes:
counting the average speed at which words appear in the lyrics of the song to be analyzed, and judging whether the average speed matches the detected BPM value; if so, not correcting the BPM value of the song to be analyzed; and if not, doubling or halving the BPM value of the song to be analyzed.
In one embodiment, when counting the average speed of occurrence of each word in the lyrics of the song to be analyzed, the method further comprises:
and identifying the start and end time of each lyric line, taking the sum of the identified line durations as the total duration of the lyrics, and using the standard deviation and quartiles when counting the average speed.
Specifically, in one application scenario, a new multi-modal music emotion analysis model, hereinafter referred to as the CYME model, is proposed based on the characteristics of music reflected in the time domain (such as BPM (beats per minute, i.e., tempo), tonality, and audio energy) and the characteristics reflected in the lyrics, combined with the Thayer theory from psychology. The construction rules of this model are presented below.
1. Thayer theory and music emotion
The Thayer two-dimensional emotion model is used to divide emotions. Its ordinate represents the energy dimension, where the change from low to high represents the subject's emotional arousal; its abscissa represents the degree of positivity/negativity, with the change from "negative" to "positive" mapped onto the abscissa of the model. According to this theory, some basic human emotions, such as Annoying and Exciting, can be mapped around this coordinate system. This psychological classification method is largely followed in the emotion dimension.
2. Mapping and classification optimization of Thayer coordinate system
In practice, however, it was found that the emotions in this coordinate system are not all equally applicable to music. Emotions such as Annoying, Angry, Nervous, Bored, Sleepy, and Calm rarely appear in music: music expressing emotions such as Annoying or Angry is seldom listened to or sung, and only a few genres such as Metal, Black Metal, and Hard Rock carry these emotions, and such songs are relatively rare and niche. After identifying this problem, the classification of the Thayer model was optimized and upgraded. Through extensive music analysis and music tag clustering, ten categories were defined: "inspirational", "positive and hopeful", "joyful", "passionate/high-energy", "lyrical and touching", "relaxed and calm", "sad", "down", "extreme", and "other", as generally shown in FIG. 2.
3. Dimension reduction and separation of the emotion classification, and coordinate mapping
The abscissa in the Thayer model represents the positive/negative degree of the music. When a computer algorithm analyzes this from the audio alone, the positive/negative degree of a song is difficult to obtain, because common single-track audio formats such as mp3 and wav present music only after its information has been highly compressed. A way was therefore devised to analyze the positive/negative degree of a song from the lyric text, in combination with the major/minor mode of the music.
From the perspective of music theory, a song written in a major key generally sounds positive and uplifting, while a minor key sounds negative and melancholy, so detecting whether the music is in a major or minor key can serve as a basis for judging its positive/negative degree. On the other hand, lyrics and music are inseparable: the lyrics of a melancholy song are bound to be melancholy and restrained as well. Based on this characteristic, analyzing the positive/negative degree of the lyrics from a linguistic perspective can, to a great extent, reflect the position of the corresponding song on the horizontal Valence axis.
A mapping is likewise found for the vertical axis. In general, the BPM of music is strongly related to the vertical axis: music with a higher BPM is more "high" in its emotional expression, while music with a lower BPM is more "low".
In general, the relatively complex problem of music emotion is decomposed into two one-dimensional vectors according to the psychological model and related music theory knowledge, converting the original high-dimensional information so that music emotion classification can be achieved by an unsupervised method.
4. Pruning the classification using the abscissa and ordinate attributes of each category
After the horizontal and vertical axes from the previous step are mapped to concrete tasks, the model can be checked by simple simulation while the emotion classification standard is refined. Manual BPM analysis and emotion analysis of a number of songs yielded two main findings: 1. the association between BPM and musical emotion is relatively robust; 2. the ten classes assumed earlier essentially meet the classification requirements. At this point the various validations are essentially mature, and the ten categories can be consolidated into six: "inspirational", "joyful", "passionate", "lyrical and touching", "relaxed and calm", and "sad", which constitute the final CYME model, as shown in FIG. 3. The CYME model is now fully constructed, and emotion classification is subsequently performed with this model.
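Conceptually, once a song has been assigned a valence score (horizontal axis) and an energy value derived from its corrected BPM (vertical axis), reading off its CYME category amounts to locating that point in the two-dimensional plane. The sketch below illustrates the idea with purely hypothetical thresholds; in the actual scheme the class boundaries are obtained by fitting a support vector machine in stage four.

```python
# Illustrative sketch of the CYME two-axis lookup. The thresholds below are
# hypothetical placeholders, not the patented boundaries (those are fitted
# with an SVM in stage four).
def classify_cyme(valence: float, bpm: float) -> str:
    """Map a (valence, corrected BPM) point to one of the six CYME categories."""
    high_energy = bpm >= 100          # assumed split on the vertical (energy) axis
    if valence > 2:                   # clearly positive lyrics
        return "passionate" if high_energy else "joyful"
    if valence < -2:                  # clearly negative lyrics
        return "lyrical and touching" if high_energy else "sad"
    return "inspirational" if high_energy else "relaxed and calm"
```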
Stage two: emotion processing of the horizontal axis (lyric text)
On the horizontal axis, the positive/negative emotion score is obtained mainly through lyric text analysis. To achieve this, the lyrics must be readable by a computer. In natural language processing there are currently many processing tools and data sets for conventional written prose and dialogue. However, lyrics present information very differently from written articles and daily conversation: compared with the latter two, lyrics usually lack context, are less rigorously logical, use short and concise sentences, jump between scenes, and are on the whole more colloquial than conventional text. Existing natural language processing methods therefore cannot be applied to lyrics directly; a dedicated data set must be constructed and a processing method closer to lyrics adopted.
1. Construction of the data set:
A number of songs are selected from a lyric library, Chinese word segmentation is performed with a segmentation tool to split each sentence into individual words, and the words are filtered with a stop-word list. It turns out that in popular songs, words such as "love", "say", and "want" have the highest word frequencies, and the other high-frequency words also show many similarities. The top-ranked 3070 words, each appearing more than 100 times, are then selected for emotion annotation: each word is labeled with a positive or negative emotion value from -5 (most negative) to 5 (most positive), forming a new emotion dictionary. These annotated high-frequency words are then merged with a conventional natural-language emotion dictionary, with the duplicated high-frequency entries replaced by the new annotations. The emotion dictionary for popular-song lyrics constructed in this way has high generality.
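A minimal sketch of the frequency-counting step follows. It assumes the jieba segmenter and a plain-text stop-word file named stopwords.txt; the original only specifies "a word segmentation tool" and "a stop word list", so these are illustrative choices, and the manual -5 to +5 annotation is indicated only as a placeholder.

```python
# Sketch of the dictionary-construction step. jieba and the stop-word file name
# are assumptions; function names are illustrative.
from collections import Counter
import jieba

def build_frequency_list(lyrics_corpus, stopword_path="stopwords.txt", min_count=100):
    """Count word frequencies over a lyric corpus and keep the high-frequency words."""
    with open(stopword_path, encoding="utf-8") as f:
        stopwords = {line.strip() for line in f if line.strip()}
    counts = Counter()
    for lyric in lyrics_corpus:                      # one string per song
        for line in lyric.splitlines():
            counts.update(w for w in jieba.cut(line)
                          if w.strip() and w not in stopwords)
    return [w for w, c in counts.most_common() if c >= min_count]

# The retained words (about 3070 in the application) are then manually labelled
# from -5 (most negative) to +5 (most positive), e.g.:
# emotion_dict = {"爱": 4, "放弃": -3, "孤单": -4}
```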
2. Emotion unit:
Emotion units are the most basic elements composing the emotion of a song; each emotion unit consists of an emotion word and its modifiers. The emotion words come from the emotion dictionary, and the modifiers fall into two types: negation words and conjunctions.
A negation word negates the emotion expressed by an emotion word and reverses its polarity, turning positive emotion into negative and negative into positive. Common negation words include "not", "don't", and "no". For example, "give up" is itself a negative emotion word, while the emotion unit "not give up", modified by the negation word "not", is a positive emotion unit.
Conjunctions are an important means of connecting the components of a sentence, and they also appear frequently in lyrics. Only conjunctions connected to an emotion word are considered; if an emotion word appears in the part before or after the conjunction but not in the other part, only the part containing the emotion word is considered. The influence of a conjunction on the emotion word depends on its kind, and two classes are mainly considered here: conjunctions expressing an adversative (turning) relationship and conjunctions expressing a progressive relationship. Adversative conjunctions, such as "although ... but", reverse the emotion of the first half and reinforce the emotion of the second half; progressive conjunctions weaken the emotion of the first half and strengthen the emotion of the second half.
Emotion units are extracted using a syntax-based segmentation approach. The open-source dependency-parsing tool of StanfordCoreNLP is used to obtain the dependency relations between the entities in a sentence, and these relations are used to match modifiers with emotion words. For example, in a lyric line containing the modifier "cannot" and the two emotion words "enthusiasm" and "cool", the syntactic analysis tool shows that "cannot" has a dependency relation with "cool" but not with "enthusiasm", so two emotion units are finally obtained: "enthusiasm" and "cannot cool". With this segmentation method, each emotion unit is delimited from a semantic perspective as far as possible, which improves the quality of the final result.
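The following simplified sketch pairs negation words with nearby emotion words over a flat token list. It deliberately replaces the dependency-parse matching described above (StanfordCoreNLP) with a window-based heuristic so as to stay self-contained; the negation list, the window size, and the function names are assumptions.

```python
# Simplified stand-in for the dependency-based emotion-unit extraction.
# A nearest-neighbour pairing over the token list illustrates the
# unit/score bookkeeping; it is not the dependency-parse method itself.
import jieba

NEGATIONS = {"不", "没", "别", "无"}          # assumed negation list

def emotion_units(sentence, emotion_dict, window=2):
    """Return (unit_text, score) pairs for one lyric line."""
    tokens = list(jieba.cut(sentence))
    units, pending_negation = [], None
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            pending_negation = i
        elif tok in emotion_dict:
            score = emotion_dict[tok]
            if pending_negation is not None and i - pending_negation <= window:
                # negation reverses the polarity of the emotion word
                units.append(("".join(tokens[pending_negation:i + 1]), -score))
                pending_negation = None
            else:
                units.append((tok, score))
    return units
```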
3. Song emotion score:
First, the emotion units in each lyric line are identified; then the scores of all emotion units in the line are summed to obtain the emotion score of that line.
The emotion score of an entire song cannot be obtained by simply summing the scores of its sentences, because a song has an emotional keynote: as a whole it is either positive or negative, whereas the scores of individual sentences are not stable. If the positive sentence scores of a song clearly outweigh the negative ones, the song can be regarded as conveying positive emotion overall, and the influence of the negative scores on the overall score should be reduced.
The final score calculation formula for the entire song is as follows:
S = S_P · N_P + S_N · N_N
where S is the total emotion score of the song, S_P is the total score of the positive-emotion sentences, N_P is the number of positive-emotion sentences, S_N is the total score of the negative-emotion sentences, and N_N is the number of negative-emotion sentences.
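This weighting lets the dominant polarity, which contributes both a larger total and more sentences, dominate the final score. A direct transcription of the formula, taking the per-line scores produced above as input (function and variable names are illustrative):

```python
# Song-level score S = S_P*N_P + S_N*N_N computed from per-line emotion scores.
def song_emotion_score(line_scores):
    positive = [s for s in line_scores if s > 0]
    negative = [s for s in line_scores if s < 0]
    s_p, n_p = sum(positive), len(positive)
    s_n, n_n = sum(negative), len(negative)
    return s_p * n_p + s_n * n_n   # positive parameter + negative parameter

# e.g. song_emotion_score([3, 5, -1, 2]) -> 10*3 + (-1)*1 = 29  (clearly positive)
```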
Stage three: audio energy analysis of the vertical axis (songs)
The audio energy analysis of a song mainly consists of extracting its tempo. The tempo is extracted with the tempo function of an audio analysis library, which outputs the speed of the song, i.e., its BPM value. However, a problem appears in practice: the doubling/halving error of BPM. This is a ubiquitous problem in BPM detection, for which other algorithms and software also have no good solution.
Specifically, a BPM detection algorithm picks, from a set of candidate values that could be the song's BPM, the one most likely to be the real BPM as its result, which can introduce errors. Given how computer algorithms identify musical beats, however, the error is limited to being exactly twice or half the real speed; larger errors generally do not occur. For example, when the detected BPM of a song is 100, its real BPM may be 100, 200, or 50, and the computer cannot determine the true value with complete accuracy.
The following solution is adopted for this doubling error problem.
1. Song BPM was studied and analyzed from a musical perspective to determine the range covering most songs, which was found to be 40-140. That is, if the BPM calculated in the conventional way falls outside this range, it is very likely to be wrong. For example, if the calculated BPM of a song is 180, very few songs, especially popular songs, actually reach such a speed, so the speed of the song should be taken as 180/2 = 90. Most erroneous calculations can be filtered out in this way.
2. Whether the calculated BPM of a song is reasonable is checked against the speed of the lyrics. The lyric text contains the time of each word, so the average speed at which words appear in the lyrics can be counted; this speed reflects the tempo of the song to some extent: put simply, in a fast song the lyrics also go by faster. There are two key points: (1) Parts of the song with no singing (and hence no lyrics), such as the intro and interludes, must not distort the true average speed of the lyrics, so only the effective lyric portions are counted, i.e., the total duration is the sum of the start-to-end durations of the individual lines rather than the duration of the whole song. (2) To prevent rap passages (very fast) or sustained and whole notes (very slow) inserted in parts of the music from skewing the overall speed estimate, the standard deviation and quartiles are used in the computation, so that the calculated lyric speed matches the song speed as closely as possible.
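A sketch of the tempo detection and the two corrections is shown below. librosa is assumed to be the tempo-extraction library referred to above (the translated text only says "the library, tempo function"), the 40-140 interval follows the text, and the lyric-speed routine works on per-line (start, end, text) timestamps; the outlier filtering with standard deviation and quartiles is omitted for brevity.

```python
# Sketch of BPM detection plus octave-error correction. librosa and jieba are
# assumed dependencies; the 40-140 "conventional interval" comes from the text.
import jieba
import librosa

def detect_bpm(audio_path: str) -> float:
    y, sr = librosa.load(audio_path)
    return float(librosa.beat.tempo(y=y, sr=sr)[0])   # global tempo estimate

def correct_bpm(bpm: float, lo: float = 40, hi: float = 140) -> float:
    """Fold a doubled/halved estimate back into the conventional interval."""
    while bpm > hi:
        bpm /= 2
    while bpm < lo:
        bpm *= 2
    return bpm

def lyric_speed(lines) -> float:
    """Average words per minute over the sung portions only.

    lines: iterable of (start_sec, end_sec, text); intros and interludes carry
    no lyric lines and are therefore excluded automatically.
    """
    total_words = sum(len(list(jieba.cut(text))) for _, _, text in lines)
    total_minutes = sum(end - start for start, end, _ in lines) / 60.0
    return total_words / total_minutes if total_minutes else 0.0
```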
Stage four: model fitting
200 popular songs from the top of the charts were selected and manually annotated with emotions; their horizontal and vertical coordinates were then calculated as described above, producing the plot shown in FIG. 3.
FIG. 3 shows that the six different emotions form clusters in the coordinate plane. The model can subsequently be fitted with a support vector machine to obtain the classification functions of the different emotions. The results show that the fitted model is very similar to the originally proposed CYME model, validating the feasibility of this approach. At this point, any song can be classified by emotion using the model.
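A minimal sketch of this fitting step is given below, using scikit-learn's SVC over (valence score, corrected BPM) feature pairs. scikit-learn is an assumption: the text only specifies "a support vector machine", and the values in the usage comment are illustrative.

```python
# Fit class boundaries over the two CYME axes with a support vector machine.
# X holds (valence_score, corrected_bpm) pairs for the manually labelled songs,
# y holds their six-category emotion labels.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_cyme_classifier(X, y):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(np.asarray(X, dtype=float), y)
    return clf

# Usage (values and labels are illustrative):
# clf = fit_cyme_classifier([(29.0, 96), (-14.0, 72), (3.0, 128)],
#                           ["joyful", "sad", "inspirational"])
# clf.predict([[5.0, 120]])
```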
The present application further provides a music emotion recognition system, the system including:
the model building unit is used for building a music emotion classification model based on a Thayer theoretical model, and the music emotion classification model comprises horizontal coordinates and vertical coordinates; wherein the abscissa is used to characterize the positive/negative degree of the music and the ordinate is used to characterize the energy of the music;
the abscissa analysis unit is used for acquiring a lyric sample set, constructing an emotion dictionary according to the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used for identifying a first position of the song to be analyzed on the abscissa of the music classification model;
the longitudinal coordinate analysis unit is used for detecting the BPM value of the song to be analyzed, correcting the detected BPM value and identifying a second position of the song to be analyzed on the longitudinal coordinate of the music classification model through the corrected BPM value;
and the emotion determining unit is used for determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position, and taking the target emotion as the music emotion identified by the song to be analyzed.
In one embodiment, the abscissa analyzing unit includes:
the word recognition module is used for recognizing high-frequency words of each lyric in the lyric sample set and carrying out emotion marking on the recognized high-frequency words so as to set respective emotion numerical values for the high-frequency words;
and the construction module is used for constructing an emotion dictionary based on the high-frequency vocabulary with the emotion numerical values.
In one embodiment, the abscissa analyzing unit further includes:
the emotion unit identification module is used for identifying emotion units in the lyrics of the song to be analyzed; the emotion unit comprises an emotion vocabulary and a modified vocabulary of the emotion vocabulary;
the emotion score calculation module is used for calculating the emotion value of each emotion unit according to the emotion dictionary and determining the emotion score of each sentence in the lyric according to the calculated emotion value;
the parameter determining module is used for counting the total score of the sentences representing the positive emotion and determining positive parameters according to the total score of the sentences representing the positive emotion and the total number of the sentences representing the positive emotion; counting the total score of the sentences representing the negative emotion, and determining a negative parameter according to the total score of the sentences representing the negative emotion and the total number of the sentences representing the negative emotion;
and the emotion score determining module is used for taking the sum of the positive parameters and the negative parameters as the emotion score of the song to be analyzed.
Therefore, according to the technical scheme provided by the invention, the Thayer theoretical model from psychology is adapted to construct a model suitable for music emotion classification. The abscissa of the model characterizes the positive/negative degree of the music, and the ordinate characterizes the energy of the music. In practical applications, the values of the abscissa and the ordinate can be determined for a large number of music samples. When the emotion of a song to be analyzed needs to be classified, a first position of the song on the abscissa of the model can be determined by calculating its emotion score, and a second position on the ordinate can be determined from the song's BPM value. The emotion type at the intersection of the first position and the second position can then be taken as the music emotion of the song to be analyzed. In this way, the music emotion of a song can be identified quickly and accurately.
The foregoing description of various embodiments of the present application is provided for the purpose of illustration to those skilled in the art. It is not intended to be exhaustive or to limit the invention to a single disclosed embodiment. As described above, various alternatives and modifications of the present application will be apparent to those skilled in the art to which the above-described technology pertains. Thus, while some alternative embodiments have been discussed in detail, other embodiments will be apparent or relatively easy to derive by those of ordinary skill in the art. This application is intended to cover all alternatives, modifications, and variations of the invention that have been discussed herein, as well as other embodiments that fall within the spirit and scope of the above-described application.

Claims (10)

1. A music emotion recognition method, characterized in that the method comprises:
constructing a music emotion classification model based on a Thayer theoretical model, wherein the music emotion classification model comprises a horizontal coordinate and a vertical coordinate; wherein the abscissa is used to characterize the positive/negative degree of the music and the ordinate is used to characterize the energy of the music;
acquiring a lyric sample set, constructing an emotion dictionary according to the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used for identifying a first position of the song to be analyzed on an abscissa of the music classification model;
detecting a BPM value of the song to be analyzed, correcting the detected BPM value, and identifying a second position of the song to be analyzed on a vertical coordinate of the music classification model through the corrected BPM value;
and determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position, and taking the target emotion as the music emotion identified by the song to be analyzed.
2. The method of claim 1, wherein constructing an emotion dictionary from the set of lyric samples comprises:
identifying high-frequency words of each lyric in the lyric sample set, and performing emotion marking on the identified high-frequency words to set respective emotion numerical values for the high-frequency words;
and constructing an emotion dictionary based on the high-frequency vocabulary with the emotion numerical values.
3. The method of claim 2, wherein calculating the sentiment score for the song to be analyzed comprises:
identifying emotion units in the lyrics of the song to be analyzed; the emotion unit comprises an emotion vocabulary and a modified vocabulary of the emotion vocabulary;
calculating the emotion value of each emotion unit according to the emotion dictionary, and determining the emotion score of each sentence in the lyric according to the calculated emotion value;
counting the total score of the sentences representing the positive emotion, and determining a positive parameter according to the total score of the sentences representing the positive emotion and the total number of the sentences representing the positive emotion; counting the total score of the sentences representing the negative emotion, and determining a negative parameter according to the total score of the sentences representing the negative emotion and the total number of the sentences representing the negative emotion;
and taking the sum of the positive parameters and the negative parameters as the emotion score of the song to be analyzed.
4. The method of claim 3, wherein determining an emotion score for each sentence in the lyrics comprises:
and taking the sum of the emotion values of all emotion units in the current sentence as the emotion score of the current sentence.
5. The method of claim 1, wherein correcting the detected BPM value comprises:
acquiring a song sample set, detecting the BPM value of each song in the song sample set, and determining the conventional interval of BPM according to the counted BPM value;
judging whether the BPM value of the song to be analyzed is within the conventional interval; if so, not correcting the BPM value of the song to be analyzed; and if not, doubling or halving the BPM value of the song to be analyzed.
6. The method of claim 1, wherein correcting the detected BPM value comprises:
counting the average speed at which words appear in the lyrics of the song to be analyzed, and judging whether the average speed matches the detected BPM value; if so, not correcting the BPM value of the song to be analyzed; and if not, doubling or halving the BPM value of the song to be analyzed.
7. The method of claim 6, wherein in counting the average speed of occurrence of each word in the lyrics of the song to be analyzed, the method further comprises:
and identifying the start and end time of each lyric line, taking the sum of the identified line durations as the total duration of the lyrics, and using the standard deviation and quartiles when counting the average speed.
8. A music emotion recognition system, the system comprising:
the model building unit is used for building a music emotion classification model based on a Thayer theoretical model, and the music emotion classification model comprises horizontal coordinates and vertical coordinates; wherein the abscissa is used to characterize the positive/negative degree of the music and the ordinate is used to characterize the energy of the music;
the abscissa analysis unit is used for acquiring a lyric sample set, constructing an emotion dictionary according to the lyric sample set, and calculating an emotion score of a song to be analyzed according to the emotion dictionary, wherein the emotion score is used for identifying a first position of the song to be analyzed on the abscissa of the music classification model;
the longitudinal coordinate analysis unit is used for detecting the BPM value of the song to be analyzed, correcting the detected BPM value and identifying a second position of the song to be analyzed on the longitudinal coordinate of the music classification model through the corrected BPM value;
and the emotion determining unit is used for determining a target emotion corresponding to the song to be analyzed in the music emotion classification model according to the first position and the second position, and taking the target emotion as the music emotion identified by the song to be analyzed.
9. The system of claim 8, wherein the abscissa analysis unit comprises:
the word recognition module is used for recognizing high-frequency words of each lyric in the lyric sample set and carrying out emotion marking on the recognized high-frequency words so as to set respective emotion numerical values for the high-frequency words;
and the construction module is used for constructing an emotion dictionary based on the high-frequency vocabulary with the emotion numerical values.
10. The system of claim 9, wherein the abscissa analysis unit further comprises:
the emotion unit identification module is used for identifying emotion units in the lyrics of the song to be analyzed; the emotion unit comprises an emotion vocabulary and a modified vocabulary of the emotion vocabulary;
the emotion score calculation module is used for calculating the emotion value of each emotion unit according to the emotion dictionary and determining the emotion score of each sentence in the lyric according to the calculated emotion value;
the parameter determining module is used for counting the total score of the sentences representing the positive emotion and determining positive parameters according to the total score of the sentences representing the positive emotion and the total number of the sentences representing the positive emotion; counting the total score of the sentences representing the negative emotion, and determining a negative parameter according to the total score of the sentences representing the negative emotion and the total number of the sentences representing the negative emotion;
and the emotion score determining module is used for taking the sum of the positive parameters and the negative parameters as the emotion score of the song to be analyzed.
CN201911174633.3A 2019-11-26 2019-11-26 Music emotion recognition method and system Pending CN111159463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911174633.3A CN111159463A (en) 2019-11-26 2019-11-26 Music emotion recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911174633.3A CN111159463A (en) 2019-11-26 2019-11-26 Music emotion recognition method and system

Publications (1)

Publication Number Publication Date
CN111159463A true CN111159463A (en) 2020-05-15

Family

ID=70556135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911174633.3A Pending CN111159463A (en) 2019-11-26 2019-11-26 Music emotion recognition method and system

Country Status (1)

Country Link
CN (1) CN111159463A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614511A (en) * 2020-12-10 2021-04-06 央视国际网络无锡有限公司 Song emotion detection method
CN117095659A (en) * 2023-10-18 2023-11-21 中国传媒大学 Bimodal song emotion classification method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587708A (en) * 2009-06-26 2009-11-25 清华大学 Song emotion pressure analysis method and system
WO2010027509A1 (en) * 2008-09-05 2010-03-11 Sourcetone, Llc Music classification system and method
CN109256147A (en) * 2018-10-30 2019-01-22 腾讯音乐娱乐科技(深圳)有限公司 Audio cadence detection method, device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010027509A1 (en) * 2008-09-05 2010-03-11 Sourcetone, Llc Music classification system and method
CN101587708A (en) * 2009-06-26 2009-11-25 清华大学 Song emotion pressure analysis method and system
CN109256147A (en) * 2018-10-30 2019-01-22 腾讯音乐娱乐科技(深圳)有限公司 Audio cadence detection method, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
楼巧巧: "Research on a personalized music emotion classification *** based on reinforcement learning" *
王静: "Research on lyrics-based music emotion classification technology" *
邵曦 et al.: "Research on music emotion classification based on music content and lyrics" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614511A (en) * 2020-12-10 2021-04-06 央视国际网络无锡有限公司 Song emotion detection method
CN117095659A (en) * 2023-10-18 2023-11-21 中国传媒大学 Bimodal song emotion classification method
CN117095659B (en) * 2023-10-18 2024-01-05 中国传媒大学 Bimodal song emotion classification method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200515)