CN112883145B

CN112883145B - Emotion multi-tendency classification method for Chinese comments

Info

Publication number: CN112883145B
Application number: CN202011547122.4A
Authority: CN
Inventors: 张少中
Original assignee: Zhejiang Wanli University
Current assignee: Zhejiang Wanli University
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2022-10-11
Anticipated expiration: 2040-12-24
Also published as: CN112883145A

Abstract

The invention provides an emotion multi-tendency classification method for Chinese comments, comprising the following steps: first, extracting morpheme words and emotion words; second, constructing the similarity relation between morpheme emotion variables; finally, calculating the morpheme emotion tight paths. The morpheme emotion variables are regarded as nodes in a directed weighted acyclic graph, directed weighted relations are built between the morpheme emotion nodes to serve as directed weighted link edges, and effective paths satisfying given weight conditions are searched on the basis of these edges. The invention combines a directed weighted acyclic graph model with emotion tendency analysis and realizes emotion multi-tendency classification of comments in three steps: extracting the morpheme emotions of the comments, analyzing the similarity relations among the morpheme emotions, and calculating the morpheme emotion tight paths. It thereby distinguishes more accurately the various attitudes a user expresses toward an object and reflects the user's opinions on the attributes and characteristics of the object.

Description

Emotion multi-tendency classification method for Chinese comments

Technical Field

The invention relates to sentiment tendency classification, in particular to a sentiment multi-tendency classification method for Chinese comments.

Background

With the rapid spread and development of applications such as blogs, microblogs, and comment platforms, online comments have become an important way for users to express opinions and communicate. Comment information on a network typically expresses a user's opinion of something in the form of short text, such as a review of a news event or a comment on the performance of a product. This comment information is published by a large number of users, each stating opinions and claims about things from different sides and perspectives. As the comment information accumulates day by day, it forms a data set with complex structure, varied content, and diverse combinations of emotions.

The comments users make about things that interest them are an important reflection of their opinions on the attributes and characteristics of those things. Through comments, users express their attitudes toward events, product performance, quality of service, and so on. Existing research on comment emotion tendency classification mainly divides emotion tendencies into positive, negative, and neutral emotions, and some studies divide them into several grades, such as very favorable, neutral, unfavorable, and very unfavorable. Classifying emotion tendencies into a fixed set of types in this way makes it difficult to handle more complicated emotion classification situations.

Different users may have different experiences of events, things, services, and so on, since their knowledge of an event, understanding of a thing, or experience of a service can vary widely. Such diverse experiences lead users to express a wide variety of emotions and attitudes in their comments. In a single comment on a certain object (event, thing, service, etc.), a user sometimes expresses a single attitude, such as approval or disapproval, which is an overall evaluation of the object and expresses one emotion tendency. However, owing to the richness and complexity of human emotions, users often comment on and evaluate different aspects of an object separately. For example, evaluating a commodity involves details and aspects such as price, performance, and appearance, and the user may express a different attitude toward each. As a result, the emotion tendencies expressed in a single comment are not always of a single type. In many cases a user affirms some aspects of a thing and rejects others, rather than affirming or rejecting it as a whole. These different attitudes give a more comprehensive description of the object and express a multi-faceted emotional orientation. To distinguish more accurately the multiple attitudes a user expresses toward an object, the user's comments need to be classified into more fine-grained emotion tendencies.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide the sentiment multi-tendency classification method for Chinese comments, which can more accurately distinguish multiple attitudes expressed by users to objects.

The invention discloses an emotion multi-tendency classification method for Chinese comments, which comprises the following steps of:

S1, extracting morpheme emotion variables: extracting the morpheme words and emotion words related to the commented object from the comment text according to a Chinese morpheme lexicon and an emotion corpus lexicon, calculating the correlation coefficient between the morpheme words and the emotion words using the Pearson correlation coefficient method, and forming morpheme emotion variables from the correlation coefficient;

S2, constructing the similarity relation between morpheme emotion variables: calculating the approximate relation between two morpheme emotion variables using a conditional mutual information formula, thereby describing the relation between the morpheme emotion variables;

S3, calculating the morpheme emotion tight paths: regarding the morpheme emotion variables as nodes in a directed weighted acyclic graph, called morpheme emotion nodes or morpheme emotion node variables; constructing directed weighted relations between the morpheme emotion nodes to serve as directed weighted link edges; and, on the basis of these edges, designing an improved shortest path search algorithm based on the directed weighted acyclic graph model to search for effective paths that satisfy given weight conditions, each such path being one emotion tendency classification.

Preferably, in step S1, the Chinese morphemes are divided into two types, namely noun morphemes and emotion morphemes; the two types of morphemes are combined in one or more of the coordinate, modifier-head, verb-object, subject-predicate and complement structures; the morphemes in the comment text are extracted by a supervised machine learning method; and each morpheme is associated with an emotion by using the Pearson correlation coefficient between the morpheme and the emotion as the correlation coefficient, so as to construct a morpheme emotion variable.

Preferably, in step S2, two morpheme emotion nodes with similarity are connected by using a directed edge to form a directed link; the direction of the directional link is determined according to the sequence of occurrence of the morpheme emotional variables in the comments, and the sequence determines the connection direction of the link edges.

Preferably, in step S3, after the directed link edges between all morpheme emotion nodes are obtained, the shortest paths from a start node to all end nodes are found; the morpheme emotion nodes on each shortest path form a strongest emotion tendency set, which represents one emotion tendency; by setting a reasonable empirical threshold on the maximum path length, the paths that meet the emotion intensity requirement are found, and the morpheme emotion nodes and directed weighted edges on those paths form an effective emotion tendency classification.

Preferably, in step S1, the extraction of the morpheme emotion variables includes the following steps:

a1, selecting a comment training sample set, referring to the existing Chinese morpheme library, searching for all Chinese noun-class morphemes, and recording them in a morpheme set M;

a2, selecting a comment training sample set, referring to the existing emotion corpus, searching for all Chinese emotion-class morphemes, and recording them in an emotion set S;

a3, forming an independent morpheme emotion variable $v_i$ from a morpheme element in the morpheme set M and an emotion element in the emotion set S, and calculating the Pearson correlation coefficient r between each morpheme element and emotion element; setting a threshold $r_\theta$, and recording each morpheme emotion variable $v_i$ that satisfies $r \ge r_\theta$ in the effective morpheme emotion variable set V:

$$V = \{v_1, v_2, \ldots, v_n\} \qquad (1)$$

wherein n in formula (1) is the number of effective morpheme emotion variables;

a4, executing a3 in a loop until all elements in the morpheme set and the emotion set are processed.

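Preferably, the Pearson correlation coefficient r between morphemes and emotions is calculated by the following formula:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i - \bar{M}}{\sigma_M} \right) \left( \frac{S_i - \bar{S}}{\sigma_S} \right) \qquad (2)$$

wherein, in formula (2), $\frac{M_i - \bar{M}}{\sigma_M}$, $\bar{M}$ and $\sigma_M$ are respectively the standard score, the mean and the standard deviation of $M_i$ (and likewise for $S_i$), and n is the number of comment training samples.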

Preferably, in step S2, regarding the morpheme emotion variables as nodes in a directed weighted acyclic graph, which are called morpheme emotion nodes or morpheme emotion node variables, and calculating the approximate relationship of the morpheme emotion nodes includes the following steps:

b1, finding out a sub-node set of each morpheme emotion node, and constructing a directed acyclic graph of the morpheme emotion nodes;

first, initializing the child node sets by emptying the child node set of every morpheme emotion node; then calculating the conditional mutual information of each pair of morpheme emotion nodes $v_i$ and $v_j$, and, when the conditional mutual information is greater than a preset empirical value, regarding morpheme emotion node $v_j$ as a child node of morpheme emotion node $v_i$; finally, outputting the child node sets of all morpheme emotion nodes and the directed acyclic graph, represented by $G = (V, D)$, wherein $v_i$ and $v_j$ are morpheme emotion nodes, G is the directed acyclic graph, V is the effective morpheme emotion node set, and D is the set of directed edges from parent nodes to child nodes;

calculating the condition mutual information of each pair of morpheme emotion nodes:

wherein, in formula (3), $f(G)$ is the conditional mutual information, $p(v_i, v_j)$ is the joint probability density function, and $\mathrm{Chirld}(v_i)$ is the set of child nodes of node $v_i$; i ranges over $[1, n-1]$ and j over $[i+1, n]$;

b2, calculating similarity weights among the morpheme emotional nodes, and executing in a circulating mode until all the morpheme emotional nodes are traversed;

wherein, in formula (4), $W_{i,j}$ is the weight of the similarity relation between two morpheme emotion nodes with a parent-child relation, $N(v_i)$ and $N(v_j)$ are the numbers of times the respective nodes appear in the same comment text, and $N(v_i, v_j)$ is the number of times both appear together in the same comment text.

Preferably, in step S3, the calculation of the morpheme emotion tight path includes the following steps:

c1, calculating the lengths of the directed link edges of the directed weighted acyclic graph by converting each similarity weight into a directed edge length, $L_{i,j} = -\ln W_{i,j}$, wherein $L_{i,j}$ is the length of the directed edge;

c2, calculating an emotional tendency classification path, initializing variables, and sequentially executing the following steps:

c21, selecting from the morpheme emotion variable set V a morpheme emotion node variable that has no parent node as the start node, denoted $v_s$;

c22, initializing the child node of the start node to be itself, and initializing the child nodes of the other morpheme emotion nodes in the morpheme emotion variable set V to be empty;

c23, denoting the path length from morpheme emotion node $v_i$ to morpheme emotion node $v_j$ as $D_{i,j}$; the path length from the start node to itself is 0, and the initial value of the path length from the start node to every other morpheme emotion node is infinity; the path length between morpheme emotion nodes $v_i$ and $v_j$ equals the algebraic sum of the lengths of all directed edges on the path between them;

c24, initializing the classification and the candidate node set, $C_k = \{v_s\}$; $Q = \{v_s\}$; wherein $C_k$ is the k-th emotion tendency classification, Q is the candidate node set, and $v_s$ is the start node;

c3, while the morpheme emotion node variable set V is not empty, searching the morpheme emotion nodes in the candidate node set Q, finding the morpheme emotion node with the shortest path length, and executing the following steps:

c31, for morpheme emotion node variables $v_i$ and $v_j$ that are both in the candidate node set Q, with i ≠ j, if the path length from the start node $v_s$ to $v_i$ is less than or equal to the path length from $v_s$ to $v_j$, deleting the morpheme emotion node $v_i$ with the shortest path length from the candidate node set Q;

c32, adding the obtained morpheme emotion node $v_i$ with the shortest path length to the shortest path set;

c33, for each directed edge that starts at morpheme emotion node $v_i$ and connects to a morpheme emotion node $v_t$: when the path length from the start node $v_s$ to $v_i$ plus the length of the directed edge from $v_i$ to $v_t$ is smaller than the path length from $v_s$ to $v_t$, updating the shortest path length $D_{s,t}$ with that sum and setting the successor node of $v_i$ with the shortest path length to $v_t$; if $v_t$ is not in the candidate node set Q, adding $v_t$ to Q;

c34, when morpheme emotion node $v_i$ has no successor node, searching for the next classification;

c35, if morpheme emotion node $v_i$ belongs to the morpheme emotion node set V, deleting the found morpheme emotion node $v_i$ with the shortest path from V;

c4, if the path length is smaller than the set maximum path length threshold, the classification is valid, and the algorithm ends.

Compared with the prior art, the emotion multi-tendency classification method for Chinese comments has the following notable advantages:

The invention combines a directed weighted acyclic graph model with emotion tendency analysis and introduces Chinese morphemes into the method, dividing traditional Chinese morphemes into a morpheme (noun) type and an emotion type. It realizes emotion multi-tendency classification of comments in three steps: extracting the morpheme emotions of the comments, analyzing the similarity relations between the morpheme emotions, and calculating the morpheme emotion tight paths. It thereby distinguishes more accurately the multiple attitudes that a user expresses toward an object and reflects the user's opinions on the attributes and characteristics of the object.

Drawings

FIG. 1 shows the emotion multi-tendency classification model for Chinese comments.

FIG. 2 compares the convergence time of the algorithm for ε = 0.85 and different values of ξ in an embodiment of the invention.

FIG. 3 compares the convergence time of the algorithm for ξ = 2000 and different values of ε in an embodiment of the invention.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

As shown in FIG. 1, the invention provides an emotion multi-tendency classification method facing Chinese comments, which comprises the following steps:

First, morpheme emotion variables are extracted. The morpheme words and emotion words concerning the commented object are extracted from the comment text according to an existing Chinese morpheme lexicon and emotion corpus lexicon. In keeping with the way the Chinese language describes objects, the correlation coefficient between the morpheme words and the emotion words is calculated using the Pearson correlation coefficient method, and morpheme emotion variables are formed from it. A morpheme emotion variable can serve as independent emotion content describing a certain emotion type, and is treated as an independent emotion unit when emotion relations are calculated.

Then, the similarity relation between morpheme emotion variables is constructed. The relation between morpheme emotion variables is described using a conditional mutual information formula. Each morpheme emotion variable is regarded as an independent entity; several independent morpheme emotions are extracted from a comment, and together they express the user's emotion tendencies. When combined, the morpheme emotions may show a certain similarity. Morpheme emotion variables that are similar describe emotion tendencies close to that of a given morpheme, and together they can reflect a certain type of emotion tendency.

Finally, the morpheme emotion tight paths are calculated. The morpheme emotion similarity relation obtained from the conditional mutual information is a direct relation between morpheme emotion variables. Whether a set of directly related morpheme emotion variables can express a user's emotion tendency depends on the strength of the global relation formed by the morpheme emotion variables across the whole set. On the basis of the directed weighted acyclic graph model, the morpheme emotion variables are regarded as nodes in the graph and are called morpheme emotion nodes or morpheme emotion node variables (both names are used in the context of the directed weighted acyclic graph; in the context of the morpheme emotion set, only the name morpheme emotion node variable is used). Directed weighted relations are constructed between the morpheme emotion nodes to serve as directed weighted link edges. On the basis of these edges, an improved shortest path search algorithm is designed based on the directed weighted acyclic graph model to search for effective paths that satisfy given weight conditions. The set of all morpheme emotion nodes passed through on a path represents one emotion type. By setting a limit on the path length, as many paths as required can be found, and each path is one emotion tendency classification.

The invention addresses the emotion multi-tendency classification problem for Chinese comments. Given the characteristics of Chinese grammar and morphemes, Chinese comments generally have specific topics and objects, which can be obtained from the related topics of the website or platform where the comments appear, such as topics in microblogs, specific events in blogs, and products and services in e-commerce. These titles, together with the specific descriptions of the related topics, objects, products, services, and so on, constitute metadata about the objects and can be regarded as the objects and topics. In most online user comments, the object and topic of the comment can be clearly determined.

Objects and topics are composed of a number of aspects, which are the component parts that describe the object. A user's comment sometimes addresses the object as a whole and sometimes a particular aspect of it. These aspects are the core content expressing the user's emotion, so they need to be extracted. The aspects of an object are typically made up of morphemes, which are attributes and characteristics of the object and topic; monosyllabic, disyllabic, and polysyllabic vocabulary in the comments may all be regarded as morphemes.

The invention divides Chinese morphemes into two types: noun morphemes and emotion morphemes. Nominal descriptions in a comment of things and of the different sides, functions, attributes, features, and so on of an object belong to the noun-class morphemes, abbreviated as morphemes; content expressing emotion, attitude, preference, and the like belongs to the emotion morphemes, abbreviated as emotions. The invention takes the morpheme emotion as the most basic unit of emotion analysis: a morpheme emotion is an indivisible part of the emotion tendency expressed by the user, and it represents the user's emotion tendency with respect to that morpheme.

Because the morpheme emotion is the basic unit for judging a user's emotion tendency, extracting and mining effective morpheme emotions is the primary task of comment emotion multi-tendency classification. In the structure of Chinese sentences, the two types of morphemes can be combined in compound forms such as the coordinate, modifier-head, verb-object, subject-predicate, and complement structures. Morpheme emotion mining finds the effective morphemes in the comments and the emotions closely related to them, and then links them to serve as the basic elements of the overall emotion classification. The method extracts the morphemes in the comment text by a supervised machine learning method and pairs morphemes with emotions by using the Pearson correlation coefficient between them as the correlation coefficient, thereby constructing morpheme emotion variables.
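The pairing of morphemes with emotions by Pearson correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each morpheme and each emotion is assumed to be represented by a 0/1 occurrence flag per training comment, and the function names `pearson_r` and `effective_variables`, the toy words, and the threshold value are all hypothetical.

```python
import math

def pearson_r(m, s):
    """Sample Pearson correlation between two equal-length numeric sequences."""
    n = len(m)
    if n != len(s) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(m) / n
    mean_s = sum(s) / n
    # Sample standard deviations (n - 1 in the denominator).
    sd_m = math.sqrt(sum((x - mean_m) ** 2 for x in m) / (n - 1))
    sd_s = math.sqrt(sum((y - mean_s) ** 2 for y in s) / (n - 1))
    if sd_m == 0 or sd_s == 0:
        return 0.0  # a constant column carries no correlation signal
    # Sum of products of standard scores, divided by n - 1.
    return sum(((x - mean_m) / sd_m) * ((y - mean_s) / sd_s)
               for x, y in zip(m, s)) / (n - 1)

def effective_variables(morpheme_cols, emotion_cols, r_theta):
    """Return (morpheme, emotion, r) triples whose correlation reaches r_theta."""
    pairs = []
    for m_name, m in morpheme_cols.items():
        for s_name, s in emotion_cols.items():
            r = pearson_r(m, s)
            if r >= r_theta:
                pairs.append((m_name, s_name, r))
    return pairs

# Hypothetical toy data: one 0/1 occurrence flag per comment (6 comments).
morphemes = {"battery": [1, 1, 0, 1, 0, 0], "screen": [0, 1, 1, 0, 1, 0]}
emotions = {"durable": [1, 1, 0, 1, 0, 0], "blurry": [0, 0, 1, 0, 1, 1]}
V = effective_variables(morphemes, emotions, r_theta=0.6)
```

With this toy data only the pair ("battery", "durable") passes the threshold, because the two flags coincide in every comment; the other pairs are weakly or negatively correlated and are dropped.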

Next, the similarity relations between morpheme emotions are analyzed. The extracted morpheme emotion variables are regarded as effective emotion tendency nodes, and morpheme emotion nodes that are similar have similar emotion tendencies on some side of the object. Emotion similarity calculation determines whether different morpheme emotion nodes are similar. The invention uses the conditional mutual information method to calculate the approximate relation between two morpheme emotion nodes. Two similar nodes are connected by a directed edge to form a directed link. The direction of the link is determined by the order in which the morpheme emotion variables occur in the comments.
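The edge construction can be illustrated with a short sketch. It makes two loud simplifications: plain empirical mutual information over 0/1 occurrence flags stands in for the conditional mutual information of the source (equivalent to conditioning on an empty child set), and the order of occurrence is supplied as a hypothetical `first_position` mapping. The names `mutual_information` and `build_edges` and the value of `epsilon` are assumptions for illustration only.

```python
import math
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) of two 0/1 sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for u, v in zip(x, y) if u == a and v == b) / n
            p_a = sum(1 for u in x if u == a) / n
            p_b = sum(1 for v in y if v == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def build_edges(occurrence, first_position, epsilon):
    """Link similar nodes; the earlier-occurring node becomes the parent."""
    children = {name: [] for name in occurrence}
    for a, b in combinations(occurrence, 2):
        if mutual_information(occurrence[a], occurrence[b]) >= epsilon:
            if first_position[a] <= first_position[b]:
                parent, child = a, b
            else:
                parent, child = b, a
            children[parent].append(child)
    return children

# Hypothetical toy data: occurrence flags per comment and first positions.
occurrence = {
    "v1": [1, 1, 0, 0, 1, 0],
    "v2": [1, 1, 0, 0, 1, 0],
    "v3": [0, 1, 0, 1, 0, 1],
}
first_position = {"v1": 0, "v2": 3, "v3": 7}
children = build_edges(occurrence, first_position, epsilon=0.5)
```

Because every edge points from the earlier-occurring variable to the later one, the resulting graph cannot contain a cycle as long as the positions are distinct.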

Finally, the emotion tendency classification paths are calculated and the effective emotion tendency classifications are determined. After the directed link edges between all morpheme emotion nodes are obtained, an improved shortest path algorithm finds the shortest paths from a start node to all end nodes. The morpheme emotion nodes on each shortest path form a strongest emotion tendency set, which represents one emotion tendency. By setting a reasonable empirical threshold on the maximum path length, the paths that meet the emotion intensity requirement can be found, and the morpheme emotion nodes and directed weighted edges on those paths form the effective emotion tendency classifications. The number of paths that meet the requirement is the number of emotion tendency classifications, which realizes the comment emotion multi-tendency classification of the invention.
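The path search can be sketched with the textbook Dijkstra algorithm. This is not the improved algorithm of the invention, only a standard stand-in that shows the idea: similarity weights are converted to lengths with the negative logarithm, shortest paths are found from a start node, and only end-node paths within a maximum length are kept. The function name `tight_paths`, the assumption that weights lie in (0, 1], and the toy graph are all hypothetical.

```python
import heapq
import math

def tight_paths(weights, start, xi):
    """Dijkstra over edge lengths -ln(W); keep end-node paths no longer than xi."""
    # Convert similarity weights to non-negative edge lengths.
    graph = {}
    for (u, v), w in weights.items():
        if not 0.0 < w <= 1.0:
            raise ValueError("weights are assumed to lie in (0, 1]")
        graph.setdefault(u, []).append((v, -math.log(w)))
        graph.setdefault(v, [])
    graph.setdefault(start, [])
    dist = {node: math.inf for node in graph}
    prev = {}
    dist[start] = 0.0
    queue = [(0.0, start)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale queue entry
        for v, length in graph[u]:
            if d + length < dist[v]:
                dist[v] = d + length
                prev[v] = u
                heapq.heappush(queue, (dist[v], v))
    # End nodes are reachable nodes without outgoing edges.
    paths = []
    for node, d in dist.items():
        if node != start and not graph[node] and d <= xi:
            path = [node]
            while path[-1] != start:
                path.append(prev[path[-1]])
            paths.append((path[::-1], d))
    return sorted(paths, key=lambda item: item[1])

# Hypothetical toy graph: (parent, child) -> similarity weight.
weights = {("s", "a"): 0.9, ("s", "b"): 0.5, ("a", "c"): 0.8,
           ("b", "c"): 0.9, ("b", "d"): 0.1}
result = tight_paths(weights, "s", xi=1.0)
```

In this toy graph the end node c is reached most cheaply through a, and the weakly linked end node d is rejected because its path length exceeds the threshold, so one classification path remains.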

The emotion multi-tendency classification algorithm for Chinese comments, which is provided by the invention, is as follows:

inputting: chinese comment text data sets (training samples, test samples);

and (3) outputting: an emotion multi-tendency classification set;

step1, selecting a comment training sample set, referring to the existing Chinese morpheme library, searching all Chinese name part-of-speech morphemes, and recording a morpheme set M;

step2, selecting a comment training sample set, referring to the existing emotion corpus, searching all Chinese emotion type morphemes, and recording an emotion set S;

and Step3, executing a loop until all elements in the morpheme set and the emotion set are processed:

calculating a Pearson correlation coefficient r between each morpheme and the emotion;

wherein

And σ _M Are respectively to M _i Standard score, mean and standard deviation of (a);

step4, the morpheme elements and the emotion elements form an independent morpheme emotion variable v _i Setting a threshold r _θ Will satisfy r ≥ r _θ The morpheme emotion variables are recorded into an effective morpheme emotion variable set

Step5, regarding each effective morpheme emotion variable $v_i$ as a node in the directed weighted acyclic graph, finding the child node set of each node, constructing the directed acyclic graph G of the morpheme emotion nodes, and executing (Step5-1 to Step5-3):

Step5-1, initializing the child node sets by emptying the child node set of every morpheme emotion node: for i = 1 to n, executing $\mathrm{Chirld}(v_i) \leftarrow \varnothing$;

Step5-2, calculating the conditional mutual information function f(G): for i = 1 to n-1 and j = i+1 to n, executing in a loop (Step5-2-1 to Step5-2-2):

Step5-2-1, calculating the conditional mutual information of each pair of morpheme emotion nodes:

wherein f(G) is the conditional mutual information, $p(v_i, v_j)$ is the joint probability density function, and $\mathrm{Chirld}(v_i)$ is the set of child nodes of morpheme emotion node $v_i$;

Step5-2-2, if $f(G) \ge \varepsilon$, executing $\mathrm{Chirld}(v_i) \leftarrow v_j$, wherein ε is an empirical constant; that is, if the conditional mutual information of morpheme emotion nodes $v_i$ and $v_j$ reaches the empirical value ε, morpheme emotion node $v_j$ is regarded as a child node of morpheme emotion node $v_i$;

Step5-3, outputting the child node sets $\mathrm{Chirld}(v_i)$ of all nodes and the directed acyclic graph $G = (V, D)$, wherein D is the set of directed edges from parent nodes to child nodes;

Step6, calculating the similarity weights between morpheme emotion nodes, executing in a loop until all nodes are traversed, for i = 1 to n-1 and j = i+1 to n:

do:

wherein $W_{i,j}$ is the weight of the similarity relation between two morpheme emotion nodes with a parent-child relation, $N(v_i)$ and $N(v_j)$ are the numbers of times the respective nodes appear in the same comment text, and $N(v_i, v_j)$ is the number of times both appear together in the same comment text;

Step7, calculating the length of the directed link edge between two morpheme emotion nodes in the directed weighted acyclic graph by converting the similarity weight into a directed edge length, $L_{i,j} = -\ln W_{i,j}$, wherein $L_{i,j}$ is the length of the directed edge;

Step8, calculating the emotion tendency classification paths: initializing the variables by executing in sequence (Step8-1 to Step8-5):

Step8-1, $k = 1$, $C_k = \varnothing$, wherein $C_k$ is the k-th emotion tendency classification;

Step8-2, selecting from the morpheme emotion variable set V a node variable that has no parent node as the start node, denoted $v_s$;

Step8-3, initializing the child node of the start node $v_s$ to be itself, and initializing the child nodes of the other nodes $v_j$ in the morpheme emotion variable set V to be empty: $\mathrm{Chirld}(v_s) = v_s$; $\mathrm{Chirld}(v_j) = \varnothing$;

Step8-4, $D_{i,j}$ is the path length from morpheme emotion node $v_i$ to morpheme emotion node $v_j$; the path length from the start node to itself is 0, $D_{s,s} = 0$; the initial value of the path length from the start node to every other morpheme emotion node is infinity, $D_{s,j} = \infty$; the path length between two morpheme emotion nodes equals the algebraic sum of the lengths of the directed edges between all the nodes the path passes through, $D_{i,j} = L_{i,1} + L_{1,2} + \cdots + L_{j-1,j}$;

Step8-5, initializing the classification and the candidate node set: $C_k = \{v_s\}$; $Q = \{v_s\}$, wherein $C_k$ is the k-th emotion tendency classification, Q is the candidate node set, and $v_s$ is the start node;

Step9, while the morpheme emotion node set V is not empty, that is, $V \ne \varnothing$, searching the nodes in the candidate node set Q, finding the morpheme emotion node with the shortest path length, and executing (Step9-1 to Step9-5):

Step9-1, for $i, j \in Q$ and $i \ne j$, if $D_{s,i} \le D_{s,j}$, then $Q = Q - \{v_i\}$; that is, for morpheme emotion node variables $v_i$ and $v_j$ that are both in the candidate node set Q, with i ≠ j, if the path length from the start node to $v_i$ is less than or equal to the path length from the start node to $v_j$, the morpheme emotion node $v_i$ with the shortest path length is deleted from the candidate node set Q;

Step9-2, $C_k = C_k \cup \{v_i\}$; that is, the emotion classification set $C_k$ is updated by adding the morpheme emotion node $v_i$ with the shortest path length to it;

Step9-3, for each directed edge that starts at morpheme emotion node $v_i$ and connects to a morpheme emotion node $v_t$, that is, for all $v_t \in \mathrm{Chirld}(v_i)$, when the path length satisfies $D_{s,i} + L_{i,t} < D_{s,t}$, executing (Step9-3-1 to Step9-3-2):

Step9-3-1, $D_{s,t} = D_{s,i} + L_{i,t}$; $\mathrm{Next}(v_i) = v_t$; that is, when the path length from the start node $v_s$ to $v_i$ plus the length of the directed edge from $v_i$ to $v_t$ is smaller than the path length from $v_s$ to $v_t$, the shortest path length $D_{s,t}$ is updated with that sum; $\mathrm{Next}(v_i) = v_t$ denotes that $v_t$ is the successor node of morpheme emotion node $v_i$ reached over the shortest directed edge;

Step9-3-2, if $v_t \notin Q$, then executing $Q = Q \cup \{v_t\}$; that is, if morpheme emotion node $v_t$ is not in the candidate node set Q, node $v_t$ is added to Q;

Step9-4, if $\mathrm{Next}(v_i) = \varnothing$, then executing $k = k + 1$; that is, when morpheme emotion node $v_i$ has no successor node, the search moves on to the next classification;

Step9-5, if $v_i \in V$, then executing $V = V - \{v_i\}$; that is, if morpheme emotion node $v_i$ belongs to the morpheme emotion node set V, the found morpheme emotion node $v_i$ with the shortest path is deleted from V;

Step10, if $D_{s,t} \le \xi$, outputting all $C_k$ sets and ending the algorithm; that is, if the path length does not exceed the set maximum path length threshold ξ, the classification is valid.
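One consequence of the conversion in Step7 may be worth spelling out. Assuming every similarity weight satisfies $0 < W_{i,j} \le 1$ (the source does not state the range of W, so this is an assumption), and writing P for the set of directed edges on a path from $v_s$ to $v_t$ (a symbol introduced here for illustration only), the path length is

$$D_{s,t} = \sum_{(i,j) \in P} L_{i,j} = -\sum_{(i,j) \in P} \ln W_{i,j} = -\ln \prod_{(i,j) \in P} W_{i,j}$$

Under that assumption every edge length is non-negative, the shortest path is the path with the largest product of similarity weights, and the test $D_{s,t} \le \xi$ in Step10 is equivalent to requiring $\prod_{(i,j) \in P} W_{i,j} \ge e^{-\xi}$.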

The following are specific examples of the present invention and further describe the technical solutions of the present invention, but the present invention is not limited to these examples.

The technical effect experiment adopts self-collected data to test, the data set is derived from user comments about mobile phones of a certain online shopping mall, and the data comprises comments in the period from 5 months in 2019 to 10 months in 2019. Data used in the experiment are preliminarily screened, so that at least 5 comments are provided for all consumers and commodities, and a comment data record structure is composed of a comment name, a product number, a comment text and a score. The detailed structure of the data set is shown in table 1.

TABLE 1 data Structure of comments

From the review data, 1,000 reviews covering 100 mobile phone models were selected as the test data and manually labeled. A single comment may express emotion about several aspects, so it may need multiple labels, depending on its specific content. Table 2 shows a sample of the manually assigned labels.

TABLE 2 Emotion tendentiousness tag for review data

Technical test method

In the experimental test, the records of the data set were divided equally into 5 sections of 200 reviews each. First, one section is used as the test set and the remaining 4 sections as the training set, and the precision, recall and CPU time consumed are calculated. Then another section is selected as the test set, with the remaining 4 sections as the training set, and the precision, recall and CPU time consumed are calculated again, until each of the 5 sections has served as the test set once. Accuracy in the experiment is measured by precision and recall.
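The five-fold procedure above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the experiment: the function names are hypothetical, and precision and recall are micro-averaged over per-comment label sets, which is an assumption because the text does not state how multi-label scores were aggregated.

```python
def five_fold_splits(records, k=5):
    """Yield (train, test) pairs; each of the k equal sections is the test set once."""
    size = len(records) // k
    for fold in range(k):
        test = records[fold * size:(fold + 1) * size]
        # Everything outside the current section is used for training.
        train = records[:fold * size] + records[(fold + 1) * size:]
        yield train, test


def precision_recall(predicted, actual):
    """Micro-averaged precision and recall over per-comment label sets."""
    true_pos = sum(len(p & a) for p, a in zip(predicted, actual))
    n_predicted = sum(len(p) for p in predicted)
    n_actual = sum(len(a) for a in actual)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall


# 1,000 records split into 5 sections of 200 reviews each.
splits = list(five_fold_splits(list(range(1000))))
```

Averaging the per-fold precision and recall over the 5 passes gives one value per parameter setting.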

Testing the effects

In the accuracy test, the control parameter ε of the conditional mutual information takes the values 0.65, 0.75, 0.85 and 0.95, and the maximum path length ξ takes the values 1000, 2000, 3000, 4000 and 5000. The results are shown in Table 3:

TABLE 3 average of precision, recall and F values for the algorithm when the control parameters ε and ξ take different values, respectively

As can be seen from Table 3, when ε is 0.95 and ξ is 2000, the precision is highest but the recall is very low, which indicates that the labels assigned by the algorithm are mostly correct but many labels are missed. When ε is 0.85 and ξ is 5000, the recall is high but the precision drops, which indicates that few labels are missed but more of the assigned labels are wrong. Neither precision nor recall alone is therefore a sufficient criterion. Considered overall, precision and recall are both in a good range when ε is 0.85 and ξ is 2000.
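Table 3 reports an F value alongside precision and recall, but this section does not define it. Assuming it is the usual balanced F-measure, that is, the harmonic mean of precision P and recall R, the trade-off can be written as follows. The numbers in the example are hypothetical and are not taken from Table 3.

```latex
F = \frac{2PR}{P + R}, \qquad
P = 0.9,\ R = 0.3 \;\Rightarrow\; F = \frac{2(0.9)(0.3)}{0.9 + 0.3} = 0.45, \qquad
P = R = 0.7 \;\Rightarrow\; F = 0.7 .
```

Under this assumption, a setting with very high precision but low recall scores lower than a balanced setting, which matches the reasoning for preferring a balanced choice of ε and ξ.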

When ε is fixed and ξ varies, the precision at first rises slightly as ξ increases and then falls after reaching a peak. The main reason is that, with the node similarity relations fixed, increasing the maximum path distance up to a certain point yields higher precision, but increasing it without limit adds nodes that are not closely related to the path and reduces the precision of label classification. The recall reflects the number of missed labels: the larger ξ is, the more labels are added and the fewer are missed, so the recall improves.

When ξ is fixed and ε varies, the precision increases with ε, because larger mutual information finds more accurate labels. The recall, however, reaches a maximum within a certain range: the number of missed labels decreases at first but increases beyond a certain point, mainly because overemphasizing strong similarity relations between nodes causes correct labels to be missed.

In the time efficiency test, the control parameter ε is first fixed at 0.85 and ξ takes the values 2000, 3000 and 5000; the convergence time of the algorithm in these three cases is shown in Fig. 2. As can be seen from Fig. 2, when ε is fixed, the convergence time of the algorithm increases with ξ. Next, ξ is fixed at 2000 and ε takes the values 0.75, 0.85 and 0.95; the convergence time of the algorithm in these three cases is shown in Fig. 3. As can be seen from Fig. 3, when ξ is fixed, the convergence time of the algorithm decreases as ε increases.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention, and therefore, the scope of the present invention shall be subject to the claims.

Claims

1. An emotion multi-tendency classification method for Chinese comments, characterized by comprising the following steps:

S2, constructing a similarity relation between morpheme emotion variables; calculating the similarity relation of two morpheme emotion variables with a conditional mutual information formula, thereby describing the relation between the morpheme emotion variables;

S3, calculating a morpheme emotion tight path; regarding the morpheme emotion variables as nodes in a directed weighted acyclic graph, called morpheme emotion nodes or morpheme emotion node variables; constructing directed weighted connections among the morpheme emotion nodes as directed weighted link edges; and, on the basis of the directed weighted link edges, designing an improved shortest path search algorithm based on the directed weighted acyclic graph model to search for effective paths meeting certain weight conditions, wherein each path is one emotional tendency classification;

in step S3, the calculation of the morpheme emotion tight path includes the following steps:

C1, calculating the lengths of the directed link edges of the directed weighted acyclic graph by converting each similarity weight into a directed edge length, L_{i,j} = -ln W_{i,j}, wherein L_{i,j} is the length of the directed edge;

C21, selecting a morpheme emotion node variable without a parent node from the morpheme emotion variable set V as the start node and recording it as v_s;

C23, denoting the path length from morpheme emotion node v_i to morpheme emotion node v_j as D_{i,j}; the path length from the start node to itself is 0, and the initial value of the path length from the start node to every other morpheme emotion node is infinite; the path length between morpheme emotion node v_i and morpheme emotion node v_j is equal to the algebraic sum of the lengths of all the directed edges between the two morpheme emotion nodes;

C31, when morpheme emotion node variables v_i and v_j are both in the candidate node set Q and i ≠ j, if the path length from the start node v_s to morpheme emotion node v_i is less than or equal to the path length from the start node v_s to morpheme emotion node v_j, deleting the morpheme emotion node v_i with the shortest path length from the candidate node set Q;

C32, adding the obtained morpheme emotion node v_i with the shortest path length to the shortest path set;

C33, for each directed edge that starts at morpheme emotion node v_i and connects to a morpheme emotion node v_t: when the path length from the start node v_s to v_i plus the length of the directed edge from v_i to v_t is smaller than the path length from v_s to v_t, updating the shortest path length D_{s,t} with that sum, and setting the successor node of morpheme emotion node v_i with the shortest path length to morpheme emotion node v_t; if morpheme emotion node v_t is not in the candidate node set Q, adding morpheme emotion node v_t to the candidate node set Q;

C35, if morpheme emotion node v_i belongs to the morpheme emotion node set V, deleting the morpheme emotion node v_i whose shortest path has been found from the morpheme emotion node set V;

2. The emotion multi-tendency classification method for Chinese comments as claimed in claim 1, wherein: in step S1, Chinese morphemes are divided into noun morphemes and emotion morphemes; the two kinds of morphemes are combined in one or more of the combination, correction, domination, statement and supplement modes; the morphemes in the comment text are extracted by a supervised machine learning method; the Pearson correlation coefficient between morphemes and emotions is used as the correlation coefficient to match morphemes with emotions; and the morpheme emotion variables are constructed.

3. The emotion multi-tendency classification method for Chinese comments as claimed in claim 1, wherein: in step S2, two morpheme emotion nodes having a similarity relation are connected by a directed edge to form a directed link; the direction of the directed link is determined by the order in which the morpheme emotion variables occur in the comments, and this order determines the connection direction of the link edges.

4. The emotion multi-tendency classification method for Chinese comments as claimed in claim 1, wherein: in step S3, after the directed link edges between all morpheme emotion nodes are obtained, the shortest paths from a given start node to all terminal nodes are found; the morpheme emotion nodes on each shortest path form a strongest emotional tendency set, which represents one emotional tendency; the paths meeting the emotional strength requirement are found by setting a reasonable empirical threshold for the maximum path length; and the morpheme emotion nodes and directed weighted edges on these paths constitute the valid emotional tendency classifications.

5. The emotion multi-tendency classification method for Chinese comments as claimed in claim 1, wherein: in step S1, the extraction of the morpheme emotion variables includes the following steps:

A3, each morpheme element in the morpheme set M and each emotion element in the emotion set S form an independent morpheme emotion variable v_i; the Pearson correlation coefficient r between each morpheme element and emotion element is calculated; a threshold r_θ is set, and the morpheme emotion variables v_i satisfying r ≥ r_θ are recorded in the effective morpheme emotion variable set V.

6. The emotion multi-tendency classification method for Chinese comments as claimed in claim 5, wherein: the Pearson correlation coefficient r between morphemes and emotions is calculated by the formula:

wherein, in the formula (2)

7. The emotion multi-tendency classification method for Chinese comments as claimed in claim 1, wherein: in step S2, the morpheme emotion variables are regarded as nodes in the directed weighted acyclic graph, called morpheme emotion nodes or morpheme emotion node variables, and calculating the similarity relation of the morpheme emotion nodes includes the following steps:

firstly, initializing the child node sets by emptying the child node sets of all morpheme emotion nodes; then, calculating the conditional mutual information of each pair of morpheme emotion nodes v_i and v_j, and when the conditional mutual information is larger than a preset empirical value, regarding morpheme emotion node v_j as a child node of morpheme emotion node v_i; finally, outputting the child node sets of all morpheme emotion nodes and the directed acyclic graph, which is represented by G = (V, D); wherein v_i and v_j are morpheme emotion nodes, G is the directed acyclic graph, V is the effective morpheme emotion node set, and D is the set of directed edges from parent nodes to child nodes;

wherein, in the formula (3), f(G) is the conditional mutual information, p(v_i, v_j) is the joint probability density function, and Child(v_i) is the set of child nodes of node v_i; the value range of i is [1, n-1], and the value range of j is [i+1, n];

B2, calculating the similarity weights between morpheme emotion nodes, executing in a loop until all morpheme emotion nodes have been traversed;

wherein, in the formula (4), W_{i,j} is the weight of the similarity relation between two morpheme emotion nodes having a parent-child relation, N(v_i) and N(v_j) are the numbers of times that nodes v_i and v_j, respectively, appear in the comment text, and N(v_i, v_j) is the number of times both occur simultaneously in the same comment text.
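The child-set construction described in claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions: formulas (3) and (4) are not reproduced in this text, so the pairwise score is passed in as a hypothetical callable standing in for the conditional mutual information, and the function name, node names and threshold value are likewise hypothetical. Because an edge is only ever added from a node to one that comes later in the list (i < j), the resulting graph cannot contain a cycle.

```python
def build_child_sets(nodes, score, epsilon):
    """Build child sets and the directed edge set D for nodes listed in order.

    score(u, v) is a stand-in for the conditional mutual information of a
    pair of nodes; epsilon is the preset empirical threshold.
    """
    children = {v: set() for v in nodes}  # initialise: all child sets empty
    edges = set()
    n = len(nodes)
    for i in range(n - 1):                # i ranges over the first n-1 nodes
        for j in range(i + 1, n):         # j ranges over nodes after i
            if score(nodes[i], nodes[j]) > epsilon:
                children[nodes[i]].add(nodes[j])  # v_j is a child of v_i
                edges.add((nodes[i], nodes[j]))   # directed edge parent -> child
    return children, edges


# Hypothetical pairwise scores for three morpheme emotion nodes.
scores = {("a", "b"): 0.90, ("a", "c"): 0.50, ("b", "c"): 0.88}
children, edges = build_child_sets(
    ["a", "b", "c"], lambda u, v: scores[(u, v)], epsilon=0.85
)
```

Raising epsilon removes edges, which leaves fewer candidate paths for the later shortest-path search.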