CN110674849A - Cross-domain emotion classification method based on multi-source domain integrated migration

Cross-domain emotion classification method based on multi-source domain integrated migration

Info

Publication number
CN110674849A
CN110674849A
Authority
CN
China
Prior art keywords
domain
classifier
migration
classifiers
logistic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910823443.3A
Other languages
Chinese (zh)
Other versions
CN110674849B (en)
Inventor
相艳
陆婷
余正涛
郭军军
线岩团
许莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201910823443.3A priority Critical patent/CN110674849B/en
Publication of CN110674849A publication Critical patent/CN110674849A/en
Application granted granted Critical
Publication of CN110674849B publication Critical patent/CN110674849B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a cross-domain emotion classification method based on multi-source domain integrated migration, and belongs to the technical field of computers and information. The invention comprises the following steps: firstly, a Feature-extended Neural Structural Correspondence Learning (FNSCL) model is used to obtain features migrated from each source domain Ds to the target domain Dt, and different logistic classifiers are trained; then, the weight of each logistic classifier is calculated according to an integration consistency principle; finally, the weight of each classifier is optimized by a simulated annealing algorithm. The method can improve the emotion classification effect on the cross-domain target domain, and emotion classification experiments on Amazon product reviews from four different domains (electronics reviews, book reviews, kitchenware reviews and DVD reviews) show that the cross-domain emotion classification method based on multi-source domain integrated migration is effective.

Description

Cross-domain emotion classification method based on multi-source domain integrated migration
Technical Field
The invention relates to a cross-domain emotion classification method based on multi-source domain integrated migration, and belongs to the technical field of computers and information.
Background
At present, electronic commerce has a great impact on our lives. The review texts left by past consumers on e-commerce products subtly influence the purchasing behavior of other consumers. Therefore, sentiment classification of product reviews on e-commerce platforms has gradually become a new research hotspot. However, an emotion classifier trained on the labeled data of one domain suffers a large performance drop when applied to other domains. To solve this problem, the prior art includes: Aue et al. propose training classifiers by mixing a small amount of labeled data with unlabeled data, with good effect, the trained models being able to handle mixed labeled and unlabeled data in the target domain; Yang et al. propose feature-based selection for transfer learning in sentence-level classification; Pan et al. propose a spectral feature alignment algorithm that finds related features between domains at the feature level and builds a relation between the two domains, thereby realizing cross-domain emotion classification; and Xie et al. propose a feature-based improved algorithm, a latent-space feature alignment algorithm.
However, when the labeled corpus of the target domain is very small, an emotion classifier trained on the labeled corpus of a single source domain is often ineffective, because the feature distributions of different source domains and the target domain differ. Moreover, a classifier built on single-source-domain migration only exploits the features migrated from that one source domain to the target domain, and does not make full use of the migration features of multiple source domains. Aiming at this problem, the invention provides a cross-domain emotion classification method based on multi-source domain integrated migration.
Disclosure of Invention
The invention provides a cross-domain emotion classification method based on multi-source domain integrated migration, which solves the problem that an emotion classifier trained on the labeled corpus of a single source domain performs poorly, and improves the emotion classification effect on the target domain.
The technical scheme of the invention is as follows: the cross-domain emotion classification method based on multi-source domain integrated migration comprises the following specific steps:
step1, using a Feature-extended Neural Structural Correspondence Learning (FNSCL) model to obtain the migration of each source domain Ds to the target domain Dt, and training different logistic classifiers;
as a preferred embodiment of the present invention, in Step1, the FNSCL model, a model that finds migration features based on the neural structural correspondence learning model NSCL, is used to obtain the migration of each source domain Ds to the target domain Dt, and different logistic classifiers are trained in turn: training from Ds1 to Dt yields logistic classifier 1, training from Ds2 to Dt yields logistic classifier 2, and training from Ds3 to Dt yields logistic classifier 3.
As a preferred scheme of the invention, the FNSCL model performs feature migration using expanded pivot features; when screening pivot features it considers not only the mutual information (MI) value between features and labels in the training set but also the word frequency of the features, using a TF-IDF feature selection algorithm. The specific steps for screening pivot features are as follows:
step1.1, first calculate the MI value between each feature and the label in the source domain through formula (1):

MI(x, y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}    (1)

where x represents a binary feature vector, p(x) represents the probability of feature x appearing in the text features, p(y) represents the probability of sentence label y appearing, and p(x, y) represents the joint probability of x and y;
step1.2, then select the uni-gram or bi-gram features whose word frequency exceeds the minimum word-frequency threshold in both domains, and take them as second-level candidate pivot features;
step1.3, finally, from the second-level candidate pivot features, select the uni-grams or bi-grams with the highest TF-IDF values through the TF-IDF feature selection algorithm as the final pivot features between the two domains;
the FNSCL model is based on the NSCL model, a TF-IDF characteristic selection method is added during pivot characteristic extraction, and the pivot characteristic extraction method is expanded.
Step2, assign weights to the logistic classifiers according to an integrated consistency principle: use the logistic classifiers to predict probability distributions for input instances of the target domain, obtain the average probability vector from the probability distribution vectors that each classifier predicts for an instance, and obtain the weight of each classifier through the consistency value, i.e. the negative of the entropy of the average probability vector;
in Step2, the importance of each classifier in the ensemble for predicting the target-domain data is considered to be different; when the class predictions of the different classifiers on the target-domain samples reach maximum integration consistency, the weights of the logistic classifiers at that moment are considered optimal, so different logistic classifiers are given different weights.
In Step2, as a preferred embodiment of the present invention, the trained logistic classifiers are used to predict probability distributions for the input instances of the target domain; for the same input instance, the probability distribution vector obtained by logistic classifier 1 is p_1, that obtained by logistic classifier 2 is p_2, and that obtained by logistic classifier 3 is p_3. With m denoting the number of classifiers, the average probability vector of the classifiers is calculated by the formula

\bar{p} = \frac{1}{m} \sum_{l=1}^{m} p_l

and the consistency value is the negative of the entropy calculated from the average probability vector, as in the formula

c = -E(\bar{p})

where E represents the information entropy of the average probability distribution vector predicted by the classifiers for the target-domain instance.
Step3, optimizing by using a simulated annealing algorithm to obtain the optimal weight of each classifier;
step4, perform sentiment classification on product reviews using the trained classifiers with the optimized weights, and carry out sentiment classification experimental verification on Amazon product reviews in four domains: electronics reviews, book reviews, kitchenware reviews and DVD reviews.
The invention has the beneficial effects that:
the invention provides a method for combining a plurality of source domain classifiers together and utilizing the migration characteristics of the source domains, thereby finally improving the emotion classification effect of the cross-domain target domain. Theories and technologies are verified in four different fields (the field of electronic products, the field of books, the field of kitchen ware and the field of DVDs) of Amazon products, and the effectiveness of the method is proved by experimental results.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph showing the results of the experiment in the present invention;
FIG. 3 is a schematic representation of a NSCL model;
FIG. 4 is a graph of experimental results comparing the present invention with other models.
Detailed Description
Example 1: as shown in FIGS. 1-4, the cross-domain emotion classification method based on multi-source domain integrated migration comprises the following specific steps:
step1, use a Feature-extended Neural Structural Correspondence Learning (FNSCL) model to obtain the migration of each source domain Ds to the target domain Dt, and train different logistic classifiers in turn: training from Ds1 to Dt yields logistic classifier 1, training from Ds2 to Dt yields logistic classifier 2, and training from Ds3 to Dt yields logistic classifier 3;
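For concreteness, a minimal sketch of this training step follows, assuming scikit-learn logistic regression; X_s1, y_s1 and the like are hypothetical names for the FNSCL-migrated feature matrices and sentiment labels of the three source domains, the feature migration itself being outside the sketch.

# Minimal sketch of Step 1: one logistic classifier per source domain,
# trained on features already migrated toward the target domain by FNSCL.
from sklearn.linear_model import LogisticRegression

def train_source_classifiers(source_datasets):
    """Train one logistic classifier per (X, y) source-domain dataset."""
    classifiers = []
    for X, y in source_datasets:
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y)  # X holds FNSCL-migrated features for this source domain
        classifiers.append(clf)
    return classifiers

# Usage (hypothetical data):
# classifiers = train_source_classifiers(
#     [(X_s1, y_s1), (X_s2, y_s2), (X_s3, y_s3)])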
as a preferred scheme of the invention, the FNSCL model performs feature migration using expanded pivot features; when screening pivot features it considers not only the mutual information (MI) value between features and labels in the training set but also the word frequency of the features, using a TF-IDF feature selection algorithm. The specific steps for screening pivot features are as follows:
step1.1, first calculate the MI value between each feature and the label in the source domain through formula (1):

MI(x, y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}    (1)

where x represents a binary feature vector, p(x) represents the probability of feature x appearing in the text features, p(y) represents the probability of sentence label y appearing, and p(x, y) represents the joint probability of x and y;
step1.2, then select the uni-gram or bi-gram features whose word frequency exceeds the minimum word-frequency threshold in both domains, and take them as second-level candidate pivot features;
step1.3, finally, from the second-level candidate pivot features, select the uni-grams or bi-grams with the highest TF-IDF values through the TF-IDF feature selection algorithm as the final pivot features between the two domains;
the FNSCL model is based on the NSCL model, a TF-IDF characteristic selection method is added during pivot characteristic extraction, and the pivot characteristic extraction method is expanded, wherein the NSCL model is structurally shown in figure 3.
Step2, assign weights to the logistic classifiers according to an integrated consistency principle: use the logistic classifiers to predict probability distributions for input instances of the target domain, obtain the average probability vector from the probability distribution vectors that each classifier predicts for an instance, and obtain the weight of each classifier through the consistency value, i.e. the negative of the entropy of the average probability vector;
specifically: considering that each classifier in the ensemble differs in importance for predicting the target-domain data, different logistic classifiers are given different weights; when the class predictions of the different classifiers on the target-domain samples reach maximum integration consistency, the weights of the logistic classifiers at that moment are considered optimal.
The integration consistency principle involves the logistic regression model and the concept of entropy. The basic idea is as follows: first, find a suitable classification function to predict the decision for an input sample; then, construct a loss function to represent the deviation between the predicted output and the actual category of the training sample; finally, train the model and store the optimal model parameters at which the model loss is minimal.
The logistic classifiers trained in Step1 are used to predict probability distributions for the input instances of the target domain; the logistic model predicts the probability distribution p(Y|X) that the input sample X belongs to category Y. Each sample input to a logistic classifier is characterized by a probability distribution vector p \in \mathbb{R}^d, where p(i) represents the probability that the input sample belongs to class i, and

\sum_{i=1}^{d} p(i) = 1,
where d represents the number of categories. Given m logistic classifiers l = 1, 2, …, m, with p_l denoting the probability distribution vector predicted by classifier l on the same input sample, the mean probability distribution vector of the classifiers is calculated by equation (2):

\bar{p} = \frac{1}{m} \sum_{l=1}^{m} p_l    (2)
Shannon entropy is defined on the probability distribution vector: the more evenly the predicted probabilities are spread over the categories, the larger the resulting information entropy. The information entropy of the average probability distribution vector predicted by the classifiers for a target-domain instance is calculated by equation (3):

E(\bar{p}) = -\sum_{i=1}^{d} \bar{p}(i) \log \bar{p}(i)    (3)
An integrated classifier H is defined as the weighted combination of the m logistic classifiers with weights w_l:

H(x) = \sum_{l=1}^{m} w_l \, p_l(x)    (4)
For an input instance x, each logistic classifier predicts a probability distribution vector, denoted p_1, p_2, …, p_m, and the average probability distribution vector of each instance is calculated according to equation (2). For the binary emotion classification problem, Table 1 records the probability distribution vectors with which logistic classifier 1 (h_1), logistic classifier 2 (h_2) and logistic classifier 3 (h_3) predict instances x_1 and x_2, together with the average probability distribution vector \bar{p} corresponding to each instance. This serves as an illustration of how a consistency metric of the classifiers can be calculated through entropy. The probability distribution vector of one instance is 2-dimensional: the 1st dimension represents the probability that the instance belongs to the first class, and the 2nd dimension the probability that it belongs to the second class. Likewise, the average probability distribution vector \bar{p} is a two-dimensional vector: its 1st dimension is the average probability, over the three classifiers, that the instance belongs to the first class, and its 2nd dimension the average probability that it belongs to the second class. In detail, for the first instance x_1, all classifiers agree completely: each predicts that it belongs to class 2 with probability 100%. When the degree of consensus of the three classifiers on an instance's prediction reaches its maximum, the entropy E(0, 1) of the average distribution vector reaches its minimum. For the second instance x_2, the first two classifiers predict class 1 and class 2 respectively, while the third predicts class 1 with probability 50% and class 2 with probability 50%; here the consensus of the three classifiers is lowest, and correspondingly the entropy E(0.5, 0.5) of the average distribution vector reaches its maximum.
TABLE 1: Entropy and consistency metrics of probability distribution vectors
Instance    h_1 prediction    h_2 prediction    h_3 prediction    Average \bar{p}    Entropy E(\bar{p})
x_1         (0, 1)            (0, 1)            (0, 1)            (0, 1)             0 (minimum)
x_2         (1, 0)            (0, 1)            (0.5, 0.5)        (0.5, 0.5)         1 (maximum)
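As a numerical check on Table 1, taking base-2 logarithms and the convention 0 \log 0 = 0, the two average vectors give exactly the extreme entropy values described above:

E(0, 1) = -(0 \log_2 0 + 1 \log_2 1) = 0 \quad \text{(maximum consistency)}
E(0.5, 0.5) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \quad \text{(minimum consistency)}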
It follows that the negative of the entropy of the mean probability distribution vector can be defined as a consistency measure over the different predictors, with the corresponding formula:

c_e = -\sum_{x \in D_t} E(\bar{p}(x))    (5)

where E is the information entropy of equation (3), p_1, p_2, …, p_m are the probability distribution vectors predicted by the m classifiers for an instance, \bar{p} is the instance's average probability distribution vector calculated by equation (2), and the negative of the sum of the information entropies over the target-domain instances is the consistency measure c_e.
Step3, optimize the classifier weights: use a simulated annealing algorithm to obtain the optimal weight of each classifier;
combining classifiers for emotion classification does not simply average the prediction results of each classifier over the instances as the classifier's weights, but rather finds the optimal combination of weights. In the process of learning the weight, the prediction consensus degree of each classifier on the same example needs to be maximized, so that the integration consistency measurement function is used as an objective function, and the optimal weight w of each classifier is obtained by optimizing by using a simulated annealing algorithm1、w2And w3. In order to obtain the optimal combination of the weights, the problem of parameter estimation in the integrated classifier is converted into the problem of solving the optimal solution of the objective function. The invention takes a consistency measurement function as an objective function f (w)i) The weight w of the ith classifieriAs an argument, the simulated annealing algorithm (SA) was used to find the optimal parameter wiLet f (w)i) The value of (c) reaches a maximum. The reason for using the simulated annealing algorithm is that the simulated annealing algorithm receives a solution which is worse than the current result with a certain probability, local optimization is easy to jump out to achieve global optimization, and the maximum value of integration consistency based on the global is obtained, and the flow of the simulated annealing algorithm is as follows;
Beginning:
1. Given an initial value t_0, an end value t_1, an initial feasible solution w_i and the objective function f(w_i), set the number of iterations L for each value of T.
2. For the iteration counter l = 1, 2, …, L, perform steps 3 to 6.
3. Generate a new solution w_i_new by continually changing the value of the argument: w_i_new = w_i + Δw, where Δw is a random variable generated in [0, 1).
4. Calculate Δf = f(w_i_new) - f(w_i), the optimization goal being to maximize f(w_i).
5. If Δf ≥ 0, take w_i_new as the current solution; otherwise accept the new solution as the current solution with probability exp(Δf / T).
6. Judge whether the number of iterations L for the current T value has been reached; once it has, leave the inner loop.
7. Judge whether the T value has reached the termination condition: T decreases gradually with cooling factor α, i.e. T = αT; if T > t_1, turn to step 2; otherwise output the current optimal solution max f(w_i).
End.
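For concreteness, a runnable Python sketch of this loop follows. The symmetric perturbation with renormalization is an illustrative choice (the procedure above draws the raw increment Δw from [0, 1)); the default parameters t0 = 10,000, t1 = 0.1 and alpha = 0.95 follow step a3 below, and the objective f is expected to be the consistency measure, e.g. the consistency function sketched earlier.

import math
import random

def anneal_weights(f, dim=3, t0=10_000.0, t1=0.1, alpha=0.95, iters=100, seed=0):
    """Maximize f(w) over weight vectors w of length dim that sum to 1."""
    rng = random.Random(seed)
    w = [1.0 / dim] * dim                    # equal weights before optimization
    best_w, best_f = list(w), f(w)
    T = t0
    while T > t1:                            # step 7: cool until T reaches t1
        for _ in range(iters):               # step 2: fixed iterations per T
            # Step 3: perturb one weight, then renormalize so the
            # weights still form a valid convex combination.
            i = rng.randrange(dim)
            w_new = list(w)
            w_new[i] = max(1e-6, w_new[i] + rng.uniform(-0.1, 0.1))
            s = sum(w_new)
            w_new = [x / s for x in w_new]
            # Steps 4-5: Metropolis rule; worse solutions accepted with
            # probability exp(df / T) to escape local optima.
            df = f(w_new) - f(w)
            if df >= 0 or rng.random() < math.exp(df / T):
                w = w_new
                if f(w) > best_f:
                    best_w, best_f = list(w), f(w)
        T *= alpha                           # step 7: T = alpha * T
    return best_w, best_f

# Usage (hypothetical): weights, score = anneal_weights(
#     lambda w: consistency(probs, w))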
Step4, perform sentiment classification on the product reviews using the trained classifiers with the optimized weights;
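A brief hypothetical sketch of this final step, combining the per-classifier probability vectors with the weights found by simulated annealing and taking the arg-max class:

import numpy as np

def ensemble_predict(classifiers, weights, X_target):
    # Stack per-classifier probability predictions: shape (m, n, d).
    probs = np.stack([clf.predict_proba(X_target) for clf in classifiers])
    p_bar = np.tensordot(np.asarray(weights), probs, axes=(0, 0))  # (n, d)
    return p_bar.argmax(axis=1)  # predicted sentiment class per review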
the experimental verification of the disclosed method on Amazon product reviews from four different domains (electronics reviews, book reviews, kitchenware reviews and DVD reviews) comprises the following specific steps:
step a 1: e is recorded as the domain comment of the electronic product, B is recorded as the domain comment of the book, and D is recorded as the domain comment of K, DVD is recorded as the domain comment of the kitchen ware. In the four fields, 3 source domains and 1 target domain are set, the characteristics of the three source domains are combined to be used as a training set of the source domains, and 3 local logistic classifiers are trained. The experimental settings were: the left side of (D, E, K) - > B, (B, E, K) - > D, (B, D, K) - > E, (B, D, E) - > K, -, is the source domain and the right side of- > is the target domain. Here, the classifier using the multi-source domain feature migration training improves the accuracy of the classifier by 3.3% on average in the classification process compared with the classifier obtained by the single-source domain training, as shown in table 2.
TABLE 2: Comparison of results for the multi-source weighted ensemble classifier and the single-source classifiers
[Table image not reproduced: accuracies of the multi-source weighted ensemble classifier versus the single-source classifiers on the four experimental settings.]
Step a 2: setting the number of pivot features from a single source domain to a target domain as 100, setting the dimension of a pivot feature word vector as 500 dimensions, and weighting the classifier according to the principle of maximum consistency of integrated classification.
Step a 3: the classifier weights are optimized using a simulated annealing algorithm, and as can be seen from table 2, the parameters used in the optimization model need to be artificially set to initial values. Setting an initial value t010,000, end value t1The initial solution (i.e., the initial weights of the logistic classifier 1 and logistic classifier 2) is w, 0.1, respectively1=0.1,w2When the weight of the logistic classifier 3 is 0.5, the weight is w3=1-w1-w2The decrease α of the T value was 0.95. Before optimization, the weight of each classifier is the same, and after optimization, the weight is reasonably distributed.
Step a 4: finally, the method provided by the invention is compared with an emotion classifier based on original features and an emotion classifier based on mapping features, and the experimental result is shown in FIG. 2. In four experimental groups of (D, E, K) - > B, (B, E, K) - > D, (B, D, K) - > E, (B, D, E) - > K, the evaluation criterion is the accuracy of the test set. On the basis of the first experimental group, the accuracy of the method provided by the invention is respectively improved by 1.97% and 4.62% compared with the accuracy of the other two groups; on the second experimental group, the accuracy of the method provided by the invention is respectively improved by 7.8% and 7.9% compared with the accuracy of the other two groups; in the third experimental group, the accuracy of the method provided by the invention is respectively improved by 0.45% and 3.5% compared with the accuracy of the other two groups; in the fourth experimental group, the accuracy of the method provided by the invention is respectively improved by 2.2% and 4.1% compared with the accuracy of the other two groups.
Compared with other models, the method can also improve cross-domain emotion classification. The baseline models are the autoencoder structural correspondence learning model (AutoEncoder SCL, AE-SCL) and the similarity-regularized autoencoder structural correspondence learning model (AE-SCL with Similarity Regularization, AE-SCL-SR) proposed by Yftah Ziser and Roi Reichart in 2017, and the evaluation criterion is accuracy. As shown in FIG. 4, the accuracy of the proposed method is higher than that of the AE-SCL model in all four experiments. In the first three groups of experiments, the accuracy of the proposed method is also higher than that of the AE-SCL-SR model; only in the last group is it slightly lower, and the overall effect is better than that of the AE-SCL-SR model.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (5)

1. A cross-domain emotion classification method based on multi-source domain integrated migration, characterized in that the emotion classification method comprises the following specific steps:
step1, obtaining the migration of each source domain Ds to the target domain Dt based on the FNSCL model, and training different logistic classifiers;
step2, assigning weights to the logistic classifiers according to an integrated consistency principle: using the logistic classifiers to predict probability distributions for input instances of the target domain, obtaining the average probability vector from the probability distribution vectors that each classifier predicts for an instance, and obtaining the weight of each classifier through the consistency value, i.e. the negative of the entropy of the average probability vector;
step3, optimizing by using a simulated annealing algorithm to obtain the optimal weight of each classifier;
and Step4, performing sentiment classification on product reviews using the trained classifiers with the optimized weights, and carrying out sentiment classification experimental verification on Amazon product reviews in a plurality of domains.
2. The cross-domain emotion classification method based on multi-source domain integrated migration of claim 1, characterized in that: in Step1, the FNSCL model, a model that finds migration features based on the neural structural correspondence learning model NSCL, is used to obtain the migration of each source domain Ds to the target domain Dt, and different logistic classifiers are trained in turn: training from Ds1 to Dt yields logistic classifier 1, training from Ds2 to Dt yields logistic classifier 2, and training from Ds3 to Dt yields logistic classifier 3.
3. The multi-source domain integrated migration based cross-domain emotion classification method of claim 1, characterized in that:
the FNSCL model utilizes the expanded pivot characteristics to carry out characteristic migration, not only considers the mutual information MI value between the characteristics and the label in a training set when the pivot characteristics are screened, but also considers the word frequency of the characteristics, and uses a TF-IDF characteristic selection algorithm; the specific steps for screening pivot characteristics are as follows:
step1.1, first calculate the MI value between each feature and the label in the source domain through formula (1):

MI(x, y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}    (1)

where x represents a binary feature vector, p(x) represents the probability of feature x appearing in the text features, p(y) represents the probability of sentence label y appearing, and p(x, y) represents the joint probability of x and y;
step1.2, then select the uni-gram or bi-gram features whose word frequency exceeds the minimum word-frequency threshold in both domains, and take them as second-level candidate pivot features;
step1.3, finally, from the second-level candidate pivot features, select the uni-grams or bi-grams with the highest TF-IDF values through the TF-IDF feature selection algorithm as the final pivot features between the two domains;
the FNSCL model is based on the NSCL model; a TF-IDF feature selection method is added during pivot feature extraction, expanding the pivot feature extraction method.
4. The cross-domain emotion classification method based on multi-source domain integrated migration of claim 1, characterized in that: in Step2, the importance of each classifier in the ensemble for predicting the target-domain data is considered to be different; when the class predictions of the different classifiers on the target-domain samples reach maximum integration consistency, the weights of the logistic classifiers at that moment are considered optimal, so different logistic classifiers are given different weights.
5. The cross-domain emotion classification method based on multi-source domain integrated migration of claim 1, characterized in that: in Step2, the trained logistic classifiers are used to predict probability distributions for the input instances of the target domain; for the same input instance, the probability distribution vector obtained by logistic classifier 1 is p_1, that obtained by logistic classifier 2 is p_2, and that obtained by logistic classifier 3 is p_3; with m denoting the number of classifiers, the average probability vector of the classifiers is calculated by the formula

\bar{p} = \frac{1}{m} \sum_{l=1}^{m} p_l

and the consistency value is the negative of the entropy calculated from the average probability vector, as in the formula

c = -E(\bar{p})

where E represents the information entropy of the average probability distribution vector predicted by the classifiers for the target-domain instance.
CN201910823443.3A 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration Active CN110674849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910823443.3A CN110674849B (en) 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910823443.3A CN110674849B (en) 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration

Publications (2)

Publication Number Publication Date
CN110674849A true CN110674849A (en) 2020-01-10
CN110674849B CN110674849B (en) 2021-06-18

Family

ID=69075911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910823443.3A Active CN110674849B (en) 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration

Country Status (1)

Country Link
CN (1) CN110674849B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761311A (en) * 2014-01-23 2014-04-30 中国矿业大学 Sentiment classification method based on multi-source field instance migration
US20180218284A1 (en) * 2017-01-31 2018-08-02 Xerox Corporation Method and system for learning transferable feature representations from a source domain for a target domain
CN107103364A (en) * 2017-03-28 2017-08-29 上海大学 A kind of task based on many source domain splits transfer learning Forecasting Methodology
CN107766873A (en) * 2017-09-06 2018-03-06 天津大学 The sample classification method of multi-tag zero based on sequence study
CN108256561A (en) * 2017-12-29 2018-07-06 中山大学 A kind of multi-source domain adaptive migration method and system based on confrontation study
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain
CN108681585A (en) * 2018-05-14 2018-10-19 浙江工业大学 A kind of construction method of the multi-source transfer learning label popularity prediction model based on NetSim-TL
CN109389037A (en) * 2018-08-30 2019-02-26 中国地质大学(武汉) A kind of sensibility classification method based on depth forest and transfer learning
CN109492229A (en) * 2018-11-23 2019-03-19 中国科学技术大学 A kind of cross-cutting sensibility classification method and relevant apparatus
CN109885833A (en) * 2019-02-18 2019-06-14 山东科技大学 A kind of sexy polarity detection method based on the joint insertion of multiple domain data set
CN110032646A (en) * 2019-05-08 2019-07-19 山西财经大学 The cross-domain texts sensibility classification method of combination learning is adapted to based on multi-source field

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PING LUO et al.: "Transfer Learning From Multiple Source Domains via Consensus Regularization", ACM 17th Conference on Information and Knowledge Management *
RITA CHATTOPADHYAY et al.: "Multisource Domain Adaptation and Its Application to Early Detection of Fatigue", ACM Transactions on Knowledge Discovery from Data *
TAREQ AL-MOSLMI et al.: "Approaches to Cross-Domain Sentiment Analysis: A Systematic Literature Review", IEEE Access *
XILUN CHEN et al.: "Multinomial adversarial networks for multi-domain text classification", Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies *
ZHAO CHUANJUN et al.: "Multi-source cross-domain sentiment classification based on ensemble deep transfer learning" (in Chinese), Journal of Shanxi University *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428039A (en) * 2020-03-31 2020-07-17 中国科学技术大学 Cross-domain emotion classification method and system of aspect level
CN111428039B (en) * 2020-03-31 2023-06-20 中国科学技术大学 Cross-domain emotion classification method and system for aspect level
CN111708986A (en) * 2020-05-29 2020-09-25 四川旷谷信息工程有限公司 Pipe gallery state parameter measuring method
CN112101085A (en) * 2020-07-22 2020-12-18 西安交通大学 Adaptive intelligent fault diagnosis method based on importance weighted domain impedance
CN112182209A (en) * 2020-09-24 2021-01-05 东北大学 GCN-based cross-domain emotion analysis method under lifelong learning framework
CN114020879A (en) * 2022-01-04 2022-02-08 深圳佑驾创新科技有限公司 Multi-source cross-domain text emotion classification network training method

Also Published As

Publication number Publication date
CN110674849B (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN110674849B (en) Cross-domain emotion classification method based on multi-source domain integrated migration
Wu et al. Iou-balanced loss functions for single-stage object detection
Padurariu et al. Dealing with data imbalance in text classification
Rodrigues et al. Gaussian process classification and active learning with multiple annotators
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
Peddinti et al. Domain adaptation in sentiment analysis of twitter
Wu et al. Collaborative multi-domain sentiment classification
Jiang et al. A bi-objective knowledge transfer framework for evolutionary many-task optimization
Duma et al. Sparseness reduction in collaborative filtering using a nearest neighbour artificial immune system with genetic algorithms
Li et al. CGAN-MBL for reliability assessment with imbalanced transmission gear data
Bui et al. Neural graph machines: Learning neural networks using graphs
CN111738532A (en) Method and system for acquiring influence degree of event on object
CN114863175A (en) Unsupervised multi-source partial domain adaptive image classification method
CN104572623A (en) Efficient data summary and analysis method of online LDA model
Liu et al. A weight-incorporated similarity-based clustering ensemble method
Gill et al. Dynamically regulated initialization for S-system modelling of genetic networks
Bahrami et al. Automatic image annotation using an evolutionary algorithm (IAGA)
Fang et al. Active multi-task learning via bandits
CN114298160A (en) Twin knowledge distillation and self-supervised learning based small sample classification method
Neukart et al. A Machine Learning Approach for Abstraction Based on the Idea of Deep Belief Artificial Neural Networks
Shaikh et al. Unerstanding Machine Learning Approach on Various Algorithms: A Case Study Implementation
Kyriakides et al. Comparison of neural network optimizers for relative ranking retention between neural architectures
Zelený et al. Multi-Branch Multi Layer Perceptron: A Solution for Precise Regression using Machine Learning
Fu et al. A hybrid model for credit evaluation problem
Moorthy et al. Handling the Class Imbalance Problem with an Improved Sine Cosine Algorithm for Optimal Instance Selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant