CN110674849A - Cross-domain emotion classification method based on multi-source domain integrated migration

Cross-domain emotion classification method based on multi-source domain integrated migration

Info

Publication number
CN110674849A
CN110674849A
Authority
CN
China
Prior art keywords
domain
classifier
migration
classifiers
logistic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910823443.3A
Other languages
Chinese (zh)
Other versions
CN110674849B (en)
Inventor
相艳
陆婷
余正涛
郭军军
线岩团
许莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201910823443.3A priority Critical patent/CN110674849B/en
Publication of CN110674849A publication Critical patent/CN110674849A/en
Application granted granted Critical
Publication of CN110674849B publication Critical patent/CN110674849B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a cross-domain emotion classification method based on multi-source domain integrated migration, and belongs to the technical field of computers and information. The invention comprises the following steps: firstly, a Feature-extended Neural Structural Correspondence Learning (FNSCL) model is used to obtain features migrated from each source domain Ds to the target domain Dt, and different logistic classifiers are trained; then, the weight of each logistic classifier is calculated according to an integration consistency principle; finally, the weight of each classifier is optimized by a simulated annealing algorithm. The method can improve the emotion classification effect on the cross-domain target domain, and emotion classification experiments on Amazon product reviews from four different domains (electronics reviews, book reviews, kitchenware reviews and DVD reviews) show that the cross-domain emotion classification method based on multi-source domain integrated migration is effective.

Description

Cross-domain emotion classification method based on multi-source domain integrated migration
Technical Field
The invention relates to a cross-domain emotion classification method based on multi-source domain integrated migration, and belongs to the technical field of computers and information.
Background
At present, electronic commerce has a great impact on our lives. The review texts left by past consumers on e-commerce products subtly influence the purchasing behavior of other consumers. Therefore, sentiment classification of product reviews on e-commerce platforms has gradually become a new research hotspot. However, an emotion classifier trained on the labeled data of one domain suffers a large performance drop when applied to other domains. To solve this problem, the prior art includes: Aue et al. propose training classifiers by mixing a small amount of labeled data with unlabeled data, with good effect, the trained models being able to handle mixed labeled and unlabeled data in the target domain; Yang et al. propose feature-based selection for transfer learning in sentence-level classification; Pan et al. propose a spectral feature alignment algorithm that finds related features between domains at the feature level and builds a relation between the two domains, thereby realizing cross-domain emotion classification; and Xie et al. propose a feature-based improved algorithm, a latent-space feature alignment algorithm.
However, when the labeled corpus of the target domain is very small, an emotion classifier trained on the labeled corpus of a single source domain is often ineffective, because the feature distributions of different source domains and the target domain differ. Moreover, a classifier built on single-source-domain migration only exploits the features migrated from that one source domain to the target domain, and does not make full use of the migration features of multiple source domains. Aiming at this problem, the invention provides a cross-domain emotion classification method based on multi-source domain integrated migration.
Disclosure of Invention
The invention provides a cross-domain emotion classification method based on multi-source domain integrated migration, which solves the problem that an emotion classifier trained on the labeled corpus of a single source domain performs poorly, and improves the emotion classification effect on the target domain.
The technical scheme of the invention is as follows: the cross-domain emotion classification method based on multi-source domain integrated migration comprises the following specific steps:
step1, using a Feature-extended Neural Structural Correspondence Learning (FNSCL) model to obtain the migration of each source domain Ds to the target domain Dt, and training different logistic classifiers;
as a preferred embodiment of the present invention, in Step1, the FNSCL model, a model that finds migration features based on the neural structural correspondence learning model NSCL, is used to obtain the migration of each source domain Ds to the target domain Dt, and different logistic classifiers are trained in turn: training from Ds1 to Dt yields logistic classifier 1, training from Ds2 to Dt yields logistic classifier 2, and training from Ds3 to Dt yields logistic classifier 3.
As a preferred scheme of the invention, the FNSCL model performs feature migration using expanded pivot features; when screening pivot features it considers not only the mutual information (MI) value between features and labels in the training set but also the word frequency of the features, using a TF-IDF feature selection algorithm. The specific steps for screening pivot features are as follows:
step1.1, first calculate the MI value between each feature and the label in the source domain through formula (1):

MI(x, y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}    (1)

where x represents a binary feature vector, p(x) represents the probability of feature x appearing in the text features, p(y) represents the probability of sentence label y appearing, and p(x, y) represents the joint probability of x and y;
step1.2, then select the uni-gram or bi-gram features whose word frequency exceeds the minimum word-frequency threshold in both domains, and take them as second-level candidate pivot features;
step1.3, finally, from the second-level candidate pivot features, select the uni-grams or bi-grams with the highest TF-IDF values through the TF-IDF feature selection algorithm as the final pivot features between the two domains;
the FNSCL model is based on the NSCL model, a TF-IDF characteristic selection method is added during pivot characteristic extraction, and the pivot characteristic extraction method is expanded.
Step2, assign weights to the logistic classifiers according to an integrated consistency principle: use the logistic classifiers to predict probability distributions for input instances of the target domain, obtain the average probability vector from the probability distribution vectors that each classifier predicts for an instance, and obtain the weight of each classifier through the consistency value, i.e. the negative of the entropy of the average probability vector;
in Step2, the importance of each classifier in the ensemble for predicting the target-domain data is considered to be different; when the class predictions of the different classifiers on the target-domain samples reach maximum integration consistency, the weights of the logistic classifiers at that moment are considered optimal, so different logistic classifiers are given different weights.
In Step2, as a preferred embodiment of the present invention, the trained logistic classifiers are used to predict probability distributions for the input instances of the target domain; for the same input instance, the probability distribution vector obtained by logistic classifier 1 is p_1, that obtained by logistic classifier 2 is p_2, and that obtained by logistic classifier 3 is p_3. With m denoting the number of classifiers, the average probability vector of the classifiers is calculated by the formula

\bar{p} = \frac{1}{m} \sum_{l=1}^{m} p_l

and the consistency value is the negative of the entropy calculated from the average probability vector, as in the formula

c = -E(\bar{p})

where E represents the information entropy of the average probability distribution vector predicted by the classifiers for the target-domain instance.
Step3, optimizing by using a simulated annealing algorithm to obtain the optimal weight of each classifier;
step4, perform sentiment classification on product reviews using the trained classifiers with the optimized weights, and carry out sentiment classification experimental verification on Amazon product reviews in four domains: electronics reviews, book reviews, kitchenware reviews and DVD reviews.
The invention has the beneficial effects that:
the invention provides a method for combining a plurality of source domain classifiers together and utilizing the migration characteristics of the source domains, thereby finally improving the emotion classification effect of the cross-domain target domain. Theories and technologies are verified in four different fields (the field of electronic products, the field of books, the field of kitchen ware and the field of DVDs) of Amazon products, and the effectiveness of the method is proved by experimental results.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph showing the results of the experiment in the present invention;
FIG. 3 is a schematic representation of a NSCL model;
FIG. 4 is a graph of experimental results comparing the present invention with other models.
Detailed Description
Example 1: as shown in FIGS. 1-4, the cross-domain emotion classification method based on multi-source domain integrated migration comprises the following specific steps:
step1, use a Feature-extended Neural Structural Correspondence Learning (FNSCL) model to obtain the migration of each source domain Ds to the target domain Dt, and train different logistic classifiers in turn: training from Ds1 to Dt yields logistic classifier 1, training from Ds2 to Dt yields logistic classifier 2, and training from Ds3 to Dt yields logistic classifier 3;
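For concreteness, a minimal sketch of this training step follows, assuming scikit-learn logistic regression; X_s1, y_s1 and the like are hypothetical names for the FNSCL-migrated feature matrices and sentiment labels of the three source domains, the feature migration itself being outside the sketch.

# Minimal sketch of Step 1: one logistic classifier per source domain,
# trained on features already migrated toward the target domain by FNSCL.
from sklearn.linear_model import LogisticRegression

def train_source_classifiers(source_datasets):
    """Train one logistic classifier per (X, y) source-domain dataset."""
    classifiers = []
    for X, y in source_datasets:
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y)  # X holds FNSCL-migrated features for this source domain
        classifiers.append(clf)
    return classifiers

# Usage (hypothetical data):
# classifiers = train_source_classifiers(
#     [(X_s1, y_s1), (X_s2, y_s2), (X_s3, y_s3)])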
as a preferred scheme of the invention, the FNSCL model performs feature migration using expanded pivot features; when screening pivot features it considers not only the mutual information (MI) value between features and labels in the training set but also the word frequency of the features, using a TF-IDF feature selection algorithm. The specific steps for screening pivot features are as follows:
step1.1, first calculate the MI value between each feature and the label in the source domain through formula (1):

MI(x, y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}    (1)

where x represents a binary feature vector, p(x) represents the probability of feature x appearing in the text features, p(y) represents the probability of sentence label y appearing, and p(x, y) represents the joint probability of x and y;
step1.2, then select the uni-gram or bi-gram features whose word frequency exceeds the minimum word-frequency threshold in both domains, and take them as second-level candidate pivot features;
step1.3, finally, from the second-level candidate pivot features, select the uni-grams or bi-grams with the highest TF-IDF values through the TF-IDF feature selection algorithm as the final pivot features between the two domains;
the FNSCL model is based on the NSCL model, a TF-IDF characteristic selection method is added during pivot characteristic extraction, and the pivot characteristic extraction method is expanded, wherein the NSCL model is structurally shown in figure 3.
Step2, assign weights to the logistic classifiers according to an integrated consistency principle: use the logistic classifiers to predict probability distributions for input instances of the target domain, obtain the average probability vector from the probability distribution vectors that each classifier predicts for an instance, and obtain the weight of each classifier through the consistency value, i.e. the negative of the entropy of the average probability vector;
specifically: considering that each classifier in the ensemble differs in importance for predicting the target-domain data, different logistic classifiers are given different weights; when the class predictions of the different classifiers on the target-domain samples reach maximum integration consistency, the weights of the logistic classifiers at that moment are considered optimal.
The integration consistency principle involves the logistic regression model and the concept of entropy. The basic idea is as follows: first, find a suitable classification function to predict the decision for an input sample; then, construct a loss function to represent the deviation between the predicted output and the actual category of the training sample; finally, train the model and store the optimal model parameters at which the model loss is minimal.
The logistic classifiers trained in Step1 are used to predict probability distributions for the input instances of the target domain; the logistic model predicts the probability distribution p(Y|X) that the input sample X belongs to category Y. Each sample input to a logistic classifier is characterized by a probability distribution vector p \in \mathbb{R}^d, where p(i) represents the probability that the input sample belongs to class i, and

\sum_{i=1}^{d} p(i) = 1,
where d represents the number of categories. Given m logistic classifiers l = 1, 2, …, m, with p_l denoting the probability distribution vector predicted by classifier l on the same input sample, the mean probability distribution vector of the classifiers is calculated by equation (2):

\bar{p} = \frac{1}{m} \sum_{l=1}^{m} p_l    (2)
Shannon entropy is defined on the probability distribution vector: the more evenly the predicted probabilities are spread over the categories, the larger the resulting information entropy. The information entropy of the average probability distribution vector predicted by the classifiers for a target-domain instance is calculated by equation (3):

E(\bar{p}) = -\sum_{i=1}^{d} \bar{p}(i) \log \bar{p}(i)    (3)
An integrated classifier H is defined as the weighted combination of the m logistic classifiers with weights w_l:

H(x) = \sum_{l=1}^{m} w_l \, p_l(x)    (4)
For an input instance x, each logistic classifier predicts a probability distribution vector, denoted p_1, p_2, …, p_m, and the average probability distribution vector of each instance is calculated according to equation (2). For the binary emotion classification problem, Table 1 records the probability distribution vectors with which logistic classifier 1 (h_1), logistic classifier 2 (h_2) and logistic classifier 3 (h_3) predict instances x_1 and x_2, together with the average probability distribution vector \bar{p} corresponding to each instance. This serves as an illustration of how a consistency metric of the classifiers can be calculated through entropy. The probability distribution vector of one instance is 2-dimensional: the 1st dimension represents the probability that the instance belongs to the first class, and the 2nd dimension the probability that it belongs to the second class. Likewise, the average probability distribution vector \bar{p} is a two-dimensional vector: its 1st dimension is the average probability, over the three classifiers, that the instance belongs to the first class, and its 2nd dimension the average probability that it belongs to the second class. In detail, for the first instance x_1, all classifiers agree completely: each predicts that it belongs to class 2 with probability 100%. When the degree of consensus of the three classifiers on an instance's prediction reaches its maximum, the entropy E(0, 1) of the average distribution vector reaches its minimum. For the second instance x_2, the first two classifiers predict class 1 and class 2 respectively, while the third predicts class 1 with probability 50% and class 2 with probability 50%; here the consensus of the three classifiers is lowest, and correspondingly the entropy E(0.5, 0.5) of the average distribution vector reaches its maximum.
TABLE 1: Entropy and consistency metrics of probability distribution vectors
Instance    h_1 prediction    h_2 prediction    h_3 prediction    Average \bar{p}    Entropy E(\bar{p})
x_1         (0, 1)            (0, 1)            (0, 1)            (0, 1)             0 (minimum)
x_2         (1, 0)            (0, 1)            (0.5, 0.5)        (0.5, 0.5)         1 (maximum)
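As a numerical check on Table 1, taking base-2 logarithms and the convention 0 \log 0 = 0, the two average vectors give exactly the extreme entropy values described above:

E(0, 1) = -(0 \log_2 0 + 1 \log_2 1) = 0 \quad \text{(maximum consistency)}
E(0.5, 0.5) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \quad \text{(minimum consistency)}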
It follows that the negative of the entropy of the mean probability distribution vector can be defined as a consistency measure over the different predictors, with the corresponding formula:

c_e = -\sum_{x \in D_t} E(\bar{p}(x))    (5)

where E is the information entropy of equation (3), p_1, p_2, …, p_m are the probability distribution vectors predicted by the m classifiers for an instance, \bar{p} is the instance's average probability distribution vector calculated by equation (2), and the negative of the sum of the information entropies over the target-domain instances is the consistency measure c_e.
Step3, optimize the classifier weights: use a simulated annealing algorithm to obtain the optimal weight of each classifier;
combining classifiers for emotion classification does not simply average the prediction results of each classifier over the instances as the classifier's weights, but rather finds the optimal combination of weights. In the process of learning the weight, the prediction consensus degree of each classifier on the same example needs to be maximized, so that the integration consistency measurement function is used as an objective function, and the optimal weight w of each classifier is obtained by optimizing by using a simulated annealing algorithm1、w2And w3. In order to obtain the optimal combination of the weights, the problem of parameter estimation in the integrated classifier is converted into the problem of solving the optimal solution of the objective function. The invention takes a consistency measurement function as an objective function f (w)i) The weight w of the ith classifieriAs an argument, the simulated annealing algorithm (SA) was used to find the optimal parameter wiLet f (w)i) The value of (c) reaches a maximum. The reason for using the simulated annealing algorithm is that the simulated annealing algorithm receives a solution which is worse than the current result with a certain probability, local optimization is easy to jump out to achieve global optimization, and the maximum value of integration consistency based on the global is obtained, and the flow of the simulated annealing algorithm is as follows;
Beginning:
1. Given an initial value t_0, an end value t_1, an initial feasible solution w_i and the objective function f(w_i), set the number of iterations L for each value of T.
2. For the iteration counter l = 1, 2, …, L, perform steps 3 to 6.
3. Generate a new solution w_i_new by continually changing the value of the argument: w_i_new = w_i + Δw, where Δw is a random variable generated in [0, 1).
4. Calculate Δf = f(w_i_new) - f(w_i), the optimization goal being to maximize f(w_i).
5. If Δf ≥ 0, take w_i_new as the current solution; otherwise accept the new solution as the current solution with probability exp(Δf / T).
6. Judge whether the number of iterations L for the current T value has been reached; once it has, leave the inner loop.
7. Judge whether the T value has reached the termination condition: T decreases gradually with cooling factor α, i.e. T = αT; if T > t_1, turn to step 2; otherwise output the current optimal solution max f(w_i).
End.
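For concreteness, a runnable Python sketch of this loop follows. The symmetric perturbation with renormalization is an illustrative choice (the procedure above draws the raw increment Δw from [0, 1)); the default parameters t0 = 10,000, t1 = 0.1 and alpha = 0.95 follow step a3 below, and the objective f is expected to be the consistency measure, e.g. the consistency function sketched earlier.

import math
import random

def anneal_weights(f, dim=3, t0=10_000.0, t1=0.1, alpha=0.95, iters=100, seed=0):
    """Maximize f(w) over weight vectors w of length dim that sum to 1."""
    rng = random.Random(seed)
    w = [1.0 / dim] * dim                    # equal weights before optimization
    best_w, best_f = list(w), f(w)
    T = t0
    while T > t1:                            # step 7: cool until T reaches t1
        for _ in range(iters):               # step 2: fixed iterations per T
            # Step 3: perturb one weight, then renormalize so the
            # weights still form a valid convex combination.
            i = rng.randrange(dim)
            w_new = list(w)
            w_new[i] = max(1e-6, w_new[i] + rng.uniform(-0.1, 0.1))
            s = sum(w_new)
            w_new = [x / s for x in w_new]
            # Steps 4-5: Metropolis rule; worse solutions accepted with
            # probability exp(df / T) to escape local optima.
            df = f(w_new) - f(w)
            if df >= 0 or rng.random() < math.exp(df / T):
                w = w_new
                if f(w) > best_f:
                    best_w, best_f = list(w), f(w)
        T *= alpha                           # step 7: T = alpha * T
    return best_w, best_f

# Usage (hypothetical): weights, score = anneal_weights(
#     lambda w: consistency(probs, w))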
Step4, perform sentiment classification on the product reviews using the trained classifiers with the optimized weights;
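A brief hypothetical sketch of this final step, combining the per-classifier probability vectors with the weights found by simulated annealing and taking the arg-max class:

import numpy as np

def ensemble_predict(classifiers, weights, X_target):
    # Stack per-classifier probability predictions: shape (m, n, d).
    probs = np.stack([clf.predict_proba(X_target) for clf in classifiers])
    p_bar = np.tensordot(np.asarray(weights), probs, axes=(0, 0))  # (n, d)
    return p_bar.argmax(axis=1)  # predicted sentiment class per review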
the experimental verification of the disclosed method on Amazon product reviews from four different domains (electronics reviews, book reviews, kitchenware reviews and DVD reviews) comprises the following specific steps:
step a 1: e is recorded as the domain comment of the electronic product, B is recorded as the domain comment of the book, and D is recorded as the domain comment of K, DVD is recorded as the domain comment of the kitchen ware. In the four fields, 3 source domains and 1 target domain are set, the characteristics of the three source domains are combined to be used as a training set of the source domains, and 3 local logistic classifiers are trained. The experimental settings were: the left side of (D, E, K) - > B, (B, E, K) - > D, (B, D, K) - > E, (B, D, E) - > K, -, is the source domain and the right side of- > is the target domain. Here, the classifier using the multi-source domain feature migration training improves the accuracy of the classifier by 3.3% on average in the classification process compared with the classifier obtained by the single-source domain training, as shown in table 2.
TABLE 2: Comparison of results for the multi-source weighted ensemble classifier and the single-source classifiers
[Table image not reproduced: accuracies of the multi-source weighted ensemble classifier versus the single-source classifiers on the four experimental settings.]
Step a 2: setting the number of pivot features from a single source domain to a target domain as 100, setting the dimension of a pivot feature word vector as 500 dimensions, and weighting the classifier according to the principle of maximum consistency of integrated classification.
Step a 3: the classifier weights are optimized using a simulated annealing algorithm, and as can be seen from table 2, the parameters used in the optimization model need to be artificially set to initial values. Setting an initial value t010,000, end value t1The initial solution (i.e., the initial weights of the logistic classifier 1 and logistic classifier 2) is w, 0.1, respectively1=0.1,w2When the weight of the logistic classifier 3 is 0.5, the weight is w3=1-w1-w2The decrease α of the T value was 0.95. Before optimization, the weight of each classifier is the same, and after optimization, the weight is reasonably distributed.
Step a 4: finally, the method provided by the invention is compared with an emotion classifier based on original features and an emotion classifier based on mapping features, and the experimental result is shown in FIG. 2. In four experimental groups of (D, E, K) - > B, (B, E, K) - > D, (B, D, K) - > E, (B, D, E) - > K, the evaluation criterion is the accuracy of the test set. On the basis of the first experimental group, the accuracy of the method provided by the invention is respectively improved by 1.97% and 4.62% compared with the accuracy of the other two groups; on the second experimental group, the accuracy of the method provided by the invention is respectively improved by 7.8% and 7.9% compared with the accuracy of the other two groups; in the third experimental group, the accuracy of the method provided by the invention is respectively improved by 0.45% and 3.5% compared with the accuracy of the other two groups; in the fourth experimental group, the accuracy of the method provided by the invention is respectively improved by 2.2% and 4.1% compared with the accuracy of the other two groups.
Compared with other models, the method can also improve cross-domain emotion classification. The baseline models are the autoencoder structural correspondence learning model (AutoEncoder SCL, AE-SCL) and the similarity-regularized autoencoder structural correspondence learning model (AE-SCL with Similarity Regularization, AE-SCL-SR) proposed by Yftah Ziser and Roi Reichart in 2017, and the evaluation criterion is accuracy. As shown in FIG. 4, the accuracy of the proposed method is higher than that of the AE-SCL model in all four experiments. In the first three groups of experiments, the accuracy of the proposed method is also higher than that of the AE-SCL-SR model; only in the last group is it slightly lower, and the overall effect is better than that of the AE-SCL-SR model.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (5)

1. A cross-domain emotion classification method based on multi-source domain integrated migration, characterized in that the emotion classification method comprises the following specific steps:
step1, obtaining the migration of each source domain Ds to the target domain Dt based on the FNSCL model, and training different logistic classifiers;
step2, assigning weights to the logistic classifiers according to an integrated consistency principle: using the logistic classifiers to predict probability distributions for input instances of the target domain, obtaining the average probability vector from the probability distribution vectors that each classifier predicts for an instance, and obtaining the weight of each classifier through the consistency value, i.e. the negative of the entropy of the average probability vector;
step3, optimizing by using a simulated annealing algorithm to obtain the optimal weight of each classifier;
and Step4, performing sentiment classification on product reviews using the trained classifiers with the optimized weights, and carrying out sentiment classification experimental verification on Amazon product reviews in a plurality of domains.
2. The cross-domain emotion classification method based on multi-source domain integrated migration of claim 1, characterized in that: in Step1, the FNSCL model, a model that finds migration features based on the neural structural correspondence learning model NSCL, is used to obtain the migration of each source domain Ds to the target domain Dt, and different logistic classifiers are trained in turn: training from Ds1 to Dt yields logistic classifier 1, training from Ds2 to Dt yields logistic classifier 2, and training from Ds3 to Dt yields logistic classifier 3.
3. The multi-source domain integrated migration based cross-domain emotion classification method of claim 1, characterized in that:
the FNSCL model utilizes the expanded pivot characteristics to carry out characteristic migration, not only considers the mutual information MI value between the characteristics and the label in a training set when the pivot characteristics are screened, but also considers the word frequency of the characteristics, and uses a TF-IDF characteristic selection algorithm; the specific steps for screening pivot characteristics are as follows:
step1.1, first calculate the MI value between each feature and the label in the source domain through formula (1):

MI(x, y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}    (1)

where x represents a binary feature vector, p(x) represents the probability of feature x appearing in the text features, p(y) represents the probability of sentence label y appearing, and p(x, y) represents the joint probability of x and y;
step1.2, then select the uni-gram or bi-gram features whose word frequency exceeds the minimum word-frequency threshold in both domains, and take them as second-level candidate pivot features;
step1.3, finally, from the second-level candidate pivot features, select the uni-grams or bi-grams with the highest TF-IDF values through the TF-IDF feature selection algorithm as the final pivot features between the two domains;
the FNSCL model is based on the NSCL model; a TF-IDF feature selection method is added during pivot feature extraction, expanding the pivot feature extraction method.
4. The cross-domain emotion classification method based on multi-source domain integrated migration of claim 1, characterized in that: in Step2, the importance of each classifier in the ensemble for predicting the target-domain data is considered to be different; when the class predictions of the different classifiers on the target-domain samples reach maximum integration consistency, the weights of the logistic classifiers at that moment are considered optimal, so different logistic classifiers are given different weights.
5. The cross-domain emotion classification method based on multi-source domain integrated migration of claim 1, characterized in that: in Step2, the trained logistic classifiers are used to predict probability distributions for the input instances of the target domain; for the same input instance, the probability distribution vector obtained by logistic classifier 1 is p_1, that obtained by logistic classifier 2 is p_2, and that obtained by logistic classifier 3 is p_3; with m denoting the number of classifiers, the average probability vector of the classifiers is calculated by the formula

\bar{p} = \frac{1}{m} \sum_{l=1}^{m} p_l

and the consistency value is the negative of the entropy calculated from the average probability vector, as in the formula

c = -E(\bar{p})

where E represents the information entropy of the average probability distribution vector predicted by the classifiers for the target-domain instance.
CN201910823443.3A 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration Active CN110674849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910823443.3A CN110674849B (en) 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910823443.3A CN110674849B (en) 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration

Publications (2)

Publication Number Publication Date
CN110674849A true CN110674849A (en) 2020-01-10
CN110674849B CN110674849B (en) 2021-06-18

Family

ID=69075911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910823443.3A Active CN110674849B (en) 2019-09-02 2019-09-02 Cross-domain emotion classification method based on multi-source domain integrated migration

Country Status (1)

Country Link
CN (1) CN110674849B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761311A (en) * 2014-01-23 2014-04-30 中国矿业大学 Sentiment classification method based on multi-source field instance migration
US20180218284A1 (en) * 2017-01-31 2018-08-02 Xerox Corporation Method and system for learning transferable feature representations from a source domain for a target domain
CN107103364A (en) * 2017-03-28 2017-08-29 上海大学 A kind of task based on many source domain splits transfer learning Forecasting Methodology
CN107766873A (en) * 2017-09-06 2018-03-06 天津大学 The sample classification method of multi-tag zero based on sequence study
CN108256561A (en) * 2017-12-29 2018-07-06 中山大学 A kind of multi-source domain adaptive migration method and system based on confrontation study
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain
CN108681585A (en) * 2018-05-14 2018-10-19 浙江工业大学 A kind of construction method of the multi-source transfer learning label popularity prediction model based on NetSim-TL
CN109389037A (en) * 2018-08-30 2019-02-26 中国地质大学(武汉) A kind of sensibility classification method based on depth forest and transfer learning
CN109492229A (en) * 2018-11-23 2019-03-19 中国科学技术大学 A kind of cross-cutting sensibility classification method and relevant apparatus
CN109885833A (en) * 2019-02-18 2019-06-14 山东科技大学 A kind of sexy polarity detection method based on the joint insertion of multiple domain data set
CN110032646A (en) * 2019-05-08 2019-07-19 山西财经大学 The cross-domain texts sensibility classification method of combination learning is adapted to based on multi-source field

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PING LUO et al.: "Transfer Learning From Multiple Source Domains via Consensus Regularization", ACM 17th Conference on Information and Knowledge Management *
RITA CHATTOPADHYAY et al.: "Multisource Domain Adaptation and Its Application to Early Detection of Fatigue", ACM Transactions on Knowledge Discovery from Data *
TAREQ AL-MOSLMI et al.: "Approaches to Cross-Domain Sentiment Analysis: A Systematic Literature Review", IEEE Access *
XILUN CHEN et al.: "Multinomial adversarial networks for multi-domain text classification", Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies *
ZHAO CHUANJUN et al.: "Multi-source cross-domain sentiment classification based on ensemble deep transfer learning" (in Chinese), Journal of Shanxi University *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428039A (en) * 2020-03-31 2020-07-17 中国科学技术大学 Cross-domain emotion classification method and system of aspect level
CN111428039B (en) * 2020-03-31 2023-06-20 中国科学技术大学 Cross-domain emotion classification method and system for aspect level
CN111708986A (en) * 2020-05-29 2020-09-25 四川旷谷信息工程有限公司 Pipe gallery state parameter measuring method
CN112101085A (en) * 2020-07-22 2020-12-18 西安交通大学 Adaptive intelligent fault diagnosis method based on importance weighted domain impedance
CN112182209A (en) * 2020-09-24 2021-01-05 东北大学 GCN-based cross-domain emotion analysis method under lifelong learning framework
CN114020879A (en) * 2022-01-04 2022-02-08 深圳佑驾创新科技有限公司 Multi-source cross-domain text emotion classification network training method

Also Published As

Publication number Publication date
CN110674849B (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN110674849B (en) Cross-domain emotion classification method based on multi-source domain integrated migration
Wu et al. Iou-balanced loss functions for single-stage object detection
Padurariu et al. Dealing with data imbalance in text classification
Rodrigues et al. Gaussian process classification and active learning with multiple annotators
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
Peddinti et al. Domain adaptation in sentiment analysis of twitter
Wu et al. Collaborative multi-domain sentiment classification
Jiang et al. A bi-objective knowledge transfer framework for evolutionary many-task optimization
Duma et al. Sparseness reduction in collaborative filtering using a nearest neighbour artificial immune system with genetic algorithms
Li et al. CGAN-MBL for reliability assessment with imbalanced transmission gear data
Bui et al. Neural graph machines: Learning neural networks using graphs
CN111738532A (en) Method and system for acquiring influence degree of event on object
CN114863175A (en) Unsupervised multi-source partial domain adaptive image classification method
CN104572623A (en) Efficient data summary and analysis method of online LDA model
Liu et al. A weight-incorporated similarity-based clustering ensemble method
Gill et al. Dynamically regulated initialization for S-system modelling of genetic networks
Bahrami et al. Automatic image annotation using an evolutionary algorithm (IAGA)
Fang et al. Active multi-task learning via bandits
CN114298160A (en) Twin knowledge distillation and self-supervised learning based small sample classification method
Neukart et al. A Machine Learning Approach for Abstraction Based on the Idea of Deep Belief Artificial Neural Networks
Shaikh et al. Unerstanding Machine Learning Approach on Various Algorithms: A Case Study Implementation
Kyriakides et al. Comparison of neural network optimizers for relative ranking retention between neural architectures
Zelený et al. Multi-Branch Multi Layer Perceptron: A Solution for Precise Regression using Machine Learning
Fu et al. A hybrid model for credit evaluation problem
Moorthy et al. Handling the Class Imbalance Problem with an Improved Sine Cosine Algorithm for Optimal Instance Selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant