CN113326695A - Emotion polarity analysis method based on transfer learning - Google Patents
- Publication number
- CN113326695A CN113326695A CN202110455888.8A CN202110455888A CN113326695A CN 113326695 A CN113326695 A CN 113326695A CN 202110455888 A CN202110455888 A CN 202110455888A CN 113326695 A CN113326695 A CN 113326695A
- Authority
- CN
- China
- Prior art keywords
- model
- emotion polarity
- sentence
- polarity analysis
- coding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an emotion polarity analysis method based on transfer learning. First, word positions in comment texts are scrambled in a set proportion, and a text sequencing model is trained to restore the normal word order. Then, an emotion polarity analysis model is constructed, and the relevant parameters obtained by training the text sequencing model are transferred to the corresponding positions in the emotion polarity analysis model. Finally, the migrated model continues training on the emotion polarity analysis data set. By using transfer learning, the invention introduces domain prior knowledge into the emotion polarity analysis model, which improves the domain adaptability of the model, helps obtain higher-quality sentence features, and further improves the classification accuracy of the model.
Description
Technical Field
The invention relates to an emotion polarity analysis method based on transfer learning, and belongs to the field of internet and natural language processing.
Background
With the continuous development of mobile internet technology, the general public and news media increasingly publish their views and attitudes toward real-world events on social networks. Emotion polarity analysis is a method that automatically obtains the emotional tendency or emotion category of a text using natural language processing theory, and it has great practical value.
At present, research on emotion polarity analysis at home and abroad has produced rich results, which are of reference value to the work of the present invention. Existing emotion polarity analysis methods fall into three main categories: emotion-dictionary-based, machine-learning-based, and deep-learning-based. Dictionary-based methods introduce expert knowledge into the calculation of emotion values and suit scenarios lacking large-scale corpora, but the dictionaries are difficult to keep expanding and port poorly across domains. Compared with dictionary-based methods, machine learning methods are simple to model and offer better extensibility and portability; however, they require high-quality labeled data sets and therefore incur labeling cost. Deep-learning-based methods train a classifier with a neural network model; compared with dictionary-based and machine learning methods, deep learning models have stronger expressive power and achieve better classification metrics. In recent years, with the development of pre-trained language models, deep-learning-based approaches have achieved still better results. However, in specific application scenarios the existing deep learning models still fall short. First, existing language models lack a sentence-level pre-training task, so there is still room to improve their logic perception and semantic expression capability. In addition, for emotion analysis of social comments, the content is often short and informally written, so sentence features are sparse and noisy, and the classification robustness of emotion polarity analysis models is insufficient.
Aiming at the insufficient logic perception and semantic expression capability and the insufficient classification robustness of current deep learning models, the invention provides an emotion polarity analysis method based on transfer learning. On one hand, the model uses transfer learning: the logic perception and semantic expression capability of the model is first trained on a text sequencing task, and the relevant model parameters are then transferred to the emotion polarity analysis model. Through transfer learning, the model obtains domain prior knowledge, acquires high-quality sentence features, and improves classification accuracy. On the other hand, after the emotion polarity analysis model extracts sentence features, feature noise is further reduced with an attention mechanism, which improves the classification robustness of the model.
Disclosure of Invention
Aiming at the problems and defects of existing emotion polarity analysis technology, the invention provides an emotion polarity analysis method based on transfer learning, which introduces sentence-level prior knowledge through a text sequencing pre-training task. Meanwhile, an attention mechanism reduces noise interference in the emotion polarity analysis model, which improves the classification robustness of the model.
In order to achieve this purpose, the technical scheme of the invention is as follows: in the emotion polarity analysis method based on transfer learning, the positions of characters or words in comment texts are first disturbed according to a certain proportion. Then, a text sequencing model is constructed and trained, taking the disturbed sentences as input and the sentences with normal word order as output. Next, an emotion polarity analysis model is constructed, and the relevant parameters of the text sequencing model are transferred to the corresponding positions in the emotion polarity analysis model. Finally, the migrated emotion polarity analysis model continues training. The method introduces prior knowledge into the emotion polarity analysis model through transfer learning, which improves the domain adaptability of the model, helps obtain higher-quality sentence features, and further improves the classification accuracy of the model.
An emotion polarity analysis method based on transfer learning mainly comprises 4 steps, and specifically comprises the following steps:
Step 1, constructing a sentence pair data set. The word positions of each sentence in the emotion polarity analysis data set are disturbed according to a set proportion (determined by comparison experiments), the sentences before disturbance are retained, and each disturbed sentence together with its normal-order counterpart forms one piece of training data in the new data set.
Step 2, training a text sequencing model. A text sequencing model is constructed in the seq2seq mode: first, the disturbed sentence is used as model input and an encoder extracts sentence features; then decoding proceeds word by word, predicting the output word from the decoding features of the current time step; finally, the model output is compared with the characters at the corresponding positions of the normal word order, and the model parameters are trained with a cross-entropy loss function.
Step 3, parameter migration. An emotion polarity analysis model is first constructed using the same coding structure as the text sequencing model; the coding parameters and word vector parameters of the text sequencing model are then transferred into the emotion polarity analysis model.
Step 4, training the emotion polarity analysis model. Based on the emotion polarity analysis data set, a comment text is first input and an encoder extracts sentence features; then a convolutional neural network, a recurrent neural network, and an attention mechanism further extract the local features, the overall features, and the final noise-reduced features of the sentence; finally, the features are classified.
Compared with the prior art, the invention has the following technical effects:
1. The method is based on transfer learning: a text sequencing model is first constructed, and the parameters it learns are then transferred to the emotion polarity analysis model. This effectively makes up for the lack of sentence-level pre-training tasks in current language models, improves the logic perception and semantic expression capability of the model, and yields higher-quality sentence feature representations. In the embodiment, the classification accuracy of the model improved by 3.7% after transfer learning, verifying the effectiveness of the scheme.
2. The emotion polarity analysis model first uses a CNN (convolutional neural network) to extract the local features of a sentence, then uses a BiGRU (bidirectional gated recurrent unit) to extract the overall features, and finally uses an attention mechanism to reduce feature noise, which lessens the interference of the randomness of social text on classification. Ablation experiments on the embodiment show that, compared with classifying the overall features directly, adding the attention mechanism achieves higher classification robustness.
Drawings
Fig. 1 is an overall frame diagram of an embodiment of the present invention.
Fig. 2 is a frame diagram of a text ordering model according to an embodiment of the present invention.
FIG. 3 is a framework diagram of an emotion polarity analysis model according to an embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples in order to provide a better understanding and appreciation of the invention.
Example 1: a concrete implementation of the emotion polarity analysis method based on transfer learning, whose overall workflow is shown in Fig. 1. The method first constructs a text ordering model as shown in Fig. 2; this model makes up for the lack of a sentence-level pre-training task in current language models and can improve the logic perception and semantic expression capability of the model. An emotion polarity analysis model is then constructed as shown in Fig. 3, and the relevant parameters obtained by training the text ordering model are migrated to the corresponding positions in the emotion polarity analysis model. Finally, the migrated model continues training on the emotion polarity analysis data set. The specific implementation steps are as follows:
Step 1, constructing a sentence pair data set. Sentences in the emotion polarity analysis data set are scrambled according to a set proportion (set to 25% in this embodiment after comparison tests), the sentences before scrambling are retained, and each pair consisting of a scrambled sentence and its normal-order sentence serves as one piece of training data in the new data set. For example, a sentence in normal word order is: "All the medical staff have worked hard." In the scrambled sentence, the two characters of the Chinese word for "hard-working", 辛 ("pungent") and 苦 ("bitter"), are moved apart, so the context of both characters changes.
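As an illustration of the scrambling in step 1, the sketch below disturbs a set proportion of token positions while keeping the original sentence as the target. The function name, the swap strategy, and the use of Python's `random` module are assumptions for illustration, not details given in the patent.

```python
import random

def make_ordering_pair(tokens, ratio=0.25, seed=7):
    """Disturb `ratio` of the token positions (25% in this embodiment);
    return (scrambled sentence, normal-order sentence)."""
    rng = random.Random(seed)
    n = max(2, int(len(tokens) * ratio))      # need at least 2 positions to permute
    idx = rng.sample(range(len(tokens)), n)   # positions to disturb
    perm = idx[:]
    rng.shuffle(perm)                         # permute the chosen positions
    out = tokens[:]
    for src, dst in zip(idx, perm):
        out[dst] = tokens[src]
    return out, tokens

scrambled, target = make_ordering_pair(list("医护人员都辛苦了"))
```

Each (scrambled, target) pair then becomes one training example for the text sequencing model of step 2.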
Step 2, training a text sequencing model. A text sequencing model is constructed in the seq2seq mode: first, the disturbed sentence is used as model input and an encoder extracts sentence features; then decoding proceeds word by word, predicting the output character from the decoding features of the current time step; finally, the model output is compared with the normal word order, and the model parameters are trained with a logarithmic loss function. This step divides into 3 sub-steps, implemented as follows.
Substep 2-1, sentence coding. In this embodiment, text coding features are extracted with Bert; for convenience of description, the meanings of the model-related symbols are summarized in Table 1. First, a text sequence X = (token_1, token_2, …, token_m) of length m is input. According to each token's index id_i in the vocabulary vocab, its embedding Emb_i is extracted and e_i is calculated; the text matrix of the entire sentence is denoted s. The calculation of e_i and s is shown in formulas (1) and (2), where position denotes the position coding and segment the segment coding.
TABLE 1 model-related symbolic meanings
e_i = Bert(Emb_i + segment_i + position_i) (1)
s = (e_1, e_2, …, e_m) (2)
Then s is fed into a coding model consisting of 12 Transformer layers to extract the final coded output S. In the formulas below, LN is the layer normalization operation and MSA the multi-head self-attention operation. Taking the z-th layer as an example, the previous layer's coded output s_{z-1} is first processed by MSA, and the intermediate representation s̃_z is obtained through a residual connection and LN, as shown in formula (3); FFN then processes s̃_z, and the layer's coded output s_z is obtained by again combining a residual connection and LN, as shown in formula (4). The calculation of FFN is shown in formula (5), where W_1, b_1, W_2, b_2 are all model-learnable parameters.

s̃_z = LN(MSA(s_{z-1}) + s_{z-1}) (3)
s_z = LN(FFN(s̃_z) + s̃_z) (4)
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (5)
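A minimal NumPy sketch of formulas (4) and (5): the position-wise feed-forward computation followed by the residual-plus-layer-normalization step (the MSA sub-layer of formula (3) is omitted for brevity). The dimensions and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_ff = 8, 32
W1, b1 = rng.standard_normal((d, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d)) * 0.1, np.zeros(d)

def ffn(x):
    # formula (5): FFN(x) = max(0, xW1 + b1)W2 + b2
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def layer_norm(x, eps=1e-6):
    # LN: normalize each token vector to zero mean and unit variance
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

x = rng.standard_normal((5, d))        # 5 tokens of dimension d
s_z = layer_norm(ffn(x) + x)           # formula (4): residual connection + LN
```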
Substep 2-2, sentence decoding. A GRU is used as the basic unit of the decoding network; the decoding process is shown in formulas (6) and (7). In formula (6), d_{t-1} denotes the input and h_{t-1} the hidden-layer output of the previous step; d_0 corresponds to CLS (a special character in vocab marking the beginning of a sentence) as input. Formula (7) gives the hidden-layer initialization of the decoding process: the coded output S is first average-pooled and then processed by a linear layer to produce the initial hidden input h_0, where W_s, b_s are model-learnable parameters.
h_t = GRU(d_{t-1}, h_{t-1}) (6)
h_0 = W_s · avg(S) + b_s (7)
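The decoder initialization of formulas (6) and (7) can be sketched in NumPy as follows. The GRU gate layout is the common update/reset/candidate convention, and all weights are random illustrative stand-ins rather than trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
Wz, Uz, Wr, Ur, Wn, Un = (rng.standard_normal((d, d)) * 0.1 for _ in range(6))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h):
    # one GRU cell step: h_t = GRU(d_{t-1}, h_{t-1}), as in formula (6)
    z = sigmoid(x @ Wz + h @ Uz)          # update gate
    r = sigmoid(x @ Wr + h @ Ur)          # reset gate
    n = np.tanh(x @ Wn + (r * h) @ Un)    # candidate state
    return (1.0 - z) * h + z * n

S = rng.standard_normal((10, d))          # encoder output for 10 tokens
Ws, bs = rng.standard_normal((d, d)) * 0.1, np.zeros(d)
h0 = S.mean(axis=0) @ Ws + bs             # formula (7): average-pool, then linear
h1 = gru_step(rng.standard_normal(d), h0) # first decoding step (input would be CLS)
```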
Substep 2-3, prediction output. The decoding hidden output h_t of each step is used as the query, and the coded output S as key and value; the context vector context is calculated by dot-product attention, as shown in formula (8). Then context and h_t are concatenated as the final feature of the current decoding step; after a linear transformation and the softmax function, the model's predicted probability distribution p is obtained, as shown in formula (9), where W_p, b_p are model-learnable parameters. Finally, the model log loss is calculated from the predicted distribution p and the true value y, as shown in formula (10), where m denotes the size of the dictionary vocab.
context = Attention(h_t, S, S) (8)
p = softmax(W_p[context, h_t] + b_p) (9)
L = -Σ_{i=1}^{m} y_i · log(p_i) (10)
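Formulas (8)-(10) amount to dot-product attention followed by a softmax classifier over the vocabulary. The sketch below, with toy sizes and random weights (all assumptions), shows one decoding step.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attend(h_t, S):
    # formula (8): h_t as query, encoder output S as key and value
    w = softmax(S @ h_t)                  # one attention weight per encoder position
    return w @ S                          # context vector

rng = np.random.default_rng(2)
d, m = 6, 9                               # hidden size, toy vocabulary size
S = rng.standard_normal((10, d))
h_t = rng.standard_normal(d)
context = attend(h_t, S)
Wp, bp = rng.standard_normal((2 * d, m)) * 0.1, np.zeros(m)
p = softmax(np.concatenate([context, h_t]) @ Wp + bp)   # formula (9)
loss = -np.log(p[3])                      # formula (10), one-hot target at index 3
```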
And 3, parameter migration. And after the text sequencing model is trained, constructing an emotion polarity analysis model by using the same coding structure as the text sequencing model. And then, transferring the coding model parameters and the word vector parameters of the text sequencing model into an emotion polarity analysis model.
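Step 3 boils down to copying the shared coding and word-vector parameters between two models with the same coding structure. The dictionary-based sketch below mirrors how state-dict transfer typically works; the parameter names and prefixes are invented for illustration.

```python
def migrate(src_state, dst_state, prefixes=("embedding.", "encoder.")):
    """Copy encoder and word-vector parameters from the trained text
    sequencing model; every other destination parameter keeps its fresh value."""
    moved = {k: v for k, v in src_state.items() if k.startswith(prefixes)}
    out = dict(dst_state)
    out.update(moved)
    return out

src = {"embedding.weight": "trained-emb", "encoder.layer0": "trained-enc",
       "decoder.gru": "trained-dec"}          # text sequencing model
dst = {"embedding.weight": "fresh", "encoder.layer0": "fresh",
       "classifier.W": "fresh"}               # emotion polarity model
merged = migrate(src, dst)
```

The decoder of the text sequencing model is deliberately left behind: only the coding structure is shared between the two models.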
Step 4, training the emotion polarity analysis model. Based on the emotion polarity analysis data set, a text is first input to the model and an encoder extracts sentence features; then a convolutional neural network, a recurrent neural network, and an attention mechanism extract the local features, the overall features, and the final noise-reduced features of the sentence; finally, the features are classified. This step divides into 5 sub-steps, implemented as follows.
Substep 4-1, sentence coding. Since the coding model is completely consistent with the text ordering model, the coding process of the emotion polarity model is the same as the substep 2-1. The coded output is denoted by S.
Substep 4-2, extracting local features. A one-dimensional convolutional network is used to extract the local feature representation T of the coded output S, as shown in formula (11); the calculation of each step T_i is shown in formula (12), where W and b are model-learnable parameters, × denotes the convolution calculation, i denotes the i-th step, and k the convolution kernel width (this embodiment uses a kernel of width 2).
T = Conv(S) (11)
T_i = tanh(W × S_{i:i+k-1} + b) (12)
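Formulas (11) and (12) describe a single-filter one-dimensional convolution with a tanh nonlinearity. The NumPy sketch below uses the width-2 kernel of the embodiment; the input dimensions and random weights are illustrative assumptions.

```python
import numpy as np

def conv1d_tanh(S, W, b):
    # formula (12): T_i = tanh(W × S_{i:i+k-1} + b) for a kernel of width k
    k = W.shape[0]
    return np.array([np.tanh(np.sum(W * S[i:i + k]) + b)
                     for i in range(len(S) - k + 1)])

rng = np.random.default_rng(3)
S = rng.standard_normal((10, 4))        # 10 encoded tokens of dimension 4
W = rng.standard_normal((2, 4)) * 0.1   # k = 2, as in this embodiment
T = conv1d_tanh(S, W, b=0.0)            # formula (11): T = Conv(S)
```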
Substep 4-3, extracting the overall features. Bi-GRU is used to extract the overall sentence feature C, as shown in formulas (13) and (14). The bidirectional GRU consists of a forward GRU that reads T from left to right to generate (h→_1, …, h→_{n_t}) and a backward GRU that reads T from right to left to generate (h←_1, …, h←_{n_t}), where n_t denotes the length of the convolution output of the previous step. h→_j and h←_j denote the hidden-layer outputs of the two directions at step j of the GRU model. Concatenating h→_j and h←_j gives the feature h_j of each step, as shown in formula (15).

h→_j = GRU(T_j, h→_{j-1}) (13)
h←_j = GRU(T_j, h←_{j+1}) (14)
h_j = [h→_j ; h←_j] (15)
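The bidirectional extraction can be sketched as two passes of a GRU cell over the convolution output, concatenating the per-step states. A real BiGRU uses separate parameters per direction; this toy version shares one random cell for brevity, which is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
Wz, Uz, Wr, Ur, Wn, Un = (rng.standard_normal((d, d)) * 0.1 for _ in range(6))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h):
    z = sigmoid(x @ Wz + h @ Uz)
    r = sigmoid(x @ Wr + h @ Ur)
    n = np.tanh(x @ Wn + (r * h) @ Un)
    return (1.0 - z) * h + z * n

def bigru(T):
    # forward pass (13), backward pass (14), per-step concatenation (15)
    fwd, h = [], np.zeros(d)
    for x in T:
        h = gru_step(x, h)
        fwd.append(h)
    bwd, h = [], np.zeros(d)
    for x in T[::-1]:
        h = gru_step(x, h)
        bwd.append(h)
    bwd.reverse()                       # align backward states with positions
    return np.array([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

T = rng.standard_normal((9, d))         # convolution output of length n_t = 9
C = bigru(T)                            # each h_j has dimension 2d
```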
Substep 4-4, attention noise reduction. The actual comment text may be noisy, and it is generally undesirable for all parts of the input sentence to contribute indiscriminately to the final prediction. Combining the attention mechanism, the text feature C can be further optimized. Each h_j is passed through a perceptron (MLP) to obtain v_j, as in formula (16), where W_a and b_a are model-learnable parameters. The importance of each step is measured by the similarity between v_j and a context vector u_c (a model-learnable parameter) and normalized as in formula (17). The final semantic feature C_a of the sentence is calculated by weighted summation, as shown in formula (18).

v_j = tanh(W_a h_j + b_a) (16)
α_j = exp(v_jᵀ u_c) / Σ_k exp(v_kᵀ u_c) (17)
C_a = Σ_j α_j h_j (18)
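The attention noise reduction of formulas (16)-(18) is a weighted pooling of the step features. In the sketch below, the context vector u is assumed to be a learnable parameter (random here for illustration), as are the sizes and weights.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(5)
n_t, d = 9, 8
H = rng.standard_normal((n_t, d))       # step features h_j from the BiGRU
Wa, ba = rng.standard_normal((d, d)) * 0.1, np.zeros(d)
u = rng.standard_normal(d)              # assumed learnable context vector

V = np.tanh(H @ Wa + ba)                # formula (16)
alpha = softmax(V @ u)                  # formula (17): importance weights
Ca = alpha @ H                          # formula (18): weighted summation
```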
Substep 4-5, emotion polarity classification. First, the sentence feature C_a is processed by a linear transformation and the softmax function to obtain the model's classification probability distribution p, as shown in formula (19), where W_p and b_p are model-learnable parameters. The model log loss is then calculated from p and the true label y, as shown in formula (20), where n denotes the number of classes.

p = softmax(W_p C_a + b_p) (19)
L = -Σ_{i=1}^{n} y_i · log(p_i) (20)
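Formulas (19) and (20) — the final linear-softmax classifier and its log loss — in a NumPy sketch, with three illustrative emotion classes and random stand-in weights:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(6)
d, n = 8, 3                             # feature size, number of classes
Ca = rng.standard_normal(d)             # final sentence feature
Wp, bp = rng.standard_normal((d, n)) * 0.1, np.zeros(n)
p = softmax(Ca @ Wp + bp)               # formula (19)
y = np.array([0.0, 1.0, 0.0])           # one-hot true label
loss = -np.sum(y * np.log(p))           # formula (20): cross-entropy log loss
```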
It should be understood that these examples are given solely for the purpose of illustration and are not to be construed as limitations on the scope of the invention, as various equivalent modifications of the invention will become apparent to those skilled in the art upon reading the present disclosure and may be made within the scope of the appended claims.
Claims (7)
1. An emotion polarity analysis method based on transfer learning is characterized by comprising the following steps:
step 1, constructing a sentence pair data set,
step 2, training a text sequencing model,
and step 3, transferring the parameters,
and 4, training an emotion polarity analysis model.
2. The emotion polarity analysis method based on transfer learning of claim 1, wherein
step 1, a sentence pair data set is constructed, specifically, the word position of each sentence in the emotion polarity analysis data set is disturbed according to a set proportion, the sentences before disturbance are retained, and each group of disturbed sentences and sentences with normal language sequence form a piece of training data in a new data set.
3. The emotion polarity analysis method based on transfer learning of claim 1, wherein
step 2, training a text sequencing model: the text sequencing model is constructed in the seq2seq mode; first, the disturbed sentence is used as model input and an encoder extracts sentence features; then decoding proceeds word by word, predicting the output word from the decoding features of the current time step; finally, the model output is compared with the characters at the corresponding positions of the normal word order, and the model parameters are trained with a cross-entropy loss function.
4. The emotion polarity analysis method based on transfer learning of claim 1, wherein
step 3, parameter migration, specifically: an emotion polarity analysis model is first constructed using the same coding structure as the text sequencing model, and the coding parameters and word vector parameters of the text sequencing model are then migrated into the emotion polarity analysis model.
5. The emotion polarity analysis method based on transfer learning of claim 1, wherein in step 4 the emotion polarity analysis model is trained as follows: based on the emotion polarity analysis data set, a comment text is first input and an encoder extracts sentence features; then a convolutional neural network, a recurrent neural network, and an attention mechanism further extract the local features, the overall features, and the final noise-reduced features of the sentence; finally, the features are classified.
6. The emotion polarity analysis method based on transfer learning of claim 1, wherein in step 2 the text sequencing model is trained as follows:
substep 2-1, sentence coding: text coding features are extracted by Bert, the meanings of the model-related symbols being summarized in Table 1 for description; a text sequence X = (token_1, token_2, …, token_m) of length m is input, the embedding Emb_i of each token is extracted according to its index id_i in the vocabulary vocab, e_i is calculated, and the text matrix of the entire sentence is denoted s; the calculation of e_i and s is shown in formulas (1) and (2), where position denotes the position coding and segment the segment coding,
TABLE 1 model-related symbolic meanings
e_i = Bert(Emb_i + segment_i + position_i) (1)
s = (e_1, e_2, …, e_m) (2)
then s is sent into a coding model consisting of 12 Transformer layers to extract the final coded output S; LN is the layer normalization operation and MSA the multi-head self-attention operation; taking the z-th layer as an example, the previous layer's coded output s_{z-1} is first processed by MSA, and the intermediate representation s̃_z is obtained through a residual connection and LN, as shown in formula (3); FFN then processes s̃_z, and the layer's coded output s_z is obtained by again combining a residual connection and LN, as shown in formula (4); the calculation of FFN is shown in formula (5), where W_1, b_1, W_2, b_2 are all model-learnable parameters,

s̃_z = LN(MSA(s_{z-1}) + s_{z-1}) (3)
s_z = LN(FFN(s̃_z) + s̃_z) (4)
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (5)
substep 2-2, sentence decoding: a GRU is used as the basic unit of the decoding network, the decoding process being shown in formulas (6) and (7); in formula (6), d_{t-1} denotes the input and h_{t-1} the hidden-layer output of the previous step, and d_0 corresponds to CLS (a special character in vocab marking the beginning of a sentence) as input; formula (7) gives the hidden-layer initialization of the decoding process, i.e. the coded output S is first average-pooled and then processed by a linear layer to produce the initial hidden input h_0, where W_s, b_s are model-learnable parameters,
h_t = GRU(d_{t-1}, h_{t-1}) (6)
h_0 = W_s · avg(S) + b_s (7)
substep 2-3, prediction output: the decoding hidden output h_t of each step is used as the query and the coded output S as key and value; the context vector context is calculated by dot-product attention, as shown in formula (8); context and h_t are then concatenated as the final feature of the current decoding step, and after a linear transformation and the softmax function the model's predicted probability distribution p is obtained, as shown in formula (9), where W_p, b_p are model-learnable parameters; finally, the model log loss is calculated from the predicted value p and the actual value y, as shown in formula (10), where m denotes the size of the dictionary vocab,
context = Attention(h_t, S, S) (8)
p = softmax(W_p[context, h_t] + b_p) (9)
L = -Σ_{i=1}^{m} y_i · log(p_i) (10)
7. The emotion polarity analysis method based on transfer learning of claim 1, wherein in step 4 the emotion polarity analysis model is trained as follows:
substep 4-1, sentence coding, because the coding model is completely consistent with the text sequencing model, the coding process of the emotion polarity model is the same as substep 2-1, and S is used for representing coding output;
substep 4-2, extracting local features: a one-dimensional convolutional network is used to extract the local feature representation T of the coded output S, as shown in formula (11); the calculation of each step T_i is shown in formula (12), where W and b are model-learnable parameters, × denotes the convolution calculation, i denotes the i-th step, and k the convolution kernel width (this embodiment uses a kernel of width 2),
T = Conv(S) (11)
T_i = tanh(W × S_{i:i+k-1} + b) (12)
substep 4-3, extracting the overall features: Bi-GRU is used to extract the overall sentence feature C, as shown in formulas (13) and (14); the bidirectional GRU consists of a forward GRU that reads T from left to right to generate (h→_1, …, h→_{n_t}) and a backward GRU that reads T from right to left to generate (h←_1, …, h←_{n_t}), where n_t denotes the length of the convolution output of the previous step; h→_j and h←_j denote the hidden-layer outputs of the two directions at step j of the GRU model, and concatenating h→_j and h←_j gives the feature h_j of each step, as shown in formula (15),

h→_j = GRU(T_j, h→_{j-1}) (13)
h←_j = GRU(T_j, h←_{j+1}) (14)
h_j = [h→_j ; h←_j] (15)
substep 4-4, attention noise reduction: combining the attention mechanism, the text feature C is further optimized; each h_j is passed through a perceptron (MLP) to obtain v_j, as in formula (16), where W_a and b_a are model-learnable parameters; the importance of each step is measured by the similarity between v_j and a context vector u_c (a model-learnable parameter) and normalized as in formula (17); the final semantic feature C_a of the sentence is calculated by weighted summation, as shown in formula (18),

v_j = tanh(W_a h_j + b_a) (16)
α_j = exp(v_jᵀ u_c) / Σ_k exp(v_kᵀ u_c) (17)
C_a = Σ_j α_j h_j (18)
substep 4-5, emotion polarity classification: the sentence feature C_a is first processed by a linear transformation and the softmax function to obtain the model's classification probability distribution p, as shown in formula (19), where W_p and b_p are model-learnable parameters; the model log loss is then calculated from p and the true label y, as shown in formula (20), where n denotes the number of classes,

p = softmax(W_p C_a + b_p) (19)
L = -Σ_{i=1}^{n} y_i · log(p_i) (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110455888.8A CN113326695B (en) | 2021-04-26 | 2021-04-26 | Emotion polarity analysis method based on transfer learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113326695A true CN113326695A (en) | 2021-08-31 |
CN113326695B CN113326695B (en) | 2024-04-26 |
Family
ID=77413792
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836919A (en) * | 2021-09-30 | 2021-12-24 | 中国建筑第七工程局有限公司 | Building industry text error correction method based on transfer learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679580A (en) * | 2017-10-21 | 2018-02-09 | 桂林电子科技大学 | A heterogeneous-transfer image emotion polarity analysis method based on multi-modal deep latent association |
CN110334187A (en) * | 2019-07-09 | 2019-10-15 | 昆明理工大学 | Burmese sentiment analysis method and device based on transfer learning |
WO2021051598A1 (en) * | 2019-09-19 | 2021-03-25 | 平安科技(深圳)有限公司 | Text sentiment analysis model training method, apparatus and device, and readable storage medium |
Non-Patent Citations (2)
Title |
---|
Fang Yinglan et al.: "Research on a Text Sentiment Analysis Method Based on BERT", Information Technology and Informatization, no. 02, 28 February 2020 (2020-02-28) * |
Qu Zhaowei et al.: "Hierarchical Attention Network Sentiment Analysis Algorithm Based on Transfer Learning", Journal of Computer Applications, no. 11, 19 July 2018 (2018-07-19) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639175B (en) | Self-supervision dialogue text abstract method and system | |
CN110059188B (en) | Chinese emotion analysis method based on bidirectional time convolution network | |
CN109325112B (en) | A cross-language sentiment analysis method and apparatus based on emoji | |
CN111382565B (en) | Emotion-reason pair extraction method and system based on multiple labels | |
CN111274398B (en) | Method and system for analyzing comment emotion of aspect-level user product | |
WO2018218705A1 (en) | Method for recognizing network text named entity based on neural network probability disambiguation | |
CN110929030A (en) | Text abstract and emotion classification combined training method | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN110287323B (en) | Target-oriented emotion classification method | |
CN110427616B (en) | Text emotion analysis method based on deep learning | |
CN111597327A (en) | Public opinion analysis-oriented unsupervised multi-document abstract generation method | |
CN110188200A (en) | A deep microblog sentiment analysis method using social context features | |
CN115759119A (en) | Financial text emotion analysis method, system, medium and equipment | |
CN116661805A (en) | Code representation generation method and device, storage medium and electronic equipment | |
CN113326695A (en) | Emotion polarity analysis method based on transfer learning | |
Yu et al. | Multimodal fusion method with spatiotemporal sequences and relationship learning for valence-arousal estimation | |
CN113159831A (en) | Comment text sentiment analysis method based on improved capsule network | |
Chowanda et al. | Generative Indonesian conversation model using recurrent neural network with attention mechanism | |
CN116522165A (en) | Public opinion text matching system and method based on twin structure | |
CN115840815A (en) | Automatic abstract generation method based on pointer key information | |
CN113505207B (en) | Machine reading understanding method and system for financial public opinion research report | |
CN114896969A (en) | Method for extracting aspect words based on deep learning | |
CN114972907A (en) | Image semantic understanding and text generation based on reinforcement learning and contrast learning | |
CN110334189B (en) | Microblog topic label determination method based on long-time and short-time and self-attention neural network | |
CN114936283B (en) | Network public opinion analysis method based on Bert |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||