CN113326695B - Emotion polarity analysis method based on transfer learning - Google Patents
- Publication number: CN113326695B (application CN202110455888.8A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/247 — Handling natural language data; Lexical tools; Thesauruses; Synonyms
- G06F40/30 — Handling natural language data; Semantic analysis
- G06N3/045 — Neural networks; Architecture; Combinations of networks
- G06N3/047 — Neural networks; Architecture; Probabilistic or stochastic networks
- G06N3/08 — Neural networks; Learning methods
- Y02D10/00 — Climate change mitigation in ICT; Energy efficient computing
Abstract
The invention discloses an emotion polarity analysis method based on transfer learning. First, the positions of characters or words in comment texts are perturbed in a set proportion, and a text ordering model is trained to restore the normal word order. An emotion polarity analysis model is then constructed, and the relevant parameters obtained by training the text ordering model are transferred to the corresponding positions in the emotion polarity analysis model. Finally, the transferred model continues training on the emotion polarity analysis data set. By means of transfer learning, the invention introduces domain prior knowledge into the emotion polarity analysis model, which improves the domain adaptability of the model, helps obtain higher-quality sentence features, and thereby improves the classification accuracy of the model.
Description
Technical Field
The invention relates to an emotion polarity analysis method based on transfer learning, and belongs to the fields of the Internet and natural language processing.
Background
With the continuous development of mobile internet technology, news media and the general public increasingly publish their views on and attitudes toward real-world events in social networks. Emotion polarity analysis automatically determines the emotional tendency or emotion category of a text using natural language processing techniques, and therefore has great practical value.
At present, research on emotion polarity analysis at home and abroad has produced abundant results that are of reference value for the present invention. Existing emotion polarity analysis methods fall into three main categories: methods based on emotion dictionaries, methods based on machine learning, and methods based on deep learning. Dictionary-based methods introduce expert knowledge into the computation of emotion values and suit scenarios that lack a large-scale corpus, but the dictionary must be continually expanded and ports poorly to new domains. Compared with dictionary-based methods, machine learning methods offer simpler modeling and better extensibility and portability; however, they require a high-quality labeled data set and therefore incur labeling cost. Deep learning methods train the classifier with a neural network model and, compared with the two previous approaches, have stronger expressive power and achieve better classification metrics. In recent years, with the development of pre-trained language models, deep learning approaches have achieved further gains. Nevertheless, in specific application scenarios, existing deep learning models still have shortcomings. First, existing language models lack sentence-level pre-training tasks, so their capacity for logical perception and semantic expression leaves room for improvement. In addition, social-media comments are brief and casually written, so their sentence features are sparse and noisy, and the classification robustness of emotion polarity analysis models on such text is insufficient.
Aiming at the insufficient logical perception and semantic expression capability and the insufficient classification robustness of current deep learning models, the invention provides an emotion polarity analysis method based on transfer learning. On one hand, the method first trains the model's logical perception and semantic expression capability on a text ordering task and then transfers the relevant parameters into an emotion polarity analysis model; through transfer learning, the model acquires domain prior knowledge, obtains high-quality sentence features, and improves classification accuracy. On the other hand, after the emotion polarity analysis model extracts sentence features, an attention mechanism further reduces feature noise, which improves the classification robustness of the model.
Disclosure of Invention
Aiming at the problems and shortcomings of existing emotion polarity analysis technology, the invention provides an emotion polarity analysis method based on transfer learning, which introduces domain prior knowledge into the emotion polarity analysis model through transfer learning and thereby improves the classification accuracy of the model. At the same time, the invention reduces noise interference in the emotion polarity analysis model with an attention mechanism, which improves the classification robustness of the model.
To achieve the above object, the technical scheme of the invention is as follows. First, the positions of characters or words in the comment texts are perturbed in a set proportion. Then a text ordering model is constructed and trained, taking the perturbed sentences as input and the sentences with normal word order as output. Next, an emotion polarity analysis model is constructed, and the relevant parameters of the text ordering model are migrated to the corresponding positions in the emotion polarity analysis model. Finally, the migrated emotion polarity analysis model continues training. In this way, prior knowledge is introduced into the emotion polarity analysis model through transfer learning, improving its domain adaptability, yielding higher-quality sentence features, and raising its classification accuracy.
The emotion polarity analysis method based on transfer learning comprises four steps, as follows:
Step 1: construct a sentence-pair data set. The word positions of each sentence in the emotion polarity analysis data set are perturbed in a set proportion (the proportion is determined by comparative experiments), the sentence before perturbation is retained, and each perturbed sentence together with its normal-order counterpart forms one training item of the new data set.
Step 2: train a text ordering model. A text ordering model is built in the seq2seq style. The perturbed sentence is taken as model input and an encoder extracts sentence features; decoding then proceeds word by word, predicting the output word from the decoding features of the current time step; finally, the model output is compared with the characters at the corresponding positions of the normal-order sentence, and the model parameters are trained with a cross-entropy loss.
Step 3: parameter migration. An emotion polarity analysis model is constructed with the same encoding structure as the text ordering model, and the encoding parameters and word-vector parameters of the text ordering model are migrated into the emotion polarity analysis model.
Step 4: train the emotion polarity analysis model. Comment texts from the emotion polarity analysis data set are input and sentence features are extracted by the encoder; a convolutional neural network, a recurrent neural network, and an attention mechanism then extract the local features, the overall features, and the final denoised features of the sentence; finally, the features are classified.
Compared with the prior art, the invention has the following technical effects:
1. Based on transfer learning, a text ordering model is first constructed and the parameters it learns are then transferred to the emotion polarity analysis model. This remedies the lack of sentence-level pre-training tasks in current language models, improves the model's logical perception and semantic expression capability, and yields higher-quality sentence feature representations. In the embodiment, the model after transfer learning improves the classification accuracy metric by 3.7%, verifying the effectiveness of the scheme.
2. The emotion polarity analysis model first uses a CNN (convolutional neural network) to extract local sentence features, then a BiGRU (bidirectional gated recurrent unit) to extract overall sentence features, and finally an attention mechanism to reduce feature noise, lessening the interference of the casual style of social text on classification. Ablation experiments on the embodiment show that the added attention mechanism achieves higher classification robustness than classifying the overall features directly.
Drawings
FIG. 1 is an overall framework diagram of an embodiment of the present invention.
FIG. 2 is a framework diagram of the text ordering model of the embodiment.
FIG. 3 is a schematic diagram of the emotion polarity analysis model of the embodiment.
Detailed Description
The invention is further illustrated below with specific embodiments to aid understanding.
Embodiment 1: the overall workflow of the emotion polarity analysis method based on transfer learning is shown in FIG. 1. First, a text ordering model is constructed as shown in FIG. 2; this model remedies the lack of a sentence-level pre-training task in current language models and improves logical perception and semantic expression capability. Then, an emotion polarity analysis model is constructed as shown in FIG. 3, and the relevant parameters obtained by training the text ordering model are migrated to the corresponding positions in the emotion polarity analysis model. Finally, the migrated model continues training on the emotion polarity analysis data set. The specific implementation steps are as follows:
Step 1: construct a sentence-pair data set. The sentences in the emotion polarity analysis data set are perturbed in a set proportion (25% in this embodiment, chosen by comparative tests), the sentence before perturbation is retained, and each (perturbed sentence, normal-order sentence) pair serves as one training item of the new data set. For example, for a normal-order Chinese sentence meaning "All the medical staff have worked hard," the perturbation separates the two characters of the word for "hard-working" (literally "pungent" and "bitter"), so the context of both characters changes.
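As a concrete sketch of this step, the following Python snippet builds (perturbed, original) training pairs by moving a set proportion of token positions. The function names and the 25% default mirror the embodiment, but the implementation itself is only an illustrative assumption, not the patented code.

```python
import random

def perturb(tokens, ratio=0.25, seed=0):
    """Shuffle a fraction of token positions; `ratio` follows the 25% setting
    chosen by the embodiment's comparative tests."""
    rng = random.Random(seed)
    n = max(2, int(len(tokens) * ratio))       # number of positions to disturb
    idx = rng.sample(range(len(tokens)), n)    # positions whose tokens move
    shuffled = idx[:]
    rng.shuffle(shuffled)
    out = tokens[:]
    for src, dst in zip(idx, shuffled):
        out[dst] = tokens[src]                 # permute only the chosen positions
    return out

def build_pairs(sentences, ratio=0.25):
    # each training item pairs the perturbed sentence (input) with the original (target)
    return [(perturb(s.split(), ratio, seed=i), s.split())
            for i, s in enumerate(sentences)]
```

The perturbed sentence is always a permutation of the original, so the ordering model must rely on word-order knowledge, not vocabulary, to restore it.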
Step 2: train a text ordering model. The model is built in the seq2seq style: the perturbed sentence is taken as model input and an encoder extracts sentence features; decoding then proceeds word by word, predicting the output word from the decoding features of the current time step; finally, the model output is compared with the normal-order sentence and the model parameters are trained with a log (cross-entropy) loss. This step divides into three sub-steps, implemented as follows.
Sub-step 2-1: sentence encoding. This embodiment uses BERT to extract text encoding features; for convenience, the meanings of the model-related symbols are summarized in Table 1. Given an input text sequence X = (token_1, token_2, …, token_m) of length m, the embedding Emb_i of each token is looked up by its index id_i in vocab, e_i is computed, and s denotes the text matrix formed by the whole sentence. The computation of e_i and s is shown in formulas (1) and (2), where position denotes the position encoding and segment the segment encoding.
TABLE 1. Model-related symbols
| Symbol | Meaning |
|---|---|
| token | each word in the data set |
| n | total number of tokens in the data set |
| h | word-vector dimension of a token |
| Emb | embedding matrix, of shape n×h |
| vocab | token dictionary mapping token → index id |
e_i = BERT(Emb_i + segment_i + position_i)  (1)
s = (e_1, e_2, …, e_m)  (2)
s is then fed into an encoding model containing a sequence of 12 Transformer layers to extract the final encoded output S. In formula (3), LN is the layer normalization operation and MSA the multi-head self-attention operation. Taking layer z as an example, the output s_{z-1} of the previous layer is first processed by MSA, and a residual connection and LN yield s̃_z, as in formula (3); the feed-forward network FFN then processes s̃_z, and the layer output s_z is obtained by combining the residual and LN, as in formula (4). The computation of FFN is shown in formula (5), where W_1, b_1, W_2, b_2 are learnable model parameters.
s̃_z = LN(MSA(s_{z-1}) + s_{z-1})  (3)
s_z = LN(FFN(s̃_z) + s̃_z)  (4)
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2  (5)
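Formulas (3)–(5) can be sketched in NumPy as below. This is a minimal illustration of one encoder layer, assuming a single attention head stands in for MSA and omitting the per-sublayer learnable LN scale/shift; it is not the BERT implementation itself.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LN: normalize each row to zero mean / unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(s):
    # scaled dot-product self-attention; one head stands in for MSA here
    scores = softmax(s @ s.T / np.sqrt(s.shape[-1]))
    return scores @ s

def encoder_layer(s_prev, W1, b1, W2, b2):
    # eq (3): residual connection and LN around self-attention
    s_tilde = layer_norm(self_attention(s_prev) + s_prev)
    # eq (5): FFN(x) = max(0, x W1 + b1) W2 + b2
    ffn = np.maximum(0.0, s_tilde @ W1 + b1) @ W2 + b2
    # eq (4): residual connection and LN around the FFN
    return layer_norm(ffn + s_tilde)
```

Stacking 12 such layers (each with its own W_1, b_1, W_2, b_2) produces the final encoded output S described in the text.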
Sub-step 2-2: sentence decoding. A GRU serves as the basic unit of the decoding network; the decoding process is shown in formulas (6) and (7). In formula (6), d_{t-1} is the input and h_{t-1} the hidden-layer output of the previous step; d_0 corresponds to CLS (a special character in vocab marking the start of a sentence). Formula (7) initializes the hidden layer of the decoding process: the encoded output S is average-pooled and then passed through a linear layer to give the initial hidden state h_0, where W_s, b_s are learnable model parameters.
h_t = GRU(d_{t-1}, h_{t-1})  (6)
h_0 = W_s avg(S) + b_s  (7)
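A minimal NumPy sketch of formulas (6)–(7) follows. The GRU gate equations are the standard ones; the parameter dictionary layout (`Wz`, `Wr`, `Wh`, …) is an assumption for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(d_prev, h_prev, P):
    # one decoding step, eq (6): h_t = GRU(d_{t-1}, h_{t-1})
    x = np.concatenate([d_prev, h_prev])
    z = sigmoid(P["Wz"] @ x + P["bz"])            # update gate
    r = sigmoid(P["Wr"] @ x + P["br"])            # reset gate
    xh = np.concatenate([d_prev, r * h_prev])
    h_cand = np.tanh(P["Wh"] @ xh + P["bh"])      # candidate state
    return (1.0 - z) * h_prev + z * h_cand

def init_hidden(S, Ws, bs):
    # eq (7): average-pool the encoder output S, then a linear layer gives h_0
    return Ws @ S.mean(axis=0) + bs
```

Decoding repeats `gru_cell` from `h_0`, feeding back the embedding of the previously emitted token as `d_prev`.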
Sub-step 2-3: output prediction. The decoding hidden state h_t of each step is used as the query and the encoded output S as both key and value; the context vector context is computed with dot-product attention as in formula (8). context and h_t are then concatenated as the final feature of the current decoding step, and a linear transformation followed by a softmax function gives the model's predicted probability distribution p, as in formula (9), where W_p, b_p are learnable model parameters. Finally, the model's log loss is computed from the prediction p and the ground truth y as in formula (10), where m here denotes the dictionary (vocab) size.
context = Attention(h_t, S, S)  (8)
p = softmax(W_p[context, h_t] + b_p)  (9)
loss = −Σ_{i=1}^{m} y_i log p_i  (10)
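Formulas (8)–(10) amount to one attentive prediction step, sketched below in NumPy under the assumption of scaled dot-product attention; the vocabulary size and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(h_t, S, Wp, bp):
    # eq (8): the decoder state h_t queries the encoder output S (keys = values = S)
    weights = softmax(S @ h_t / np.sqrt(S.shape[-1]))
    context = weights @ S
    # eq (9): concatenate [context, h_t], apply a linear layer and softmax
    return softmax(Wp @ np.concatenate([context, h_t]) + bp)

def log_loss(p, y):
    # eq (10): cross-entropy between prediction p and one-hot target y
    return -float((y * np.log(p + 1e-12)).sum())
```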
Step 3: parameter migration. After training of the text ordering model completes, the emotion polarity analysis model is built with the same encoding structure as the text ordering model, and the encoding-model parameters and word-vector parameters of the text ordering model are then migrated into the emotion polarity analysis model.
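Because the two models share an encoder layout, the migration amounts to copying named parameters between them. The sketch below uses plain dictionaries as a stand-in for framework state dicts (e.g. PyTorch's `state_dict()`); the `encoder.`/`embedding.` name prefixes are illustrative assumptions, not the patent's naming.

```python
def migrate_parameters(src, dst, prefixes=("encoder.", "embedding.")):
    """Copy encoder and word-vector parameters from the trained text-ordering
    model (src) into the emotion polarity model (dst); all other dst
    parameters (e.g. the classifier head) are left untouched."""
    moved = []
    for name, value in src.items():
        if name.startswith(prefixes) and name in dst:
            dst[name] = value
            moved.append(name)
    return moved
```

Only the shared encoder and word-vector parameters move; the new model's task-specific layers keep their fresh initialization and are trained in step 4.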
Step 4: train the emotion polarity analysis model. Based on the emotion polarity analysis data set, a text is input to the model and sentence features are extracted by the encoder; a convolutional neural network, a recurrent neural network, and an attention mechanism then extract the local features, the overall features, and the final denoised features of the sentence; finally, the features are classified. This step divides into five sub-steps, implemented as follows.
Sub-step 4-1: sentence encoding. Since the encoding model is identical to that of the text ordering model, the encoding process of the emotion polarity model is the same as in sub-step 2-1. The encoded output is denoted S.
Sub-step 4-2: local feature extraction. A one-dimensional convolutional network extracts the local feature representation T of the encoded output S, as in formula (11); the result T_i of each step is computed as in formula (12), where W and b are learnable model parameters, × denotes the convolution computation, i indexes the step, and k is the convolution kernel width (this embodiment uses a kernel of width 2).
T = Conv(S)  (11)
T_i = tanh(W × S_{i:i+k−1} + b)  (12)
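Formula (12) slides a width-k window over the encoded rows; a minimal NumPy sketch with a single convolution filter (the embodiment would use many filters) is:

```python
import numpy as np

def conv_local_features(S, W, b, k=2):
    # eq (12): T_i = tanh(W * S_{i:i+k-1} + b), a width-k window over S;
    # one filter W of shape (k, h) is shown for brevity
    m = S.shape[0]
    return np.array([np.tanh(float((W * S[i:i + k]).sum()) + b)
                     for i in range(m - k + 1)])
```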
Sub-step 4-3: overall feature extraction. A Bi-GRU extracts the overall sentence feature C, as in formulas (13) and (14). The bidirectional GRU comprises a forward GRU that reads T left to right, producing h_j^f, and a backward GRU producing h_j^b; Nt denotes the length of the preceding convolution output, and h_j^f and h_j^b are the hidden-layer outputs of the two directions at step j. Concatenating h_j^f and h_j^b gives the per-step feature h_j, as in formula (15).
h_j^f = GRU_f(T_j), j = 1, …, Nt  (13)
h_j^b = GRU_b(T_j), j = Nt, …, 1  (14)
h_j = [h_j^f, h_j^b]  (15)
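Formulas (13)–(15) can be sketched as one forward and one backward recurrence whose states are concatenated per step. The step functions are passed in as parameters so the sketch stays short; in the embodiment they would be GRU cells such as the one shown in sub-step 2-2.

```python
import numpy as np

def bigru_features(T, step_fwd, step_bwd, h0):
    """Eqs (13)-(15): run one recurrence left-to-right and one right-to-left
    over the convolution output T, then concatenate the two hidden states
    at each step. `step_fwd`/`step_bwd` map (t, h) -> h."""
    fwd, h = [], h0
    for t in T:                     # eq (13): forward pass
        h = step_fwd(t, h)
        fwd.append(h)
    bwd, h = [], h0
    for t in T[::-1]:               # eq (14): backward pass
        h = step_bwd(t, h)
        bwd.append(h)
    bwd.reverse()                   # align backward states with positions 1..Nt
    # eq (15): h_j = [h_j^f, h_j^b]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```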
Sub-step 4-4: attention denoising. Real comment text can be quite noisy, and it is generally undesirable for all parts of a sentence to contribute equally to the final prediction. The text feature C can therefore be further refined with an attention mechanism. h_j is passed through a perceptron (MLP) to obtain v_j as in formula (16), where W_a and b_a are learnable model parameters; the importance of each position is measured by the similarity of v_j to a context vector c, as in formula (17); and the final sentence semantic feature C_a is computed by weighted summation, as in formula (18).
v_j = tanh(W_a h_j + b_a)  (16)
α_j = exp(v_j^T c) / Σ_j exp(v_j^T c)  (17)
C_a = Σ_j α_j h_j  (18)
Sub-step 4-5: emotion polarity classification. The sentence feature C_a is processed by a linear transformation and a softmax function to obtain the model's classification probability distribution p, as in formula (19), where W_p and b_p are learnable model parameters; the model's log loss is then computed from p and the true label y as in formula (20), where n denotes the number of categories.
p = softmax(W_p C_a + b_p)  (19)
loss = −Σ_{i=1}^{n} y_i log p_i  (20)
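Sub-steps 4-4 and 4-5 together form an attentive pooling head, sketched below in NumPy. The shapes and the context vector c are illustrative assumptions; the point is that the attention weights α sum to 1 and downweight noisy positions before classification.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_denoise(H, Wa, ba, c):
    # eq (16): v_j = tanh(Wa h_j + ba), one row of V per step
    V = np.tanh(H @ Wa.T + ba)
    # eq (17): attention weight of each step from the similarity of v_j to c
    alpha = softmax(V @ c)
    # eq (18): weighted sum gives the denoised sentence feature C_a
    return alpha @ H, alpha

def classify(Ca, Wp, bp):
    # eq (19): linear layer + softmax over the n polarity categories
    return softmax(Wp @ Ca + bp)
```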
It is to be understood that these examples are for illustration only and do not limit the scope of the application; after reading the application, modifications to its various equivalent forms fall within the scope defined by the appended claims.
Claims (4)
1. An emotion polarity analysis method based on transfer learning, characterized by comprising the following steps:
step 1, constructing a sentence-pair data set;
step 2, training a text ordering model;
step 3, parameter migration;
step 4, training an emotion polarity analysis model;
wherein in step 3, parameter migration, an emotion polarity analysis model is first constructed with the same encoding structure as the text ordering model, and the encoding parameters and word-vector parameters of the text ordering model are then migrated into the emotion polarity analysis model;
wherein step 2, training the text ordering model, is specifically as follows:
sub-step 2-1, sentence encoding: text encoding features are extracted with BERT, and the model-related symbols mean the following:
token: each word in the data set;
n: total number of tokens in the data set;
h: word-vector dimension of a token;
Emb: embedding matrix, of shape n×h;
vocab: token dictionary mapping token to index id;
a text sequence X = (token_1, token_2, …, token_m) of length m is first input, the embedding Emb_i of each token is looked up by its index id_i in vocab, e_i is computed, and s denotes the text matrix formed by the whole sentence; the computation of e_i and s is shown in formulas (1) and (2), where position denotes the position encoding and segment the segment encoding,
e_i = BERT(Emb_i + segment_i + position_i)  (1)
s = (e_1, e_2, …, e_m)  (2)
s is then fed into an encoding model containing 12 Transformer layers to extract the final encoded output S; LN in formula (3) is the layer normalization operation and MSA the multi-head self-attention operation; taking layer z as an example, the output s_{z-1} of the previous layer is first processed by MSA, and residual and LN operations yield s̃_z, as in formula (3); the FFN then processes s̃_z, and the layer output s_z is obtained by combining the residual and LN, as in formula (4); the computation of FFN is shown in formula (5), where W_1, b_1, W_2, b_2 are learnable model parameters,
s̃_z = LN(MSA(s_{z-1}) + s_{z-1})  (3)
s_z = LN(FFN(s̃_z) + s̃_z)  (4)
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2  (5)
sub-step 2-2, sentence decoding: a GRU serves as the basic unit of the decoding network; the decoding process is shown in formulas (6) and (7); in formula (6), d_{t-1} is the input and h_{t-1} the hidden-layer output of the previous step, and d_0 corresponds to CLS as input; formula (7) initializes the hidden layer of the decoding process, i.e. the encoded output S is average-pooled and then passed through a linear layer to give the initial hidden state h_0, where W_s, b_s are learnable model parameters,
h_t = GRU(d_{t-1}, h_{t-1})  (6)
h_0 = W_s avg(S) + b_s  (7)
sub-step 2-3, output prediction: the decoding hidden state h_t of each step is used as the query and the encoded output S as key and value; the context vector context is computed with dot-product attention as in formula (8); context and h_t are then concatenated as the final feature of the current decoding step, and linear transformation and a softmax function give the predicted probability distribution p as in formula (9), where W_p, b_p are learnable model parameters; finally, the model log loss is computed from the prediction p and the ground truth y as in formula (10), where m denotes the dictionary (vocab) size,
context = Attention(h_t, S, S)  (8)
p = softmax(W_p[context, h_t] + b_p)  (9)
loss = −Σ_{i=1}^{m} y_i log p_i  (10)
wherein step 4, training the emotion polarity analysis model, is specifically as follows:
sub-step 4-1, sentence encoding: since the encoding model is identical to that of the text ordering model, the encoding process of the emotion polarity model is the same as in sub-step 2-1, and S denotes the encoded output;
sub-step 4-2, local feature extraction: a one-dimensional convolutional network extracts the local feature representation T of the encoded output S, as in formula (11); the result T_i of each step is computed as in formula (12), where W and b are learnable model parameters, × denotes the convolution computation, i indexes the step, and k is the convolution kernel width,
T = Conv(S)  (11)
T_i = tanh(W × S_{i:i+k−1} + b)  (12)
sub-step 4-3, overall feature extraction: a Bi-GRU extracts the overall sentence feature C, as in formulas (13) and (14); the bidirectional GRU comprises a forward GRU that reads T left to right, producing h_j^f, and a backward GRU producing h_j^b; Nt denotes the length of the preceding convolution output; h_j^f and h_j^b are the hidden-layer outputs of the two directions at step j, and concatenating them gives the per-step feature h_j as in formula (15),
h_j^f = GRU_f(T_j), j = 1, …, Nt  (13)
h_j^b = GRU_b(T_j), j = Nt, …, 1  (14)
h_j = [h_j^f, h_j^b]  (15)
sub-step 4-4, attention denoising: combined with an attention mechanism, the text feature C is further refined; h_j is passed through a perceptron MLP to obtain v_j as in formula (16), where W_a and b_a are learnable model parameters; the importance of each position is measured by the similarity of v_j to the context vector c as in formula (17); the final sentence semantic feature C_a is computed by weighted summation as in formula (18),
v_j = tanh(W_a h_j + b_a)  (16)
α_j = exp(v_j^T c) / Σ_j exp(v_j^T c)  (17)
C_a = Σ_j α_j h_j  (18)
sub-step 4-5, emotion polarity classification: the sentence feature C_a is processed by a linear transformation and a softmax function to obtain the classification probability distribution p as in formula (19), where W_p and b_p are learnable model parameters; the model log loss is then computed from p and the true label y as in formula (20), where n denotes the number of categories,
p = softmax(W_p C_a + b_p)  (19)
loss = −Σ_{i=1}^{n} y_i log p_i  (20)
2. The emotion polarity analysis method based on transfer learning of claim 1, wherein in step 1, constructing a sentence-pair data set, the word positions of each sentence in the emotion polarity analysis data set are perturbed in a set proportion, the sentence before perturbation is retained, and each perturbed sentence together with its normal-order counterpart forms one training item of the new data set.
3. The emotion polarity analysis method based on transfer learning of claim 1, wherein in step 2, training a text ordering model, the text ordering model is constructed in the seq2seq style; the perturbed sentence is taken as model input and an encoder extracts sentence features; decoding then proceeds word by word, predicting the output word from the decoding features of the current time step; finally, the model output is compared with the characters at the corresponding positions of the normal-order sentence, and the model parameters are trained with a cross-entropy loss function.
4. The emotion polarity analysis method based on transfer learning of claim 1, wherein in step 4, training the emotion polarity analysis model, comment text from the emotion polarity analysis data set is input and sentence features are extracted by the encoder; a convolutional neural network, a recurrent neural network, and an attention mechanism then further extract the local features, the overall features, and the final denoised features of the sentence; finally, the features are classified.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110455888.8A | 2021-04-26 | 2021-04-26 | Emotion polarity analysis method based on transfer learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113326695A CN113326695A (en) | 2021-08-31 |
CN113326695B true CN113326695B (en) | 2024-04-26 |
Family
ID=77413792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110455888.8A Active CN113326695B (en) | 2021-04-26 | 2021-04-26 | Emotion polarity analysis method based on transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113326695B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836919A (en) * | 2021-09-30 | 2021-12-24 | 中国建筑第七工程局有限公司 | Building industry text error correction method based on transfer learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679580A (en) * | 2017-10-21 | 2018-02-09 | 桂林电子科技大学 | A heterogeneous-transfer image emotion polarity analysis method based on multi-modal deep latent correlation
CN110334187A (en) * | 2019-07-09 | 2019-10-15 | 昆明理工大学 | Burmese sentiment analysis method and device based on transfer learning |
WO2021051598A1 (en) * | 2019-09-19 | 2021-03-25 | 平安科技(深圳)有限公司 | Text sentiment analysis model training method, apparatus and device, and readable storage medium |
Non-Patent Citations (2)
Title |
---|
Research on a BERT-based text sentiment analysis method; Fang Yinglan et al.; Information Technology and Informatization; 2020-02-28 (Issue 02); full text *
A hierarchical attention network sentiment analysis algorithm based on transfer learning; Qu Zhaowei et al.; Journal of Computer Applications; 2018-07-19 (Issue 11); full text *
Also Published As
Publication number | Publication date |
---|---|
CN113326695A (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN107832400B (en) | A relation classification method using a position-based LSTM and CNN joint model | |
CN110059188B (en) | Chinese emotion analysis method based on bidirectional time convolution network | |
CN113239181B (en) | Scientific and technological literature citation recommendation method based on deep learning | |
CN109325112B (en) | A cross-language sentiment analysis method and apparatus based on emoji | |
CN111274398B (en) | Method and system for analyzing comment emotion of aspect-level user product | |
CN112667818B (en) | GCN and multi-granularity attention fused user comment sentiment analysis method and system | |
CN110321563B (en) | Text emotion analysis method based on hybrid supervision model | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN111143563A (en) | Text classification method based on integration of BERT, LSTM and CNN | |
CN114757182A (en) | BERT short text sentiment analysis method for improving training mode | |
WO2023134083A1 (en) | Text-based sentiment classification method and apparatus, and computer device and storage medium | |
CN110472245B (en) | Multi-label emotion intensity prediction method based on hierarchical convolutional neural network | |
CN113065344A (en) | Cross-corpus emotion recognition method based on transfer learning and attention mechanism | |
CN114462420A (en) | False news detection method based on feature fusion model | |
CN115759119B (en) | Financial text emotion analysis method, system, medium and equipment | |
CN111540470B (en) | Social network depression tendency detection model based on BERT transfer learning and training method thereof | |
CN112287106A (en) | Online comment emotion classification method based on dual-channel hybrid neural network | |
CN114004220A (en) | Text emotion reason identification method based on CPC-ANN | |
CN116049387A (en) | Short text classification method, device and medium based on graph convolution | |
CN113468854A (en) | Multi-document automatic abstract generation method | |
CN116010553A (en) | Viewpoint retrieval system based on two-way coding and accurate matching signals | |
CN113326695B (en) | Emotion polarity analysis method based on transfer learning | |
CN113806528A (en) | Topic detection method and device based on BERT model and storage medium | |
CN113159831A (en) | Comment text sentiment analysis method based on improved capsule network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||