CN110287320A

CN110287320A - A deep learning multi-class sentiment analysis model combining an attention mechanism

Info

Publication number: CN110287320A
Application number: CN201910553755.7A
Authority: CN
Inventors: 刘磊; 孙应红; 陈浩; 李静
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-06-25
Filing date: 2019-06-25
Publication date: 2019-09-27
Anticipated expiration: 2039-06-25
Also published as: CN110287320B

Abstract

The present invention relates to a deep learning multi-class sentiment analysis model combining an attention mechanism and belongs to the field of natural language processing. The invention analyzes the weaknesses of existing CNN and LSTM networks in text sentiment analysis and proposes a deep learning multi-class sentiment analysis model that combines an attention mechanism. The model uses the attention mechanism to fuse the local features extracted by the CNN network with the word-order features extracted by the LSTM model, and it applies the idea of ensemble models in the classification layer: the sentiment features extracted by the CNN network and by the LSTM network are concatenated to form the final features extracted by the model. Comparative experiments show that the model's accuracy is significantly higher than that of standard CNN and LSTM models.

Description

A deep learning multi-class sentiment analysis model combining an attention mechanism

Technical field

The invention belongs to the field of text information processing and relates to a deep learning multi-class sentiment analysis model combining an attention mechanism.

Background art

With the continuous rise of social networks such as Weibo (microblogs) and Twitter, the Internet has become not only a source of daily information but also an indispensable platform for people to express their opinions. When people comment on trending events in online communities, write film reviews, or describe product experiences, they generate large amounts of text with emotional color (for example joy, anger, sorrow, and happiness). Effective sentiment analysis of this text gives a better understanding of users' interests and of how much attention they pay to a topic. However, as people pay more and more attention to online information, massive amounts of emotionally colored text are generated in online communities every day, far more than manual labeling can handle. This has made text sentiment analysis a research hotspot in natural language processing.

Following the successful application of deep learning in computer vision, more and more deep learning techniques have been applied to natural language processing. The advantage of deep learning is that it not only extracts text features automatically but also has strong representational power on large data. Current mainstream deep-learning-based text sentiment analysis methods fall into two kinds: convolutional neural networks (Convolutional Neural Network, CNN) and recurrent neural networks (Recurrent Neural Network, RNN). Sentiment analysis models based on either method alone achieve relatively low accuracy, mainly for the following reasons:

First, in text sentiment analysis a convolutional neural network can effectively capture sentiment information at different positions by varying the convolution kernel size, and thereby obtains the local sentiment features of the text. During convolution, however, the word order of the text context is often ignored. In text sentiment analysis the order of words is particularly important, and without word-order information the results are bound to be biased to some extent.

Second, a recurrent neural network exploits the dependencies between preceding and following elements and effectively models the sequential order of text data. It can extract the word-order relationships and semantic information of the text and therefore performs well in text sentiment analysis. However, when the samples are long or the linguistic context is complex, useful sentiment information appears at widely varying distances, so the performance of the long short-term memory network (Long Short-Term Memory, LSTM) is also limited.

The present invention makes full use of the attention mechanism, the CNN network, and the LSTM network, and proposes and implements a deep learning multi-class sentiment analysis model combining an attention mechanism. This model can effectively improve the accuracy of text sentiment analysis.

Summary of the invention

The invention proposes a deep learning multi-class sentiment analysis model based on an attention mechanism. The model combines a CNN network and an LSTM network to fuse sentiment features. First, the multi-scale convolution kernels of the CNN network extract the local features of the text to be analyzed; then the attention mechanism integrates these CNN local features into the LSTM network. Finally, following the idea of ensemble models, the pooling-layer result of the CNN network and the feature-extraction result of the LSTM network are concatenated to produce the model's final output. Experiments show that the model achieves significantly higher accuracy in text sentiment analysis.

To achieve the above object, the present invention adopts the following technical solution:

1. A deep learning multi-class sentiment analysis method combining an attention mechanism, characterized by comprising the following steps:

Step (1): Data preprocessing

Let the sentiment data set be G = [(segtxt_1, y_1), (segtxt_2, y_2), ..., (segtxt_N, y_N)], where segtxt_i is the i-th sample, y_i is its sentiment category label, and N is the number of samples in G. The samples in G are preprocessed.

After preprocessing, the data set G is denoted G' = [(seg_1, y_1), (seg_2, y_2), ..., (seg_M, y_M)], where seg_i is the i-th sample in G', y_i is its sentiment category label, and M is the number of samples in G';

Step (2): Constructing the model input

Any sample (seg, y) to be analyzed in G' is further represented as:

$$\mathrm{seg} = [w_1, w_2, w_3, \dots, w_d]^T \qquad (1)$$

$$y = [0, 0, 1, \dots, 0] \qquad (2)$$

where w_i ∈ R^ε is the one-hot encoding, according to the vocabulary wordList, of the i-th word in the text to be analyzed; ε is the size of wordList; and d is the sentence length of the text. y ∈ R^p is the one-hot encoding of the sentiment category, and p is the number of categories the model distinguishes. The word-vector embedding matrix of the sample can then be expressed as:

$$X = \mathrm{seg} \cdot E^T \qquad (3)$$

where X ∈ R^{d×m}, X = [x_1, x_2, ..., x_d]^T is the word-vector matrix of the text to be analyzed, m is the word-vector dimension, x_i ∈ R^m is the word vector of the i-th word in the text, and E is the word-embedding layer (since seg ∈ R^{d×ε}, E ∈ R^{m×ε});
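As a minimal illustration of formula (3), the Python sketch below (toy sizes and values are assumptions, not from the patent) checks that multiplying the one-hot matrix seg by E^T simply selects one column of E per word. This is why embedding layers are usually implemented as a table lookup rather than as a matrix product.

```python
# Sketch of formula (3): X = seg * E^T with one-hot rows equals a column lookup in E.
# Toy sizes (assumed): vocabulary size eps = 5, embedding dimension m = 3, d = 4 words.

def matmul(a, b):
    """Plain-Python product of a (r x k) and b (k x c)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]

eps, m = 5, 3
# E is m x eps, so E^T is eps x m and X = seg * E^T is d x m.
E = [[0.1 * (r + 1) * (c + 1) for c in range(eps)] for r in range(m)]

word_ids = [2, 0, 4, 2]                      # hypothetical vocabulary indices of the d words
seg = [one_hot(i, eps) for i in word_ids]    # d x eps one-hot matrix, formula (1)
X = matmul(seg, transpose(E))                # d x m word-vector matrix, formula (3)

# Equivalent and cheaper: row t of X is column word_ids[t] of E.
X_lookup = [[E[r][i] for r in range(m)] for i in word_ids]
```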

Step (3): Constructing the deep learning multi-class sentiment analysis model

The deep learning multi-class sentiment analysis model comprises a CNN-based local feature extraction stage and an LSTM-based word-order feature extraction stage. The pooling-layer result C_Cnn of the CNN-based local feature extraction stage and the result C'_Rnn of the LSTM-based word-order feature extraction stage are concatenated, and the vector [C_Cnn; C'_Rnn] is taken as the feature vector finally extracted by the model. The feature vector [C_Cnn; C'_Rnn] is then passed through a fully connected layer to obtain the final output vector of the model, of dimension p, where p is the number of categories the model distinguishes.
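The final concatenation and fully connected layer can be sketched as follows. This is a hypothetical plain-Python illustration: the weights W and b, the toy vector sizes, and the softmax at the output (taken from the softmax regression named in step (4)) are assumptions rather than the patent's code.

```python
import math

def softmax(v):
    top = max(v)                                  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(c_cnn, c_rnn, W, b):
    """Concatenate [C_Cnn; C'_Rnn], apply one fully connected layer, then softmax.
    W is p x (len(c_cnn) + len(c_rnn)) and b has length p (hypothetical parameters)."""
    feature = c_cnn + c_rnn                       # list concatenation = [C_Cnn; C'_Rnn]
    logits = [sum(w * x for w, x in zip(row, feature)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

c_cnn = [0.5, -0.2, 0.1]                          # toy stand-in for C_Cnn
c_rnn = [0.3, 0.0]                                # toy stand-in for C'_Rnn
W = [[0.1 * (i + j) for j in range(5)] for i in range(4)]   # p = 4 rows
b = [0.0, 0.1, -0.1, 0.0]
probs = output_layer(c_cnn, c_rnn, W, b)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```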

The CNN-based local feature extraction stage comprises the following:

The input of the local feature extraction stage is the word-vector matrix X of the text to be analyzed, given by formula (3);

The local feature extraction stage is based on a CNN network and contains two layers in total, one convolutional layer and one pooling layer, where:

the convolutional layer convolves the text to be analyzed with convolution kernels of n different scales, with k filters (i.e., neurons) for each kernel scale;

the pooling layer down-samples the vectors produced by the convolution using max pooling and selects the locally optimal feature; each filter is therefore reduced to a single scalar by the max-pooling layer, and this scalar represents the optimal sentiment feature of that filter;

The output of the local feature extraction module is C_Cnn = [c_1, c_2, ..., c_nk]: the optimal features selected in the pooling layer by the multiple filters of different sizes are concatenated to form the output of this module, where C_Cnn ∈ R^{nk} and nk = n × k is the total number of filters in the convolutional layer;
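Max pooling over each filter's output and the concatenation into C_Cnn can be sketched as below; the toy sizes (n = 2 scales, k = 2 filters per scale) and values are assumptions. Filters of different scales produce outputs of different lengths, but each contributes exactly one scalar, so C_Cnn always has nk entries.

```python
def max_pool_and_concat(conv_outputs):
    """conv_outputs[scale][filter] is the 1-D feature vector z produced by one filter.
    Max pooling keeps one scalar per filter; concatenating all n*k scalars gives C_Cnn."""
    c_cnn = []
    for filters_of_one_scale in conv_outputs:
        for z in filters_of_one_scale:
            c_cnn.append(max(z))                  # optimal (largest) feature of this filter
    return c_cnn

# Toy example: n = 2 scales, k = 2 filters per scale, sentence length d = 4 (assumed).
conv_outputs = [
    [[0.1, 0.7, 0.3], [0.0, 0.2, 0.4]],           # scale s = 2: d - s + 1 = 3 values each
    [[0.5, 0.6], [0.9, 0.1]],                     # scale s = 3: d - s + 1 = 2 values each
]
C_Cnn = max_pool_and_concat(conv_outputs)
```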

The LSTM-based word-order feature extraction stage comprises the following:

Multi-scale CNN local feature extraction: for each convolution scale of the convolutional layer in the CNN-based local feature extraction stage, the convolution results of its k filters are concatenated, giving the set Z_Cnn; each vector Z_i in Z_Cnn is then fed into a GLU (gated linear unit) mechanism, i.e., a gated convolutional network, and the results are denoted {π_1, π_2, ..., π_n}. This completes the multi-scale CNN local feature extraction.

Here Z_Cnn = {Z_1, Z_2, ..., Z_n}, and Z_i is the concatenation of the convolution results of the multiple filters of the i-th scale;

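The gated output for scale i is:

$$\pi_i = (Z_i W_1 + b_1) \otimes \sigma(Z_i W_2 + b_2)$$

where Z_i is the concatenation of the convolution results of the k filters of one scale, W_1, W_2 ∈ R^{λ×q} are weight matrices, λ is the corresponding dimension of the weight matrices, b_1, b_2 ∈ R^q are biases, σ is the sigmoid function, ⊗ denotes element-wise multiplication, π_i ∈ R^q, and q is the output dimension of the LSTM network;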
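A minimal sketch of the gating step, assuming the standard GLU form given above and assuming each Z_i has already been flattened into a vector of length λ (the patent does not say how the k filter outputs are arranged). The toy weights are chosen so that the gate equals sigmoid(0) = 0.5, which makes the result easy to check by hand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def vec_mat(z, W):
    """Row vector z (length lam) times W (lam x q), giving a length-q vector."""
    return [sum(z[r] * W[r][c] for r in range(len(z))) for c in range(len(W[0]))]

def glu(z, W1, b1, W2, b2):
    """pi = (z W1 + b1) * sigmoid(z W2 + b2), element-wise."""
    linear = [v + bias for v, bias in zip(vec_mat(z, W1), b1)]          # linear path
    gate = [sigmoid(v + bias) for v, bias in zip(vec_mat(z, W2), b2)]   # sigmoid gate
    return [lin * g for lin, g in zip(linear, gate)]

# Toy sizes (assumed): lam = 3, q = 2.
Z_i = [1.0, 0.0, -1.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # zero gate weights, so the gate is 0.5
b1 = [0.0, 0.0]
b2 = [0.0, 0.0]
pi_i = glu(Z_i, W1, b1, W2, b2)             # linear path [0, -1] halved by the gate
```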

Then, using the attention mechanism, the multi-scale CNN local features {π_1, π_2, ..., π_n} are integrated into the LSTM network to obtain the output C'_Rnn of the LSTM-based word-order feature extraction stage:

$$C'_{Rnn} = [\overrightarrow{h}_d;\ \overleftarrow{h}_1]$$

where h→_d is the output of the LSTM module corresponding to the last word of the text to be analyzed and h←_1 is the output of the LSTM module corresponding to the first word; the present invention uses a bidirectional LSTM, i.e., a BiLSTM model.

The forward direction is computed as follows:

d is the length of the text to be analyzed, and each word position in the text corresponds to one LSTM module.

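In the forward direction, let the output of the (t-1)-th LSTM module be h→_{t-1}; the output h→_t of the t-th LSTM module is then computed as follows:

$$\mathrm{score}(\overrightarrow{h}_{t-1}, \pi_i) = \overrightarrow{h}_{t-1} \cdot \pi_i$$

where the scoring function score(·,·) is the dot product of two vectors; it measures the similarity between the LSTM output h→_{t-1} of the previous word and the current local feature vector π_i;

$$\alpha_{t,i} = \frac{\exp\left(\mathrm{score}(\overrightarrow{h}_{t-1}, \pi_i)\right)}{\sum_{j=1}^{n} \exp\left(\mathrm{score}(\overrightarrow{h}_{t-1}, \pi_j)\right)}$$

where α_{t,i} ∈ R is the weight of feature π_i;

$$s_{t-1} = \sum_{i=1}^{n} \alpha_{t,i}\, \pi_i$$

where s_{t-1} ∈ R^q is the weighted combination of the multiple convolution features. s_{t-1} replaces h→_{t-1} and, together with the word vector x_t of the current word, yields the output of the current LSTM module:

$$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, s_{t-1})$$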

The backward direction is computed in the same way as the forward direction and is not described again here;
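The attention-augmented recurrence in both directions can be sketched in plain Python as below. Several details are assumptions not stated in the patent: a zero initial state (so the first attention step weights all π_i equally), a standard LSTM cell whose cell state c is carried as usual while s_{t-1} enters where h_{t-1} normally would, the same features π_1..π_n for both directions, and deterministic toy weights from the hypothetical helper make_params.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(h_prev, pis):
    """s_{t-1}: sum of pi_i weighted by softmax over the dot-product scores h_{t-1} . pi_i."""
    scores = [sum(a * b for a, b in zip(h_prev, pi)) for pi in pis]
    top = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]                 # alpha_{t,i}
    return [sum(a * pi[j] for a, pi in zip(alphas, pis)) for j in range(len(h_prev))]

def affine(W, x, U, s, b):
    """Gate pre-activation W x + U s + b (W: q x m, U: q x q, b: length q)."""
    return [sum(w * xi for w, xi in zip(W[r], x)) + sum(u * si for u, si in zip(U[r], s)) + b[r]
            for r in range(len(b))]

def lstm_step(x_t, s_prev, c_prev, params):
    """Standard LSTM cell, except that s_{t-1} is fed where h_{t-1} usually goes."""
    pre = {name: affine(W, x_t, U, s_prev, b) for name, (W, U, b) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]                 # input gate
    f = [sigmoid(v) for v in pre["f"]]                 # forget gate
    o = [sigmoid(v) for v in pre["o"]]                 # output gate
    g = [math.tanh(v) for v in pre["g"]]               # candidate cell state
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t

def run_direction(xs, pis, params, q):
    """Run one direction over the word vectors xs and return the last output."""
    h, c = [0.0] * q, [0.0] * q                        # zero initial state (assumption)
    for x_t in xs:
        s_prev = attend(h, pis)                        # s_{t-1} replaces h_{t-1}
        h, c = lstm_step(x_t, s_prev, c, params)
    return h

def make_params(m, q, scale):
    """Hypothetical deterministic toy weights; a trained model learns these."""
    def mat(rows, cols):
        return [[scale * ((r + 2 * col) % 3 - 1) for col in range(cols)] for r in range(rows)]
    return {gate: (mat(q, m), mat(q, q), [0.0] * q) for gate in ("i", "f", "o", "g")}

m, q = 3, 2
xs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]          # d = 3 toy word vectors
pis = [[0.3, -0.1], [0.2, 0.4]]                                      # n = 2 toy GLU outputs
h_fwd_d = run_direction(xs, pis, make_params(m, q, 0.5), q)          # forward: ends at word d
h_bwd_1 = run_direction(xs[::-1], pis, make_params(m, q, -0.3), q)   # backward: ends at word 1
C_Rnn = h_fwd_d + h_bwd_1                                            # [h_d (fwd); h_1 (bwd)]
```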

Step (4): Model training: the training data are fed into the multi-class sentiment analysis model; with a cross-entropy loss function, the parameters are adjusted by the backpropagation (BP) algorithm, softmax regression is used as the classifier, and training is completed;
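The training objective in step (4) can be illustrated with the sketch below: softmax over the p = 4 logits followed by cross-entropy against the one-hot label of formula (2). Averaging over a mini-batch and the small constant eps are assumptions; computing the gradients for backpropagation is left to a framework.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_onehot, probs, eps=1e-12):
    """L = -sum_j y_j * log(p_j); eps guards against log(0)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_onehot, probs))

def batch_loss(batch_logits, batch_labels):
    """Mean cross-entropy over a mini-batch (averaging is an assumption)."""
    losses = [cross_entropy(y, softmax(z)) for z, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)

y = [0, 0, 1, 0]                                        # one-hot label, p = 4
uniform = batch_loss([[0.0, 0.0, 0.0, 0.0]], [y])       # uninformed model: loss = log(4)
confident = batch_loss([[0.0, 0.0, 5.0, 0.0]], [y])     # correct, confident model: lower loss
```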

Step (5): Model analysis: the text to be analyzed is fed into the trained model, which outputs the sentiment classification result for that text.

The preprocessing comprises the following steps:

1) word segmentation, stop-word removal, conversion of English uppercase letters to lowercase, and conversion of traditional Chinese characters to simplified characters;

2) selecting the words whose frequency in data set G is greater than or equal to σ to build the vocabulary wordList = {word_1, word_2, ..., word_ε}, where word_i is the i-th word in wordList and ε is the total number of words in G whose frequency is at least σ;

3) for each sample in G, deleting the sample if its length is greater than d, and padding it with the symbol </> if its length is less than d.
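The three preprocessing steps can be sketched as follows, assuming the text has already been segmented into tokens (the patent does not name a segmenter) and omitting traditional-to-simplified conversion, which would need an external converter. The stop-word list, the toy corpus, and the frequency-ordered vocabulary are hypothetical.

```python
from collections import Counter

PAD = "</>"                                   # padding symbol named in step 3)

def build_vocab(token_lists, sigma):
    """Keep words whose corpus frequency is at least sigma (frequency ordering is an assumption)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return [w for w, c in counts.most_common() if c >= sigma]

def fit_length(token_lists, labels, d):
    """Drop samples longer than d tokens; right-pad shorter ones with '</>'."""
    kept = []
    for toks, y in zip(token_lists, labels):
        if len(toks) > d:
            continue                          # sample deleted
        kept.append((toks + [PAD] * (d - len(toks)), y))
    return kept

stop_words = {"the", "is"}                    # hypothetical stop-word list
raw = [["today", "is", "happy", "the"], ["Happy", "today"], ["so", "angry", "the", "so", "so", "so"]]
labels = ["happiness", "happiness", "anger"]
# Lowercase first, then remove stop words.
cleaned = [[t for t in (tok.lower() for tok in toks) if t not in stop_words] for toks in raw]
vocab = build_vocab(cleaned, sigma=2)
data = fit_length(cleaned, labels, d=4)
```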

The convolutional layer of the CNN-based local feature extraction module is computed as follows:

$$z = f\left(\sum W^{T} * x_{i:i+s-1} + b\right) \qquad (8)$$

where z is the feature vector obtained when one neuron convolves the text to be analyzed, f(·) is the activation function, W ∈ R^{s×m} is the weight matrix of the neuron (the parameters of one neuron are shared across positions), s × m is the size of the convolution kernel, b is the bias (threshold), and x_{i:i+s-1} denotes the word vectors of the i-th through (i+s-1)-th words of the sentence.
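Formula (8) for a single filter can be sketched as a sliding window over the rows of X. The sketch assumes stride 1 and no padding (neither is stated in the patent) and uses ReLU as f(·), as in the embodiment. Each window of s word vectors is multiplied element-wise by W, summed, shifted by b, and passed through the activation, giving d - s + 1 values.

```python
def relu(v):
    return max(0.0, v)

def conv_filter(X, W, b, activation=relu):
    """One filter of size s x m slid over the d x m word-vector matrix X (stride 1, no padding).
    Returns z with d - s + 1 entries: z_i = f(sum(W * x_{i:i+s-1}) + b)."""
    s, d = len(W), len(X)
    z = []
    for i in range(d - s + 1):
        window = X[i:i + s]                                   # x_{i:i+s-1}
        total = sum(w * x for w_row, x_row in zip(W, window) for w, x in zip(w_row, x_row))
        z.append(activation(total + b))
    return z

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]   # d = 4, m = 2 (toy values)
W = [[1.0, 1.0], [1.0, -1.0]]                            # one filter with s = 2
z = conv_filter(X, W, b=0.0)
```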

The training data are the preprocessed data.

The convolutional layer of the CNN-based local feature extraction stage uses convolution kernels of 4 different scales. The training termination condition is that the accuracy no longer changes or that the set number of iterations is reached.
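The termination rule can be sketched as a loop that stops when accuracy has stayed unchanged (within a tolerance) for a few consecutive epochs, or when the iteration budget is spent. The tolerance tol, the patience count, and the run_epoch callback are hypothetical; the patent only states that training ends when accuracy no longer changes or the set number of iterations is reached.

```python
def train_until_converged(run_epoch, max_epochs, tol=1e-4, patience=3):
    """Call run_epoch() (returns an accuracy) until the accuracy changes by at most tol
    for `patience` consecutive epochs, or until max_epochs epochs have run."""
    history = []
    stable = 0
    for _ in range(max_epochs):
        acc = run_epoch()
        if history and abs(acc - history[-1]) <= tol:
            stable += 1                       # accuracy did not change this epoch
        else:
            stable = 0
        history.append(acc)
        if stable >= patience:
            break                             # accuracy no longer changes
    return history

accs = iter([0.60, 0.70, 0.75, 0.75, 0.75, 0.75, 0.80])
history = train_until_converged(lambda: next(accs), max_epochs=10)
```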

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a schematic diagram of the structure of the deep learning multi-class sentiment analysis model combining an attention mechanism.

Detailed description of the embodiments

Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.

The proposed method is implemented by the following steps in sequence:

Step (1): Data preprocessing

The sentiment corpus is represented as G = [(segtxt_1, y_1), (segtxt_2, y_2), ..., (segtxt_N, y_N)], where segtxt_i is the i-th sample, y_i is its sentiment category label, and N is the number of samples in G. The sentiment labels are four major categories: "happiness", "anger", "disgust", and "low mood". N is 80000, with 20000 samples in each of the four categories. The samples in G are preprocessed by the following steps:

2) word that frequency in data set G is more than or equal to σ is chosen, vocabulary wordList={ word is constructed₁, word₂,...word_ε, wherein word_iIndicate i-th of word in data set G, word frequency is more than the word of σ in ε expression data set G Sum.σ takes 2, and in finally obtained data set G, word frequency is more than or equal to word totally 41763 of 2, i.e. ε is 41763.

3) After the above processing, for each sample in G, the sample is deleted if its length is greater than d; if its length is less than d, it is padded with the symbol </>. d is set to 64.

After preprocessing, the data set G is denoted G' = [(seg_1, y_1), (seg_2, y_2), ..., (seg_M, y_M)], where seg_i is the i-th sample in G', y_i is its sentiment category label, and M is the number of samples in G'. The final data set G' contains 73150 samples; the number of samples in each sentiment category is shown in Table 1:

Table 1  Number of samples per category after preprocessing

Step (2): Model input

$$\mathrm{seg} = [w_1, w_2, w_3, \dots, w_d]^T \qquad (1)$$

$$y = [0, 0, 1, \dots, 0] \qquad (2)$$

where w_i ∈ R^ε is the one-hot encoding, according to the vocabulary wordList, of the i-th word in the text to be analyzed; ε is the size of wordList; and the sentence length d of the text is 64. y ∈ R^p is the one-hot encoding of the sentiment category, and p, the number of categories the model distinguishes, is 4. The word-vector embedding matrix of the sample can then be expressed as:

$$X = \mathrm{seg} \cdot E^T \qquad (3)$$

where X ∈ R^{d×m}, X = [x_1, x_2, ..., x_d]^T is the word-vector matrix of the text to be analyzed, and the word-vector dimension m is 256. x_i ∈ R^m is the word vector of the i-th word in the text, and the word-embedding layer E uses the open-source Wikipedia word2vec word vectors. X is then used as the input of the network model.

The convolutional layer convolves the text to be analyzed with convolution kernels of n different scales, with k filters (i.e., neurons) for each kernel scale; in the present invention n is 4 and k is 128.

The output of the local feature extraction module is C_Cnn = [c_1, c_2, ..., c_nk]: the optimal features selected in the pooling layer by the multiple filters of different sizes are concatenated to form the output of this module, where C_Cnn ∈ R^{nk} and nk = n × k, the total number of filters in the convolutional layer, is 4 × 128 = 512;

The gated output for scale i is:

$$\pi_i = (Z_i W_1 + b_1) \otimes \sigma(Z_i W_2 + b_2)$$

where Z_i is the concatenation of the convolution results of the k filters of one scale, W_1, W_2 ∈ R^{λ×q} are weight matrices, λ is the corresponding dimension of the weight matrices, b_1, b_2 ∈ R^q are biases, σ is the sigmoid function, ⊗ denotes element-wise multiplication, π_i ∈ R^q, and q, the output dimension of the LSTM network, is 256;

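The forward direction is computed as follows:

$$\alpha_{t,i} = \frac{\exp(\overrightarrow{h}_{t-1} \cdot \pi_i)}{\sum_{j=1}^{n} \exp(\overrightarrow{h}_{t-1} \cdot \pi_j)}$$

where α_{t,i} ∈ R is the weight of feature π_i.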

Step (4): Model training: the training data are fed into the multi-class sentiment analysis model; with a cross-entropy loss function, the parameters are adjusted by the backpropagation (BP) algorithm, softmax regression is used as the classifier, and training is completed.

$$z = f\left(\sum W^{T} * x_{i:i+s-1} + b\right) \qquad (8)$$

where z is the feature vector obtained when one neuron convolves the text to be analyzed, f(·) is the activation function, W ∈ R^{s×m} is the weight matrix of the neuron (the parameters of one neuron are shared across positions), s × m is the size of the convolution kernel, b is the bias (threshold), and x_{i:i+s-1} denotes the word vectors of the i-th through (i+s-1)-th words of the sentence. s takes the four convolution sizes [2, 3, 4, 5], and f(·) is the ReLU activation function.
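With the embodiment's settings (d = 64, s ∈ {2, 3, 4, 5}, n = 4, k = 128, q = 256, p = 4), the feature sizes follow directly. The worked arithmetic below assumes stride-1 convolution without padding, that q is the hidden size of each LSTM direction, and a single fully connected layer with bias from the concatenated features to the p outputs; the patent does not state these details explicitly.

```latex
\begin{aligned}
\text{values per filter before pooling: } & d - s + 1 = 65 - s \in \{63, 62, 61, 60\} \quad (s = 2, 3, 4, 5) \\
\dim C_{Cnn} &= n k = 4 \times 128 = 512 \\
\dim C'_{Rnn} &= \dim\,[\overrightarrow{h}_d;\ \overleftarrow{h}_1] = 2q = 2 \times 256 = 512 \\
\dim\,[C_{Cnn};\ C'_{Rnn}] &= 512 + 512 = 1024 \\
\text{fully connected parameters } (p = 4): & \ 4 \times 1024 + 4 = 4100
\end{aligned}
```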

The training data are the preprocessed data.

1. Experimental analysis

In the test phase, 2000 samples are selected from each of the happiness, anger, disgust, and low-mood sentiment corpora. Accuracy (Acc) is used as the evaluation metric, and the model parameters remain unchanged during testing. The results on the test set are shown in Table 2:
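Accuracy here is the fraction of test samples whose predicted category equals the gold label. A minimal sketch with hypothetical predictions on eight samples:

```python
def accuracy(predicted, gold):
    """Acc = number of correctly classified samples / total number of samples."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gold = ["happiness", "anger", "disgust", "low mood"] * 2
pred = ["happiness", "anger", "anger", "low mood",
        "happiness", "disgust", "disgust", "low mood"]
acc = accuracy(pred, gold)   # 6 of 8 correct
```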

Table 2  Comparison of sentiment analysis results

Table 2 compares the test results of several models: experiment 1 is a single-scale CNN model with the usual convolution kernel size of 3, experiment 2 is a standard LSTM network, and experiment 3 is the attention-based text sentiment analysis model proposed here.

The comparative experiments show that the proposed attention-based sentiment analysis model achieves significantly higher accuracy than both the standard CNN network and the standard LSTM network. This indicates that the proposed method can effectively extract both the local feature information of the CNN network and the word-order feature information of the LSTM network, demonstrating the effectiveness of the method.

Claims

1. A deep learning multi-class sentiment analysis method combining an attention mechanism, characterized by comprising the following steps:

Step (1): Data preprocessing

Let the sentiment data set be G = [(segtxt_1, y_1), (segtxt_2, y_2), ..., (segtxt_N, y_N)], where segtxt_i is the i-th sample, y_i is its sentiment category label, and N is the number of samples in G; the samples in G are preprocessed,

after which the data set G is denoted G' = [(seg_1, y_1), (seg_2, y_2), ..., (seg_M, y_M)], where seg_i is the i-th sample in G', y_i is its sentiment category label, and M is the number of samples in G';

Step (2): Constructing the model input

Any sample (seg, y) to be analyzed in G' is further represented as:

$$\mathrm{seg} = [w_1, w_2, \dots, w_i, \dots, w_d]^T \qquad (1)$$

$$y = [0, 0, 1, \dots, 0] \qquad (2)$$

where w_i ∈ R^ε is the one-hot encoding, according to the vocabulary wordList, of the i-th word in the text to be analyzed; ε is the size of wordList; d is the sentence length of the text; y ∈ R^p is the one-hot encoding of the sentiment category; and p is the number of categories the model distinguishes. The word-vector embedding matrix of the sample can then be expressed as:

$$X = \mathrm{seg} \cdot E^T \qquad (3)$$

where X ∈ R^{d×m}, X = [x_1, x_2, ..., x_d]^T is the word-vector matrix of the text to be analyzed, m is the word-vector dimension, x_i ∈ R^m is the word vector of the i-th word in the text, and E is the word-embedding layer;

Step (3): Constructing the deep learning multi-class sentiment analysis model

The deep learning multi-class sentiment analysis model comprises a CNN-based local feature extraction stage and an LSTM-based word-order feature extraction stage; the pooling-layer result C_Cnn of the CNN-based local feature extraction stage and the result C'_Rnn of the LSTM-based word-order feature extraction stage are concatenated, and the vector [C_Cnn; C'_Rnn] is taken as the feature vector finally extracted by the model; the feature vector [C_Cnn; C'_Rnn] is then passed through a fully connected layer to obtain the final output vector of the model, of dimension p, where p is the number of categories the model distinguishes,

The CNN-based local feature extraction stage comprises the following:

The input of the local feature extraction stage is the word-vector matrix X of the text to be analyzed, given by formula (3);

The local feature extraction stage is based on a CNN network and contains two layers in total, one convolutional layer and one pooling layer, where:

the convolutional layer convolves the text to be analyzed with convolution kernels of n different scales, with k filters (i.e., neurons) for each kernel scale;

the pooling layer down-samples the vectors produced by the convolution using max pooling and selects the locally optimal feature; each filter is therefore reduced to a single scalar by the max-pooling layer, and this scalar represents the optimal sentiment feature of that filter;

The output of the local feature extraction module is C_Cnn = [c_1, c_2, ..., c_nk]: the optimal features selected in the pooling layer by the multiple filters of different sizes are concatenated to form the output of this module, where C_Cnn ∈ R^{nk} and nk = n × k is the total number of filters in the convolutional layer;

The LSTM-based word-order feature extraction stage comprises the following:

Multi-scale CNN local feature extraction: for each convolution scale of the convolutional layer in the CNN-based local feature extraction stage, the convolution results of its k filters are concatenated, giving the set Z_Cnn; each vector Z_i in Z_Cnn is then fed into a GLU (gated linear unit) mechanism, i.e., a gated convolutional network, and the results are denoted {π_1, π_2, ..., π_n}, which completes the multi-scale CNN local feature extraction,

where Z_Cnn = {Z_1, Z_2, ..., Z_n}, and Z_i is the concatenation of the convolution results of the multiple filters of the i-th scale;

The gated output for scale i is:

$$\pi_i = (Z_i W_1 + b_1) \otimes \sigma(Z_i W_2 + b_2)$$

where Z_i is the concatenation of the convolution results of the k filters of one scale, W_1, W_2 ∈ R^{λ×q} are weight matrices, λ is the corresponding dimension of the weight matrices, b_1, b_2 ∈ R^q are biases, σ is the sigmoid function, ⊗ denotes element-wise multiplication, π_i ∈ R^q, and q is the output dimension of the LSTM network;

Then, using the attention mechanism, the multi-scale CNN local features {π_1, π_2, ..., π_n} are integrated into the LSTM network to obtain the output C'_Rnn of the LSTM-based word-order feature extraction stage:

$$C'_{Rnn} = [\overrightarrow{h}_d;\ \overleftarrow{h}_1]$$

where h→_d is the output of the LSTM module corresponding to the last word of the text to be analyzed and h←_1 is the output of the LSTM module corresponding to the first word; the present invention uses a bidirectional LSTM, i.e., a BiLSTM model,

The forward direction is computed as follows:

d is the length of the text to be analyzed, and each word position in the text corresponds to one LSTM module,

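In the forward direction, let the output of the (t-1)-th LSTM module be h→_{t-1}; the output h→_t of the t-th LSTM module is then computed as follows:

$$\mathrm{score}(\overrightarrow{h}_{t-1}, \pi_i) = \overrightarrow{h}_{t-1} \cdot \pi_i$$

where the scoring function score(·,·) is the dot product of two vectors; it measures the similarity between the LSTM output h→_{t-1} of the previous word and the current local feature vector π_i,

$$\alpha_{t,i} = \frac{\exp\left(\mathrm{score}(\overrightarrow{h}_{t-1}, \pi_i)\right)}{\sum_{j=1}^{n} \exp\left(\mathrm{score}(\overrightarrow{h}_{t-1}, \pi_j)\right)}$$

where α_{t,i} ∈ R is the weight of feature π_i,

$$s_{t-1} = \sum_{i=1}^{n} \alpha_{t,i}\, \pi_i$$

where s_{t-1} ∈ R^q is the weighted combination of the multiple convolution features; s_{t-1} replaces h→_{t-1} and, together with the word vector x_t of the current word, yields the output of the current LSTM module:

$$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, s_{t-1})$$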

the backward direction is computed in the same way as the forward direction and is not described again here;

Step (4): Model training: the training data are fed into the multi-class sentiment analysis model; with a cross-entropy loss function, the parameters are adjusted by the backpropagation (BP) algorithm, softmax regression is used as the classifier, and training is completed;

Step (5): Model analysis: the text to be analyzed is fed into the trained model, which outputs the sentiment classification result for that text.
2. The deep learning multi-class sentiment analysis method combining an attention mechanism according to claim 1, characterized in that the preprocessing comprises the following steps:

1) word segmentation, stop-word removal, conversion of English uppercase letters to lowercase, and conversion of traditional Chinese characters to simplified characters,

2) selecting the words whose frequency in data set G is greater than or equal to σ to build the vocabulary wordList = {word_1, word_2, ..., word_ε}, where word_i is the i-th word in wordList and ε is the total number of words in G whose frequency is at least σ,

3) for each sample in G, deleting the sample if its length is greater than d, and padding it with the symbol </> if its length is less than d.
3. The deep learning multi-class sentiment analysis method combining an attention mechanism according to claim 1, characterized in that the convolutional layer of the CNN-based local feature extraction module is computed as follows:

$$z = f\left(\sum W^{T} * x_{i:i+s-1} + b\right) \qquad (8)$$

where z is the feature vector obtained when one neuron convolves the text to be analyzed, f(·) is the activation function, W ∈ R^{s×m} is the weight matrix of the neuron (the parameters of one neuron are shared across positions), s × m is the size of the convolution kernel, b is the bias (threshold), and x_{i:i+s-1} denotes the word vectors of the i-th through (i+s-1)-th words of the sentence.
4. The deep learning multi-class sentiment analysis method combining an attention mechanism according to claim 1, characterized in that the training data are the preprocessed data.
5. The deep learning multi-class sentiment analysis method combining an attention mechanism according to claim 1, characterized in that the convolutional layer of the CNN-based local feature extraction stage uses convolution kernels of 4 different scales.
6. The deep learning multi-class sentiment analysis method combining an attention mechanism according to claim 1, characterized in that the training termination condition is that the accuracy no longer changes or that the set number of iterations is reached.