CN116644760A - Dialogue text emotion analysis method based on Bert model and double-channel model

Info

Publication number
CN116644760A
CN116644760A CN202310537056.XA
Authority
CN
China
Prior art keywords
text
model
emotion
bert
network
Prior art date
Legal status
Pending
Application number
CN202310537056.XA
Other languages
Chinese (zh)
Inventor
宋永端
杨环宇
杨凡
罗倩
向清
冯柄茱
陈宇通
盖瑞雪
王玉娟
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202310537056.XA
Publication of CN116644760A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue text emotion analysis method based on a Bert model and a double-channel model. The Bert model performs word-level vectorization on the input text, enhances the semantic expression of the text vectors with the prior knowledge learned during pre-training, and strengthens the understanding of words in the text through a masking strategy. Because dialogue text is short text, whose semantic information is difficult to extract, the feature extraction layer combines the advantages of the BiGRU and CNN networks: the BiGRU network captures the semantic information of the context and measures the emotion information of the text globally, while the CNN network extracts the multi-level feature information in the text locally, so as to capture the features most useful for emotion analysis. The features extracted by the BiGRU and CNN networks are spliced and input into an emotion classification layer, where a fully connected layer and a Softmax operation realize the emotion classification.

Description

Dialogue text emotion analysis method based on Bert model and double-channel model
Technical Field
The invention belongs to the technical field of text emotion analysis, and relates to a dialogue text emotion analysis method based on a Bert model and a double-channel model.
Background
With the rapid development of artificial intelligence and natural language processing, open-domain dialogue systems play an increasingly important role in human-machine interaction services. These systems mainly focus on the semantic accuracy and content relevance of the generated replies but neglect the emotional level. Emotion is an important factor in interpersonal communication: generating emotionally rich replies gives a dialogue system an "empathy" capability, which helps sustain human-machine interaction and improves the user's dialogue experience. In actual human-machine dialogue, however, system replies are prone to emotional deviation, which can drive the dialogue to a dead end. Dialogue text emotion analysis is a prerequisite task for solving the emotion-deviation problem in human-machine dialogue and is important for helping a robot accurately perceive the user's emotion; here the emotional tendency of dialogue text is classified as like, happiness, sadness, aversion, anger, or no emotion. In text classification tasks, single-channel models based on static or dynamic word vectors have achieved good results, but in real human dialogue the text is often colloquial, semantically sparse, and short, and most current emotion analysis methods cannot fully mine the emotional features of such dialogue text, which easily leads to emotion analysis errors.
The invention relates to a dialogue text emotion analysis method based on a Bert model and a double-channel model. Targeting the colloquial expression, sparse semantics, and short length of sentences in dialogue text, it uses a word-level text vector representation and mines textual emotion features from both the local and the global semantic perspective, so that the text in human-machine dialogue is analyzed more accurately and the analysis becomes more comprehensive.
Disclosure of Invention
The invention aims to provide a dialogue text emotion analysis method based on a Bert model and a double-channel model that addresses the colloquial expression, sparse semantics, and short length of dialogue text by constructing an emotion analysis model from a vector representation layer, a feature extraction layer, and an emotion classification layer. First, traditional Chinese natural language processing tasks depend heavily on word segmentation; however, sentences in dialogue text are colloquial and semantically sparse and contain many internet slang terms, so common word segmentation tools perform poorly on dialogue text, easily causing out-of-vocabulary (OOV) problems and loss of semantic information. To address this, the vector representation layer uses the Bert model to vectorize the input text at the word level, which effectively avoids OOV problems and semantic loss, enhances the semantic expression of text vectors with the prior knowledge learned during pre-training, strengthens the understanding of words in the text through a masking strategy, and overcomes the weaknesses of traditional word vector models in capturing bidirectional semantic features and polysemy. Second, dialogue text is mostly short text, whose semantic information is difficult to extract effectively, so the feature extraction layer combines the advantages of the BiGRU and CNN networks: the BiGRU network captures the semantic information of the context and measures the emotion information of the text globally, while the CNN network extracts the multi-level feature information in the text locally, capturing the features most useful for emotion analysis. Finally, the features extracted by the BiGRU and CNN networks are spliced and input into the emotion classification layer, where a fully connected layer and a Softmax operation realize the emotion classification.
In order to achieve the above purpose, the technical scheme adopted by the invention is a dialogue text emotion analysis method based on a Bert model and a two-channel model, comprising the following steps:
Step 1, training word vectors using the Bert model:
A word-level dynamic word vector model, Bert, is selected to vectorize the input text and obtain a more comprehensive text semantic representation, namely the text feature vectors. Instead of a traditional unidirectional language model, or a simple splicing of two unidirectional language models, the Bert model is trained with a new pre-training scheme: through the NSP and MLM tasks it better learns the syntactic and semantic information in sentences, understands the language structure and context of the text, and effectively overcomes the shortcomings of static word vectors. The invention uses the Bert-base model, with 12 encoder layers, a hidden-layer dimension of 768, 12 attention heads, and a total parameter count of 110M.
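As an illustration, a minimal sketch of this vectorization step is given below, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint (an assumed stand-in; the patent specifies only a generic 12-layer Bert-base encoder):

```python
# Minimal sketch of Step 1: word-level (character-level) vectorization with
# a Bert-base encoder. "bert-base-chinese" is an assumed checkpoint; the
# patent only specifies a 12-layer, 768-dim, 12-head model (~110M params).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "今天聊得很开心"  # a sample dialogue utterance
inputs = tokenizer(text, return_tensors="pt",
                   padding="max_length", truncation=True, max_length=64)

with torch.no_grad():
    outputs = bert(**inputs)
# dynamic word-level vectors for the downstream two channels: (1, 64, 768)
token_vectors = outputs.last_hidden_state
```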
Step 2, global feature extraction is carried out by using a BiGRU network:
Global features are extracted from the text feature vectors obtained in step 1: a bidirectional gated network, BiGRU, extracts the global semantic features of the text vectors output by the Bert model. In a GRU network, $x_t$ is the model input at time $t$, and $h_t$ and $h_{t-1}$ are the hidden-layer states at times $t$ and $t-1$; the reset gate and the update gate are given weight matrices $W_r$ and $W_z$ respectively, and $W_{\tilde{h}}$ is the weight matrix used to compute the candidate state $\tilde{h}_t$. After the data pass through the two gating units, the candidate state $\tilde{h}_t$ and the hidden state $h_t$ at the current time $t$ are computed as:

$$r_t = \mathrm{sigmoid}(W_r \cdot [h_{t-1}, x_t])$$
$$z_t = \mathrm{sigmoid}(W_z \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W_{\tilde{h}} \cdot [r_t \odot h_{t-1}, x_t])$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

The BiGRU network consists of two GRUs running in opposite directions, and more complete text semantic information is obtained by splicing the information from the two directions: with $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ denoting the hidden states of the forward and backward GRUs, the model output $h_t$ is their concatenation:

$$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$$
Step 3, extracting local features by using a CNN network:
Local features are extracted from the text feature vectors obtained in step 1: a convolutional neural network, CNN, extracts the local semantic features of the text vectors output by the Bert model, with convolution kernels of different sizes capturing n-gram features of different lengths and mining the multi-level emotion information in the text. The CNN is a feed-forward neural network that optimizes the network structure and fully extracts the local features of the data through weight sharing and local connections. The CNN in the invention uses filters with sliding window sizes (3, 4, 5) to extract local features from the text vectors, and the resulting feature maps are then sent to a max-pooling layer for dimensionality reduction.
Assuming the input feature matrix of the convolutional network has size $n \times k$ and the sliding window matrix has size $m \times k$, the sliding window convolution is computed as:

$$c_i = f(w \cdot x_{i:i+m-1} + b)$$

where $x_{i:i+m-1}$ is the block of $m$ rows of input features sampled by the sliding window, the parameter weight $w$ has dimension $m \times k$, $b$ is a bias value, and $f$ is a nonlinear activation function. When the sliding window matrix has size $2 \times k$, $n-1$ convolution results are obtained; splicing them yields an $(n-1)$-dimensional feature vector $c = (c_1, c_2, c_3, \ldots, c_{n-1})$.
Step 4, fusing the features extracted by the BiGRU and CNN networks:
The global feature vector and the local feature vector obtained in steps 2 and 3 are spliced through a full connection layer, which concatenates the feature vectors extracted by the BiGRU network and the CNN network so that the features of the input data are fully learned. Because the full connection layer has many parameters after feature splicing, overfitting easily occurs during training; a dropout mechanism is therefore used in the full connection layer, randomly deactivating a fixed proportion of neurons during the forward pass of model training, which weakens the complex co-adaptation between neurons and improves the generalization ability of the model.
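A minimal sketch of this fusion step, with dimensions carried over from the two channel sketches above and an assumed dropout rate of 0.5:

```python
# Sketch of Step 4: splice the global and local feature vectors, apply
# dropout, and pass them through a fully connected layer. The 0.5 dropout
# rate and the channel dimensions are assumed values.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, global_dim=256, local_dim=300, num_classes=6, p=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)   # randomly deactivates neurons in training
        self.fc = nn.Linear(global_dim + local_dim, num_classes)

    def forward(self, g, l):
        fused = torch.cat([g, l], dim=1)      # spliced feature vector
        return self.fc(self.dropout(fused))   # logits for the six emotion classes

logits = FusionClassifier()(torch.randn(2, 256), torch.randn(2, 300))  # (2, 6)
```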
Step 5, using the Softmax function to obtain emotion analysis results:
The feature vector obtained after the splicing in step 4 is output to a classifier, finally yielding the classification probability of each emotion category. Since text emotion classification is a multi-class task, the Softmax function is selected as the classifier: it computes the probability $p$ that the text belongs to each emotion category, and the category with the highest probability is selected as the classification result and compared with the true emotion label to compute the classification accuracy. The Softmax function is given below, where $C$ is the number of output-layer neurons, $z$ is a $C$-dimensional vector holding the output of the previous layer before the Softmax operation, and $p^{(i)}$, the probability of class $i$, is a scalar:

$$p^{(i)} = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$
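A short sketch of this classification step; the label list follows the six categories named in the Background:

```python
# Sketch of Step 5: Softmax over the classifier logits; the prediction is
# the emotion category with the highest probability.
import torch
import torch.nn.functional as F

LABELS = ["like", "happiness", "sadness", "aversion", "anger", "no emotion"]

logits = torch.randn(1, 6)               # stand-in for the fusion-layer output
probs = F.softmax(logits, dim=1)         # p_i = exp(z_i) / sum_c exp(z_c)
print(LABELS[int(probs.argmax(dim=1))])  # predicted emotion category
```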
Compared with the prior art, the invention has the following technical advantages:
(1) Targeting the colloquial expression, sparse semantics, and short length of dialogue text, the invention uses the Bert model to train word-level word vectors, which effectively resolves the OOV and polysemy problems in dialogue text, represents the text at a finer granularity, and preserves more of the original semantic information.
(2) It combines the strengths of the BiGRU network in extracting global contextual semantic information with those of the CNN network in extracting multi-level local semantic information, fully mining the emotional features of the dialogue text.
(3) The emotion features extracted by the two-channel network are spliced and sent into the classification layer, where emotion classification is realized with the Softmax function and the classification accuracy is computed.
Drawings
FIG. 1 is a structure diagram of the dialogue text emotion analysis model based on the Bert model and the two-channel model.
FIG. 2 is a schematic overall flow diagram of the dialogue text emotion analysis method based on the Bert model and the two-channel model according to the invention.
FIG. 3 shows the GRU model structure and the BiGRU network structure in the present invention.
FIG. 4 is a schematic diagram of the CNN convolution process in the present invention.
FIG. 5 shows the emotion classification accuracy results obtained on a Chinese dataset by the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings and examples.
The technical scheme adopted by the invention is a dialogue text emotion analysis method based on a Bert model and a double-channel model; the specific analysis process of the invention is as follows:
(1) Training word vectors using the Bert model
The Bert model enhances the semantic expression of the text vectors through the prior knowledge learned during pre-training, strengthens the understanding of words in the text through a masking strategy, obtains a more comprehensive text semantic representation, and effectively overcomes the weaknesses of traditional word vector models in capturing bidirectional semantic features and polysemy.
Rather than using a traditional unidirectional language model, or simply splicing two unidirectional language models, the Bert model builds its language model with a new pre-training scheme whose training tasks are NSP and MLM. The NSP task helps the model understand the relationship between two sentences by simply predicting whether they follow one another; the MLM task is what frees the Bert model from the limits of a unidirectional language model: words in a sentence are masked at random, and the remaining words and context are used to predict the original words at the masked positions. Together, the two training tasks help the model better learn the syntactic and semantic information in sentences, understand the language structure and context of the text, and effectively overcome the shortcomings of static word vectors.
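The masking behaviour can be illustrated with a short sketch, assuming the transformers fill-mask pipeline and the bert-base-chinese checkpoint:

```python
# Illustration of the MLM task: mask one character and let the pre-trained
# model recover it from the remaining context. "bert-base-chinese" is an
# assumed checkpoint for illustration.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-chinese")
for cand in fill("今天的天气真[MASK]。", top_k=3):
    print(cand["token_str"], round(cand["score"], 3))  # top predictions
```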
(2) Global feature extraction using the BiGRU network
The GRU network uses a gating mechanism to selectively remember and forget information, which effectively addresses gradient explosion, gradient vanishing, and long-range dependency problems in sequences. Its structure is simpler than a traditional recurrent neural network, which reduces the number of parameters during training, lowers the risk of overfitting, makes the model easier to converge, and speeds up training. Because strong emotional information surrounds the emotion words in dialogue text, the BiGRU network splices information from the forward and backward directions to obtain more complete contextual text semantic information.
(3) Local feature extraction using CNN networks
The CNN network locally extracts the multi-level feature information in the text so as to capture the features most useful for emotion analysis; through weight sharing and local connections, it optimizes the network structure and fully mines the semantic information in the text.
(4) Fusing the features extracted by the BiGRU and CNN networks
The global feature vector and the local feature vector extracted in steps (2) and (3) are spliced; the spliced features fully represent the emotional semantic information contained in the dialogue text. A dropout mechanism is added to the fully connected layer: during the forward pass of model training, a fixed proportion of neurons is randomly deactivated, which weakens the complex co-adaptation between neurons and improves the generalization ability of the model.
(5) Obtaining emotion analysis results using the Softmax function
The spliced feature vector is output to a classifier, finally yielding the classification probability of each emotion category. Since text emotion classification is a multi-class task, the Softmax function is selected as the classifier to compute the probability $p$ that the text belongs to each emotion category; during training, a cross-entropy loss function drives the backward-propagation optimization that adjusts the model parameters. During testing, the emotion category with the highest classification probability is selected as the result and compared with the true label to compute the classification accuracy.
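A minimal sketch of this training objective, with an assumed optimizer, learning rate, and feature dimensions:

```python
# Sketch of the cross-entropy training step in (5). The Adam optimizer,
# learning rate, and feature dimensions are assumed values.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Dropout(0.5),     # fused 256 + 300 channel dims
                           nn.Linear(556, 6))   # six emotion categories
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()               # applies log-softmax internally

fused = torch.randn(8, 556)                     # stand-in spliced feature vectors
labels = torch.randint(0, 6, (8,))              # stand-in gold emotion labels

loss = criterion(classifier(fused), labels)     # cross-entropy loss
optimizer.zero_grad()
loss.backward()                                 # backward-propagation optimization
optimizer.step()
```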
(6) Performance evaluation
The invention uses three metrics, precision, recall, and F1 value, to judge the overall effect of the model.
Precision: the proportion of samples whose actual category is positive among all samples predicted as positive. The calculation formula is:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall: the proportion of samples predicted as positive among all samples whose actual category is positive. The calculation formula is:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

F1 value (F1-score): in practice many problems have imbalanced samples, and precision and recall are often in tension: when precision is high, recall tends to be low, and vice versa. The harmonic mean of the two is therefore taken as the F1 evaluation metric; when both precision and recall are high, a higher F1 value indicates better model performance. The calculation formula is:

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
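These three metrics can be computed with scikit-learn, as in the sketch below; macro averaging over the six categories is an assumed choice:

```python
# Sketch of the evaluation in (6): precision, recall, and F1 computed with
# scikit-learn. Macro averaging over the six classes is an assumed choice.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 5, 3]   # stand-in gold emotion labels
y_pred = [0, 1, 2, 4, 5, 3]   # stand-in model predictions

p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"Precision={p:.4f}  Recall={r:.4f}  F1={f1:.4f}")
```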
In order to evaluate the performance of the Bert-based two-channel text emotion analysis model, the model provided by the invention is compared with 6 baseline models; the comparison results on the same dataset are shown in FIG. 5 and verify the effectiveness and accuracy of the method.
Compared with other models under the same experimental conditions, the Bert-based dual-channel text emotion analysis model provided by the invention performs better, with precision, recall, and F1 value reaching 90.35%, 91.26%, and 90.80% respectively.
Compared with the model using Word2Vec, the model using Bert to generate word vectors improves greatly in precision, recall, and F1 value, by 10.69%, 11.78%, and 11.23% respectively, indicating that, compared with the static word vector model Word2Vec, the dynamic word vector model Bert understands text content more deeply and expresses semantics more accurately and completely.
Among all models that use the Bert model to generate word vectors, the Bert-CNN model, which uses only a convolutional neural network, performs the worst in the emotion analysis experiments. Comparing the results of the Bert-GRU and Bert-CNN models shows that the unidirectional GRU network extracts the temporal features of text better than the CNN network. Comparing the Bert-GRU and Bert-BiGRU results shows that the unidirectional GRU network falls short in extracting text semantic information, whereas the bidirectional BiGRU network extracts semantic features in both the forward and backward directions, exploiting its long-range dependency advantage to better capture sentence semantics. Comparing the Bert-BiGRU and Bert-BiLSTM results shows that the structurally simpler BiGRU network reduces the number of parameters during training, lowers the risk of overfitting, speeds up training, and improves semantic feature extraction to a certain extent. Comparing the Bert-BiGRU and Bert-BiGRU-CNN results shows that placing a CNN network in series after the BiGRU network for local feature extraction does not improve emotion analysis performance but degrades it, because the CNN network can miss key emotional information and lose textual emotion information. Comparing the Bert-based BiGRU-CNN model against the Bert-BiGRU and Bert-CNN models shows that using the bidirectional BiGRU network in parallel effectively offsets the weaknesses of the CNN network, fully fuses the advantages of the two networks, and achieves a better classification result. Compared with extracting contextual features first and then local key information, the parallel two-channel design preserves more textual semantic information and effectively improves the accuracy of the model's emotion analysis.

Claims (7)

1. A dialogue text emotion analysis method based on a Bert model and a double-channel model is characterized by comprising the following steps:
step 1 training word vectors using the Bert model:
selecting a word-level dynamic word vector Bert model to vectorize an input text and obtain a comprehensive text semantic representation, namely text feature vectors; the Bert model builds a language model through pre-training, and learns syntactic and semantic information in sentences through the NSP and MLM tasks so as to understand the language structure and context of the input text;
step 2, global feature extraction is carried out by using a BiGRU network:
extracting global features from the text feature vectors obtained in step 1, using a bi-directional gated network BiGRU to extract the global semantic features of the text vectors output by the Bert model, wherein the BiGRU consists of two GRU networks in opposite directions, and complete text semantic information, namely the global feature vector, is acquired by splicing the information from the two directions;
step 3, extracting local features by using a CNN network:
extracting local features from the text feature vectors obtained in step 1, using a convolutional neural network CNN to extract the local semantic features of the text vectors output by the Bert model, wherein convolution kernels of different sizes in the CNN capture n-gram features of different lengths and mine the multi-level emotion information in the input text; the CNN uses filters with sliding window sizes (3, 4, 5) to extract local features from the text vectors, and the obtained features are then sent to a max-pooling layer for dimension reduction to obtain the local feature vector;
step 4, fusing the extracted characteristics of the BiGRU and CNN networks:
splicing the global feature vector and the local feature vector obtained in step 2 and step 3 through a full connection layer, and outputting the spliced feature vector to a classifier; the full connection layer splices the feature vectors extracted by the BiGRU network and the CNN network and fully learns the features of the input data; a dropout mechanism is used in the full connection layer, randomly deactivating a fixed proportion of neurons during the forward pass of model training so as to weaken the complex co-adaptation between neurons;
step 5, using the Softmax function to obtain emotion analysis results: outputting the feature vector obtained after the splicing in step 4 to a classifier, and finally obtaining the classification probabilities of the emotion categories.
2. The method for emotion analysis of dialogue text based on the Bert model and the two-channel model according to claim 1, wherein the Bert-base model is used in step 1, with 12 encoder layers, a hidden-layer dimension of 768, 12 attention heads, and a total parameter count of 110M.
3. The method for emotion analysis of dialogue text based on the Bert model and the dual-channel model according to claim 1, wherein in the GRU network of step 2, $x_t$ is the network input at time $t$, and $h_t$ and $h_{t-1}$ are the hidden-layer states at times $t$ and $t-1$ respectively; the reset gate and the update gate are given weight matrices $W_r$ and $W_z$ respectively, and $W_{\tilde{h}}$ is the weight matrix used to compute the candidate state $\tilde{h}_t$; after the data pass through the two gating units, the candidate state $\tilde{h}_t$ and the hidden state $h_t$ at the current time $t$ are computed as:

$$r_t = \mathrm{sigmoid}(W_r \cdot [h_{t-1}, x_t])$$
$$z_t = \mathrm{sigmoid}(W_z \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W_{\tilde{h}} \cdot [r_t \odot h_{t-1}, x_t])$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

$\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ denote the hidden states of the forward and backward GRUs, and the model output $h_t$ is formed by splicing them:

$$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$$
4. The dialogue text emotion analysis method based on the Bert model and the two-channel model according to claim 1, wherein the input feature matrix of the convolutional network in step 3 has size $n \times k$ and the sliding window matrix has size $m \times k$, and the sliding window convolution is computed as:

$$c_i = f(w \cdot x_{i:i+m-1} + b)$$

where $x_{i:i+m-1}$ is the block of $m$ rows of input features sampled by the sliding window, the parameter weight $w$ has dimension $m \times k$, $b$ is a bias value, and $f$ is a nonlinear activation function; when the sliding window matrix has size $2 \times k$, $n-1$ convolution results are obtained, and after splicing, an $(n-1)$-dimensional feature vector $c = (c_1, c_2, c_3, \ldots, c_{n-1})$ is finally output.
5. The dialogue text emotion analysis method based on the Bert model and the dual-channel model according to claim 1, wherein in step 5 the Softmax function is selected as the classifier to compute the probability $p$ that the text belongs to each emotion category; the emotion category with the highest probability is selected as the classification result and compared with the true emotion label to compute the classification accuracy; the Softmax function is given below, where $C$ is the number of output-layer neurons, $z$ is a $C$-dimensional vector holding the output of the previous layer before the Softmax operation, and $p^{(i)}$, the probability of class $i$, is a scalar:

$$p^{(i)} = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$
6. The method for emotion analysis of dialogue text based on the Bert model and the two-channel model according to claim 1, wherein the NSP task helps the model understand the relationship between two sentences, i.e., predicts whether two sentences follow one another, while the MLM task frees the Bert model from the limits of a unidirectional language model, i.e., words in a sentence are masked at random and the remaining words and context are used to predict the original words at the masked positions; together the two training tasks help the model learn the syntactic and semantic information in sentences, understand the language structure and context of the text, and overcome the shortcomings of static word vectors.
7. The method for analyzing emotion of dialogue text based on the Bert model and the two-channel model according to claim 1, wherein three metrics, precision, recall, and F1 value, are used to judge the overall effect;
precision: the proportion of samples whose actual category is positive among all samples predicted as positive, computed as

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

recall: the proportion of samples predicted as positive among all samples whose actual category is positive, computed as

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

F1 value (F1-score): the harmonic mean of the two is taken as the evaluation metric; when both precision and recall are high, a higher F1 value indicates better model performance, computed as

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
CN202310537056.XA 2023-05-13 2023-05-13 Dialogue text emotion analysis method based on Bert model and double-channel model Pending CN116644760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310537056.XA CN116644760A (en) 2023-05-13 2023-05-13 Dialogue text emotion analysis method based on Bert model and double-channel model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310537056.XA CN116644760A (en) 2023-05-13 2023-05-13 Dialogue text emotion analysis method based on Bert model and double-channel model

Publications (1)

Publication Number Publication Date
CN116644760A (en) 2023-08-25

Family

ID=87642743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310537056.XA Pending CN116644760A (en) 2023-05-13 2023-05-13 Dialogue text emotion analysis method based on Bert model and double-channel model

Country Status (1)

Country Link
CN (1) CN116644760A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332789A (en) * 2023-12-01 2024-01-02 诺比侃人工智能科技(成都)股份有限公司 Semantic analysis method and system for dialogue scene


Similar Documents

Publication Publication Date Title
CN110059188B (en) Chinese emotion analysis method based on bidirectional time convolution network
CN110929030B (en) Text abstract and emotion classification combined training method
CN110083833B (en) Method for analyzing emotion by jointly embedding Chinese word vector and aspect word vector
CN112667818B (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN110134757A (en) A kind of event argument roles abstracting method based on bull attention mechanism
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN111966786B (en) Microblog rumor detection method
CN106855853A (en) Entity relation extraction system based on deep neural network
CN111143563A (en) Text classification method based on integration of BERT, LSTM and CNN
CN112487807A (en) Text relation extraction method based on expansion gate convolution neural network
CN112732916A (en) BERT-based multi-feature fusion fuzzy text classification model
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
CN113609289A (en) Multi-mode dialog text-based emotion recognition method
CN113239663B (en) Multi-meaning word Chinese entity relation identification method based on Hopkinson
CN112818110B (en) Text filtering method, equipment and computer storage medium
CN112561718A (en) Case microblog evaluation object emotion tendency analysis method based on BilSTM weight sharing
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method
CN114020906A (en) Chinese medical text information matching method and system based on twin neural network
CN114547299A (en) Short text sentiment classification method and device based on composite network model
CN116644760A (en) Dialogue text emotion analysis method based on Bert model and double-channel model
CN115630653A (en) Network popular language emotion analysis method based on BERT and BilSTM
CN114169447B (en) Event detection method based on self-attention convolution bidirectional gating cyclic unit network
CN116579347A (en) Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion
CN114492459A (en) Comment emotion analysis method and system based on convolution of knowledge graph and interaction graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination