CN115392232A - Topic and multi-mode fused emergency emotion analysis method - Google Patents

Topic and multi-mode fused emergency emotion analysis method

Info

Publication number
CN115392232A
Authority
CN
China
Prior art keywords
text
emotion
topic
picture
comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211003522.8A
Other languages
Chinese (zh)
Inventor
苏依拉
郭晨雨
仁庆道尔吉
吉亚图
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN202211003522.8A priority Critical patent/CN115392232A/en
Publication of CN115392232A publication Critical patent/CN115392232A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/237: Lexical tools
    • G06F 40/242: Dictionaries
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using classification, e.g. of video objects


Abstract

A topic and multimodal fusion method for sentiment analysis of emergency events. External knowledge is incorporated into neural topic modeling; the resulting neural topic model is pre-trained on a large corpus and then fine-tuned on a target dataset, after which an emotion dictionary is used to compute the emotion distribution of each topic and, from it, the emotional tendency of each comment in the dataset. With the comment to be analyzed as input, the neural topic model yields its topic-based emotion value M1. Text-picture correlation analysis of the comment yields an image-text correlation coefficient; emotion features are extracted to obtain a text emotion value and a picture emotion value, which are combined by a weighted average according to the correlation coefficient into an image-text emotion value M2. M1 and M2 are fused at the model-result level to obtain the comment's final emotion value M. The method enables more accurate sentiment analysis of comment content on microblogs and similar websites or platforms.

Description

Topic and multi-mode fused emergency emotion analysis method
Technical Field
The invention belongs to the technical field of artificial intelligence and relates to sentiment analysis of network events, in particular to an emergency event sentiment analysis method fusing topic and multimodal information.
Background
Sentiment analysis, also called tendency analysis, opinion extraction, opinion mining, emotion mining, or subjective analysis, is a task in the field of natural language processing: the process of analyzing, processing, summarizing, and reasoning over subjective texts that carry emotional color.
With the rapid development of the Internet, social networks have become the main platform for the spread of online public opinion, and microblogs and similar websites or platforms serve as important media for that spread, allowing users to express opinions anytime and anywhere. Unlike traditional text data, comment data is redundant and mixes text, pictures, videos, and a large amount of special information such as website- or platform-specific emoticons; at the same time, the emotion of a text is closely related to the topic under discussion. Both factors make sentiment analysis of comment content on microblogs and similar websites or platforms very difficult.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide an emergency event sentiment analysis method fusing topic and multimodal information, so that comment content on microblogs and similar websites or platforms can be analyzed more accurately.
To achieve this purpose, the invention adopts the following technical scheme:
An emergency event sentiment analysis method fusing topic and multimodal information comprises the following steps:
Step 1: external knowledge is incorporated into neural topic modeling; the neural topic model obtained by this modeling is pre-trained on a large corpus and then fine-tuned on a target dataset; the topic emotion distribution is then computed with an emotion dictionary, from which the emotional tendency of each comment in the dataset is obtained. The data in both the large corpus and the target dataset include text and pictures. Based on the neural topic model, the comment to be analyzed is taken as input to obtain its topic-based emotion value M1.
Step 2, performing text and picture correlation analysis on the comment to be analyzed to obtain a picture and text correlation coefficient mu;
Step 3: extract emotion features from the text and the picture with a method fusing BiLSTM, TextCNN, and an attention mechanism, obtaining a text emotion value M_text and a picture emotion value M_pic; take the weighted average of M_text and M_pic according to the image-text correlation coefficient μ to obtain the image-text emotion value M2.
Step 4, the emotion value M 1 And an emotion value M 2 And carrying out fusion on the model result level to obtain the final emotion value M of the comment.
In one embodiment, in step 1, the external knowledge is topic-related knowledge learned while the neural topic model is pre-trained, which can be reused during fine-tuning on the target dataset; it is incorporated into the neural topic model through pre-training.
In one embodiment, the neural topic model adopts an encoder-decoder architecture. A BoW model processes the texts in the dataset to obtain x; the encoder takes x ∈ R^v as input, and the topic distribution of a text in the dataset is t ∈ R^k, where v is the vocabulary size and k is the number of topics in the topic distribution t; the decoder reconstructs the original document. The encoder is a stack of N+1 MLP layers: from bottom to top, the first N layers have the same structure, each with four sublayers (Dropout, Linear, BatchNorm, and LeakyReLU); the last layer is a Dropout sublayer and a Linear transform, followed by a Softmax layer. The decoder has the same architecture as the encoder.
In one embodiment, the encoder receives x ∈ R^v as input and infers its topic distribution t ∈ R^k; the decoder then reconstructs the original document from t. In this process, the dropout probability of each encoder layer and the negative slope of the LeakyReLU sublayers are set, giving the reconstruction loss $l_{rec}(x,t) = -\mathbb{E}(x \log t)$,
where t and x have the same size m. The topic distribution obtained by the neural topic model is adjusted by minimizing its maximum mean discrepancy (MMD) from a Dirichlet distribution P, with the formula:

$$l_{MMD}(t,t') = \frac{1}{m^2}\sum_{i,j=1}^{m} k(t_i,t_j) + \frac{1}{m^2}\sum_{i,j=1}^{m} k(t'_i,t'_j) - \frac{2}{m^2}\sum_{i,j=1}^{m} k(t_i,t'_j)$$

The overall training objective is: $L = l_{rec}(x,t) + r \cdot \lambda \cdot l_{MMD}(t,t')$,
where t' is a topic distribution randomly drawn from P, k(·,·) is the information diffusion function, and i and j take values from 1 to m; r is a hyperparameter balancing $l_{rec}$ and $l_{MMD}$. The coefficient λ is normalized with the two-norm, $b^{(N+1)}$ is the bias term before the encoder's Softmax sublayer, and ∇ denotes the derivation operator.
In one embodiment, the large corpus is the DBPedia dataset, and the target dataset is the dataset of the CCIR 2020 netizen emotion recognition competition held during the epidemic; the emotion dictionary is the emotion polarity dictionary of National Taiwan University. The neural topic model is trained once on the DBPedia dataset to complete pre-training and is then fine-tuned on the CCIR 2020 competition dataset.
In one embodiment, fine-tuning starts from the pre-trained model, with the parameters randomly re-initialized at the last layer of the encoder and the first layer of the decoder; the emotion value of each topic is obtained from the emotion dictionary, and from these the topic-based emotion value M1 of the whole comment is obtained.
In one embodiment, in step 2, text-picture correlation analysis is performed with a method fusing BiLSTM and an attention mechanism, as follows:
first, the text and the picture are processed: the text is converted into a text matrix with the GloVe method, picture labels are extracted with a tool provided by the Vision service of the Google Cloud Platform, and the picture labels are represented as a word matrix of the same form as the text;
then two independent BiLSTMs receive the picture-label matrix and the text matrix respectively, and represent the picture and the text as vectors of the same dimensionality;
finally, the picture vector and the text vector are concatenated as the input of a fully connected layer, and the image-text correlation coefficient μ is output through a softmax layer.
In one embodiment, in step 3, a text classification method based on a BiLSTM-Attention-TextCNN hybrid neural network is used: the text of the comment to be analyzed is mapped to vectors through a word embedding layer, and a BiLSTM network learns the preceding and following context of each word in the comment, yielding a deeper semantic vector for the current word; an attention model is built to compute a probability weight for each word vector, so that words with larger weights receive more attention (such words are often the keywords for the classification task); the vectors output by the attention mechanism are connected to a pooling layer that performs k-max pooling, retaining the first k words with the largest weights; a TextCNN network then extracts features and outputs the text emotion value M_text. For the picture in the comment to be analyzed, labels are extracted and represented as a matrix of the same form as the text; the BiLSTM extracts picture features, an attention mechanism is built to select among the features, the vectors output by the attention mechanism are connected to a pooling layer that performs k-max pooling and retains the first k entries with the largest weights, and a CNN network then extracts features and outputs the picture emotion value M_pic.
In one embodiment, in step 4, the fusion formula for the final emotion value M is: $M = \delta \cdot M_1 + (1-\delta) \cdot M_2$, where the weight δ is determined by parameter tuning during model training.
Network comments often contain special information such as text and pictures, and comment emotion is closely related to the topic under discussion. Compared with existing sentiment analysis methods, the method of the invention, which analyzes emotion through fused topic and multimodal analysis, therefore copes more easily with sentiment analysis of network comment data.
Drawings
FIG. 1 is an architectural diagram of the neural topic model.
FIG. 2 is a structural diagram of image-text correlation analysis.
FIG. 3 is a schematic diagram of text emotion extraction.
FIG. 4 is a schematic diagram of picture emotion feature extraction.
FIG. 5 is a schematic diagram of the structure for obtaining comment emotion polarity.
FIG. 6 is the first picture attached to the text comment in the embodiment of the present invention.
FIG. 7 is the second picture attached to the text comment in the embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention is an emergency event sentiment analysis method fusing topic and multimodal information: it combines topic information with multiple modalities to analyze the emotion around an emergency.
In the invention, the modalities are text and pictures, and an emergency naturally has topics surrounding it.
The implementation of the invention comprises the following steps:
step 1, external knowledge is blended in neural topic modeling, a neural topic model obtained through modeling is pre-trained on a large corpus, then fine tuning is carried out on a target data set, then topic emotion distribution is calculated by using an emotion dictionary, and then the emotional tendency of each comment in the data set is obtained. Based on the neural topic model, the comment to be analyzed is taken as input to obtain the sentiment value M of the comment to be analyzed 1
In the invention, the external knowledge is topic-related knowledge learned while the neural topic model is pre-trained; it can be reused during fine-tuning on the target dataset, and it is incorporated into the neural topic model through pre-training.
The large corpus used in the method is the DBPedia dataset. The target dataset is that of the CCIR 2020 netizen emotion recognition competition held during the epidemic, i.e., microblog comment data related to the epidemic; the sentiment analysis performed in this embodiment determines the emotion polarity of each comment, namely negative or positive. Note that the data in both the large corpus and the target dataset include text, pictures, emoticons, and so on; the invention uses only text and pictures, and step 1 uses only text. The emotion dictionary used by the invention is the emotion polarity dictionary of National Taiwan University. The neural topic model is trained once on the DBPedia dataset to complete pre-training; fine-tuning is then done on the target dataset.
Referring to FIG. 1, the neural topic model established by the invention adopts an encoder-decoder architecture. First, a BoW model processes the texts in the datasets (the DBPedia dataset and the target dataset) to obtain x; the topic distribution of a text in the dataset is t ∈ R^k, where v is the vocabulary size and k is the number of topics in the topic distribution t. The encoder is a stack of N+1 MLP layers: from bottom to top, the first N layers have the same structure, each with four sublayers (a Dropout sublayer, a Linear sublayer, a BatchNorm sublayer, and a LeakyReLU sublayer); the last layer is a Dropout sublayer and a Linear transform, followed by a Softmax layer. The decoder reconstructs the original document and has the same architecture as the encoder.
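For concreteness, the following is a minimal PyTorch sketch of such an encoder stack. The hidden size, N, the dropout probability, and the LeakyReLU negative slope are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class NTMEncoder(nn.Module):
    """Encoder stack of N+1 MLP layers as described: the first N layers are
    Dropout -> Linear -> BatchNorm -> LeakyReLU; the last layer is
    Dropout -> Linear, followed by Softmax to yield a topic distribution
    t in R^k. The decoder mirrors this architecture."""
    def __init__(self, vocab_size, num_topics, hidden_size=256,
                 n_layers=2, dropout=0.2, negative_slope=0.1):
        super().__init__()
        layers, in_dim = [], vocab_size
        for _ in range(n_layers):          # the first N identical layers
            layers += [nn.Dropout(dropout),
                       nn.Linear(in_dim, hidden_size),
                       nn.BatchNorm1d(hidden_size),
                       nn.LeakyReLU(negative_slope)]
            in_dim = hidden_size
        # the (N+1)-th layer: Dropout sublayer and Linear transform
        layers += [nn.Dropout(dropout), nn.Linear(in_dim, num_topics)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                  # x: (batch, vocab_size) BoW vector
        return torch.softmax(self.net(x), dim=-1)   # t: (batch, num_topics)
```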
Specifically, the encoder receives x ∈ R^v as input and infers its topic distribution t ∈ R^k; the decoder then reconstructs the original document from t. In this process, the dropout probability of each encoder layer and the negative slope of the LeakyReLU sublayers are set, giving the reconstruction loss $l_{rec}(x,t) = -\mathbb{E}(x \log t)$,
where t and x have the same size m. The topic distribution obtained by the neural topic model is adjusted by minimizing its maximum mean discrepancy (MMD) from a Dirichlet distribution P, with the formula:

$$l_{MMD}(t,t') = \frac{1}{m^2}\sum_{i,j=1}^{m} k(t_i,t_j) + \frac{1}{m^2}\sum_{i,j=1}^{m} k(t'_i,t'_j) - \frac{2}{m^2}\sum_{i,j=1}^{m} k(t_i,t'_j)$$

The overall training objective is: $L = l_{rec}(x,t) + r \cdot \lambda \cdot l_{MMD}(t,t')$,
where t' is a topic distribution randomly drawn from P, k(·,·) is the information diffusion function, and i and j take values from 1 to m; r is a hyperparameter balancing $l_{rec}$ and $l_{MMD}$. The coefficient λ is normalized with the two-norm, $b^{(N+1)}$ is the bias term before the encoder's Softmax sublayer, and ∇ denotes the derivation operator.
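Under these definitions, the MMD term can be sketched as below; the Gaussian kernel stands in for the information diffusion function k(·,·), which the text does not pin down, so it is an assumption.

```python
import torch

def mmd_loss(t, t_prime, bandwidth=0.1):
    """Biased MMD estimator between encoder topic distributions t and
    samples t_prime drawn from the Dirichlet prior P; both are (m, k)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2            # pairwise squared distances
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    m = t.shape[0]
    return (kernel(t, t).sum() / m**2
            + kernel(t_prime, t_prime).sum() / m**2
            - 2 * kernel(t, t_prime).sum() / m**2)

# overall objective: loss = l_rec + r * lam * mmd_loss(t, t_prime)
```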
The fine-tuning in this step starts from the pre-trained neural topic model, but the parameters of the last encoder layer and the first decoder layer are randomly re-initialized; the emotion value of each topic is then obtained from the emotion dictionary, and from these the topic-based emotion value M1 of the whole comment is obtained.
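A minimal sketch of this selective re-initialization, assuming the encoder and decoder are nn.Sequential stacks like the one above (the helper function is hypothetical):

```python
import torch.nn as nn

def reinit_for_finetuning(encoder, decoder):
    """Randomly re-initialize the last Linear layer of the encoder and the
    first Linear layer of the decoder before fine-tuning on the target set."""
    enc_linears = [m for m in encoder.modules() if isinstance(m, nn.Linear)]
    dec_linears = [m for m in decoder.modules() if isinstance(m, nn.Linear)]
    for layer in (enc_linears[-1], dec_linears[0]):
        nn.init.xavier_uniform_(layer.weight)
        nn.init.zeros_(layer.bias)
```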
Step 2: text-picture correlation analysis is performed on a given comment to be analyzed to obtain the image-text correlation coefficient μ.
The structure of the image-text correlation analysis is shown in FIG. 2. Its output can be expressed as relevant or irrelevant: if the text and the picture are irrelevant, only the text undergoes emotion analysis; if they are relevant, a comprehensive analysis is performed.
In this step, text-picture correlation analysis is performed with a method fusing BiLSTM and an attention mechanism, as follows.
First, the text and the picture are processed: the text is converted into a text matrix with the GloVe method, picture labels are extracted with a tool provided by the Vision service of the Google Cloud Platform, and the picture labels are represented as a word matrix of the same form as the text.
Then two independent BiLSTMs receive the picture-label matrix and the text matrix respectively, and represent the picture and the text as vectors of the same dimensionality.
Finally, the picture vector and the text vector are concatenated as the input of a fully connected layer (the fully connected layer formed over the spliced text and picture feature vectors), and the image-text correlation coefficient μ is output through the softmax layer.
Step 3: extract emotion features from the text and the picture with a method fusing BiLSTM, TextCNN, and an attention mechanism, obtaining a text emotion value M_text and a picture emotion value M_pic; take the weighted average of M_text and M_pic according to the image-text correlation coefficient μ to obtain the image-text emotion value M2:
Relevant: $M_2 = \mu \cdot M_{pic} + (1-\mu) \cdot M_{text}$
Not relevant: $M_2 = M_{text}$
in the step, the Attention Mechanism (Attention Mechanism) is utilized to automatically learn and calculate the contribution of the input data to the output data, so that the extraction of the emotional characteristics is more reasonable.
Specifically, referring to FIG. 3, in the text classification method based on the BiLSTM-Attention-TextCNN hybrid neural network, the text of the comment to be analyzed is mapped to vectors through a word embedding layer, and a BiLSTM network learns the preceding and following context of each word in the comment, yielding a deeper semantic vector for the current word. An attention model is built to compute a probability weight for each word vector, so that words with larger weights receive more attention; such words are often the keywords for the classification task. The vectors output by the attention mechanism are connected to a pooling layer that performs k-max pooling, retaining the first k words with the largest weights; a TextCNN network then extracts features and outputs the text emotion value M_text.
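A condensed sketch of this BiLSTM-Attention-TextCNN pipeline is given below; the dimensions, k, and the number of convolution filters are illustrative assumptions, and the input sequence is assumed to contain at least k tokens. The picture branch of FIG. 4 is analogous, operating on embedded picture labels instead of words.

```python
import torch
import torch.nn as nn

class BiLSTMAttnTextCNN(nn.Module):
    """Embedding -> BiLSTM context encoding -> attention weights over words ->
    k-max pooling keeps the k highest-weight positions -> TextCNN features ->
    scalar text emotion value."""
    def __init__(self, vocab_size, embed_dim=100, hidden=128, k=10, filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.conv = nn.Conv1d(2 * hidden, filters, kernel_size=3, padding=1)
        self.out = nn.Linear(filters, 1)
        self.k = k

    def forward(self, token_ids):                     # (batch, seq)
        h, _ = self.lstm(self.embed(token_ids))       # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h).squeeze(-1), dim=-1)  # word weights
        h = h * w.unsqueeze(-1)                       # attention-weighted
        idx = torch.topk(w, self.k, dim=1).indices    # k-max pooling by weight
        h = torch.gather(h, 1, idx.unsqueeze(-1).expand(-1, -1, h.size(-1)))
        c = torch.relu(self.conv(h.transpose(1, 2)))  # TextCNN over kept words
        return self.out(torch.max(c, dim=-1).values).squeeze(-1)  # M_text
```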
Referring to FIG. 4, in the picture classification method based on the BiLSTM-Attention-CNN hybrid neural network, labels are extracted from the picture of the comment to be analyzed and represented as a matrix of the same form as the text; the BiLSTM extracts picture features, an attention mechanism is built to select among the features, the vectors output by the attention mechanism are connected to a pooling layer that performs k-max pooling and retains the first k entries with the largest weights, and a CNN network then extracts features and outputs the picture emotion value M_pic.
In step 4, referring to FIG. 5, the emotion values M1 and M2 are fused at the model-result level to obtain the final emotion value M of the comment, with the fusion formula: $M = \delta \cdot M_1 + (1-\delta) \cdot M_2$, where the weight δ is determined by parameter tuning during model training.
In one specific embodiment of the invention, the following text comment record is intercepted:
4456427143652010, 01, 23/02/3, 21, mimiko sweet heart, pictures 1/2: first slight fever of the year, hoping this year will be safe, healthy, and happy. Shijiazhuang, Hebei University of Science and Technology
FIG. 6 and FIG. 7 are the two pictures attached to this text comment.
The text of the target dataset is processed by the neural topic model and the emotion dictionary to obtain the emotion value of each topic.
The comment accumulates the emotion values of the topics it contains and averages them: if the topics contained in the comment have emotion values $m_1, m_2, \ldots, m_n$, the topic-derived emotion value is $M_1 = (m_1 + m_2 + \cdots + m_n)/n$.
The data is input into the correlation model shown in FIG. 2 to obtain the correlation coefficient μ between the text and the picture of the comment, and into the text and picture emotion-value extraction models shown in FIG. 3 and FIG. 4 to obtain the text emotion value M_text and the picture emotion value M_pic. The image-text emotion value is then obtained as $M_2 = \mu \cdot M_{pic} + (1-\mu) \cdot M_{text}$. Combining the emotion values M1 and M2 according to the weight δ gives the final emotion value of the comment: $M = \delta \cdot M_1 + (1-\delta) \cdot M_2$.
M > 0: positive; M = 0: neutral; M < 0: negative.
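Putting the result-level fusion together, a small sketch follows; the placement of μ and δ in the weighted combinations mirrors the reconstructed formulas above and is an assumption where the original formula images were unrecoverable.

```python
def fuse_emotion(m1, m_text, m_pic, mu, delta, relevant=True):
    """Result-level fusion: the topic-based value m1 and the image-text value
    m2 are combined with weight delta; polarity follows the sign of M."""
    m2 = mu * m_pic + (1 - mu) * m_text if relevant else m_text
    m = delta * m1 + (1 - delta) * m2
    polarity = "positive" if m > 0 else ("neutral" if m == 0 else "negative")
    return m, polarity

# example: fuse_emotion(0.4, 0.6, 0.2, mu=0.7, delta=0.5) -> (0.36, 'positive')
```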

Claims (9)

1. An emergency event sentiment analysis method fusing topic and multimodal information, characterized by comprising the following steps:
step 1: external knowledge is incorporated into neural topic modeling; the neural topic model obtained by this modeling is pre-trained on a large corpus and then fine-tuned on a target dataset; the topic emotion distribution is then computed with an emotion dictionary, from which the emotional tendency of each comment in the dataset is obtained; the data in both the large corpus and the target dataset include text and pictures; based on the neural topic model, the comment to be analyzed is taken as input to obtain its topic-based emotion value M1;
Step 2, performing text and picture correlation analysis on the comment to be analyzed to obtain a picture-text correlation coefficient mu;
step 3: extract emotion features from the text and the picture with a method fusing BiLSTM, TextCNN, and an attention mechanism, obtaining a text emotion value M_text and a picture emotion value M_pic; take the weighted average of M_text and M_pic according to the image-text correlation coefficient μ to obtain the image-text emotion value M2;
Step 4, the emotion value M 1 And an emotion value M 2 And (5) carrying out fusion on the model result layer to obtain the final emotion value M of the comment.
2. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1, characterized in that in step 1 the external knowledge is topic-related knowledge learned while the neural topic model is pre-trained, which can be reused during fine-tuning on the target dataset, and which is incorporated into the neural topic model through pre-training.
3. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1, characterized in that the neural topic model adopts an encoder-decoder architecture; a BoW model processes the texts in the dataset to obtain x; the encoder takes x ∈ R^v as input, and the topic distribution of a text in the dataset is t ∈ R^k, where v is the vocabulary size and k is the number of topics in the topic distribution t; the decoder reconstructs the original document; the encoder is a stack of N+1 MLP layers, where from bottom to top the first N layers have the same structure, each with four sublayers (Dropout, Linear, BatchNorm, and LeakyReLU), and the last layer is a Dropout sublayer and a Linear transform followed by a Softmax layer; the decoder has the same architecture as the encoder.
4. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 3, characterized in that the encoder receives x ∈ R^v as input and infers its topic distribution t ∈ R^k, after which the decoder reconstructs the original document from t; in this process, the dropout probability of each encoder layer and the negative slope of the LeakyReLU sublayers are set, giving the reconstruction loss $l_{rec}(x,t) = -\mathbb{E}(x \log t)$, where t and x have the same size m; the topic distribution obtained by the neural topic model is adjusted by minimizing its maximum mean discrepancy from a Dirichlet distribution P, with the formula:

$$l_{MMD}(t,t') = \frac{1}{m^2}\sum_{i,j=1}^{m} k(t_i,t_j) + \frac{1}{m^2}\sum_{i,j=1}^{m} k(t'_i,t'_j) - \frac{2}{m^2}\sum_{i,j=1}^{m} k(t_i,t'_j)$$

the overall training objective is $L = l_{rec}(x,t) + r \cdot \lambda \cdot l_{MMD}(t,t')$, where t' is a topic distribution randomly drawn from P, k(·,·) is the information diffusion function, i and j take values from 1 to m, and r is a hyperparameter balancing $l_{rec}$ and $l_{MMD}$; the coefficient λ is normalized with the two-norm, $b^{(N+1)}$ is the bias term before the encoder's Softmax sublayer, and ∇ denotes the derivation operator.
5. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1, characterized in that the large corpus is the DBPedia dataset and the target dataset is the dataset of the CCIR 2020 netizen emotion recognition competition held during the epidemic; the emotion dictionary is the emotion polarity dictionary of National Taiwan University; the neural topic model is trained once on the DBPedia dataset to complete pre-training, and fine-tuning is then done on the CCIR 2020 competition dataset.
6. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1 or 5, characterized in that fine-tuning starts from the pre-trained model, with the parameters randomly re-initialized at the last layer of the encoder and the first layer of the decoder; the emotion value of each topic is obtained from the emotion dictionary, and from these the topic-based emotion value M1 of the whole comment is obtained.
7. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1, characterized in that in step 2 the text-picture correlation analysis is performed with a method fusing BiLSTM and an attention mechanism, comprising the following steps:
first, the text and the picture are processed: the text is converted into a text matrix with the GloVe method, picture labels are extracted with a tool provided by the Vision service of the Google Cloud Platform, and the picture labels are represented as a word matrix of the same form as the text;
then two independent BiLSTMs receive the picture-label matrix and the text matrix respectively, and represent the picture and the text as vectors of the same dimensionality;
finally, the picture vector and the text vector are concatenated as the input of a fully connected layer, and the image-text correlation coefficient μ is output through a softmax layer.
8. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1, characterized in that in step 3, in a text classification method based on a BiLSTM-Attention-TextCNN hybrid neural network, the text of the comment to be analyzed is mapped to vectors through a word embedding layer and a BiLSTM network learns the preceding and following context of each word in the comment, yielding a deeper semantic vector for the current word; an attention model is built to compute a probability weight for each word vector, so that words with larger weights receive more attention, such words often being the keywords for the classification task; the vectors output by the attention mechanism are connected to a pooling layer that performs k-max pooling and retains the first k words with the largest weights; a TextCNN network then extracts features and outputs the text emotion value M_text; in a picture classification method based on a BiLSTM-Attention-CNN hybrid neural network, labels are extracted from the picture of the comment to be analyzed and represented as a matrix of the same form as the text, the BiLSTM extracts picture features, an attention mechanism is built to select among the features, the vectors output by the attention mechanism are connected to a pooling layer that performs k-max pooling and retains the first k entries with the largest weights, and a CNN network then extracts features and outputs the picture emotion value M_pic.
9. The emergency event sentiment analysis method fusing topic and multimodal information according to claim 1, characterized in that in step 4 the fusion formula for the final emotion value M is: $M = \delta \cdot M_1 + (1-\delta) \cdot M_2$, where the weight δ is determined by parameter tuning during model training.
CN202211003522.8A 2022-08-19 2022-08-19 Topic and multi-mode fused emergency emotion analysis method Pending CN115392232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211003522.8A CN115392232A (en) 2022-08-19 2022-08-19 Topic and multi-mode fused emergency emotion analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211003522.8A CN115392232A (en) 2022-08-19 2022-08-19 Topic and multi-mode fused emergency emotion analysis method

Publications (1)

Publication Number Publication Date
CN115392232A true CN115392232A (en) 2022-11-25

Family

ID=84121375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211003522.8A Pending CN115392232A (en) 2022-08-19 2022-08-19 Topic and multi-mode fused emergency emotion analysis method

Country Status (1)

Country Link
CN (1) CN115392232A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982473A (en) * 2023-03-21 2023-04-18 环球数科集团有限公司 AIGC-based public opinion analysis arrangement system
CN115982473B (en) * 2023-03-21 2023-06-23 环球数科集团有限公司 Public opinion analysis arrangement system based on AIGC


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination