CN114564565A - Deep semantic recognition model for public safety event analysis and construction method thereof

Info

Publication number: CN114564565A
Authority: CN (China)
Prior art keywords: model, BERT, training, emotion, text
Legal status: Pending
Application number: CN202210203781.9A
Other languages: Chinese (zh)
Inventors: 游兰, 彭庆喜, 金红
Assignee (original and current): Hubei University
Application filed by Hubei University; priority to CN202210203781.9A
Classifications

    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F40/30 Semantic analysis
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • Y02T10/40 Engine management systems
Abstract

The invention belongs to the technical field of emotion analysis in natural language processing, and discloses a deep semantic recognition model for public safety event analysis and a construction method thereof. Context semantic feature representations of comment texts are obtained through a BERT pre-training model, and deep nonlinear feature vectors are extracted by combining a bidirectional GRU (Gated Recurrent Unit), achieving the best effect under a single model; a plurality of emotion classifiers with excellent performance and mutual differentiation are trained based on BERT-series pre-training models; and the deep features of each model are fully fused by an ensemble learning method of data perturbation and voting strategies. The method fully exploits the differences between models, adopts ensemble learning and a voting strategy to fuse multiple models, and trains a stable emotion classification model with balanced performance in all aspects. Experimental results show that the BERT-BiGRU model achieves better emotion recognition than other traditional models on two public data sets.

Description

Deep semantic recognition model for public safety event analysis and construction method thereof
Technical Field
The invention belongs to the technical field of emotion analysis in natural language processing, and particularly relates to a deep semantic recognition model for public safety event analysis and a construction method thereof.
Background
Currently, emotion recognition is one of the key technologies of artificial intelligence; it aims to sense and understand, from the machine's perspective, the human emotional intentions expressed through media such as text and images. Social networking sites are important platforms for people to follow events and share personal opinions, and a huge amount of unstructured text comments is generated every day, usually carrying the subjective emotional intentions of their publishers. Emotion recognition of social comments is therefore of great significance for public opinion management and control, commercial marketing, social governance and the like, and has been one of the research hotspots in the field of natural language processing in recent years.
Social network text is characterized by rich emotional semantics and widely varying text lengths, and how to judge emotion polarity from texts of different lengths is a key problem that current emotion recognition systems urgently need to solve. Traditional word2vec or GloVe word-vector pre-training models can learn the context information of words to a certain extent, but they assign the same representation to the same word in different contexts, which causes semantic deviation. For example, in "the cost performance of this car is really high" and "the fuel consumption of this car is really high", "really high" expresses a positive comment in the former and a negative comment in the latter, a distinction such methods find difficult to make. When a traditional Convolutional Neural Network (CNN) is used for a text classification task, it can effectively extract local features from word representations, but it ignores the semantic relevance between long-distance contexts.
Nowadays, most models adopt machine learning or deep learning methods to predict text emotion, and great progress has been made. However, most of them are single models; because a single model is subject to randomness, it can only perform well in certain respects and its generalization capability is insufficient.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) In traditional word2vec or GloVe word-vector pre-training models, the same word receives the same representation in different contexts, causing semantic deviation that the model finds difficult to resolve.
(2) When a traditional Convolutional Neural Network (CNN) is adopted to perform a text classification task, semantic relevance between long-distance contexts is ignored.
(3) Most existing models are single models; a single model is subject to randomness, so it can only perform well in certain respects, and its generalization capability is insufficient.
The difficulty in solving the above problems and defects is:
Traditional machine learning improves accuracy over emotion-dictionary methods, but it requires high-quality feature engineering and professional domain knowledge and still generalizes poorly; when capturing semantic relevance between long-distance contexts, it is difficult to decide which information to discard or retain; and a single model is usually good only for a particular field, so improving the generalization capability of such methods is very difficult.
The significance of solving the problems and the defects is as follows:
Deep semantic information among text comments can be mined for a specific task to obtain the best emotion recognition effect under a single model; the goal of applying a single model to multiple tasks can be achieved; and the integrated model can obtain the best prediction results and generalization performance.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a deep semantic recognition model for public safety event analysis and a construction method thereof, aiming at solving problems in the prior art such as insufficient extraction of deep emotional semantic features of text and limited generalization capability.
The invention is realized in such a way that a construction method of a deep semantic recognition model for public safety event analysis comprises the following steps:
obtaining context semantic feature representations of comment texts through a BERT pre-training model, and extracting deep nonlinear feature vectors by combining a bidirectional GRU (Gated Recurrent Unit), so as to achieve the best effect under a single model; training a plurality of emotion classifiers with excellent performance and mutual differentiation based on BERT-series pre-training models; and fully fusing the deep features of each model by an ensemble learning method of data perturbation and voting strategies.
Further, the method for constructing the deep semantic recognition model for public safety event analysis comprises the following steps:
firstly, preprocessing an original data set and removing noise data;
step two, constructing a single emotion recognition model, splicing the BERT pre-training language model and the BiGRU to obtain a text classification model, and obtaining a classification result by using the text classification model;
and step three, constructing an integrated emotion recognition model: obtaining a plurality of base emotion classifiers through data perturbation and BERT-series pre-training models, and voting over the classification result obtained in step two together with the results output by the integration module.
Further, the BERT model in step two adopts a Transformer encoder as the main model structure and mines the relations between words based on an attention mechanism, which allows parallel training and takes global information into account.
In a text classification task, text is usually represented by word vectors, and a BERT model adds position vectors on the basis of using the word vectors and segment vectors, and stores word sequence information in a position embedding manner, so that characters or words at different positions are added with different vectors to indicate distinction:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position index, d_model is the word vector dimension, and PE is the position encoding corresponding to position pos, generated by the sine and cosine functions and added to the word vector at that position; the beginning of a sentence is marked with [CLS], and sentence separation and ending are marked with [SEP].
After obtaining the input representation of the sentence, BERT is jointly trained using the masking language model MLM and the next sentence prediction NSP. MLM refers to masking words in the text randomly with [ MASK ] to allow the model to predict. NSP refers to randomly selecting two sentences from a corpus to be spliced, and predicting whether the two sentences come from the same text or not.
The core of the BERT model is the encoder of the Transformer model, of which multi-head attention is an important component. The attention mechanism takes the similarity between the Query of a target word and the Keys of its context words as weights, and blends the Values of the context words into the Query of the target word to obtain an enhanced semantic vector representation of the target word. Q (Query), K (Key) and V (Value) are projected through multiple linear transformations, and the different attention results are finally concatenated to form multi-head attention, so that the model learns relevant information in different representation sub-spaces and obtains enhanced semantic vectors of words in different semantic spaces.
MultiHead(Q, K, V) = Concat(head_1, ..., head_k)W^O
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
A residual and normalization module is added to the Transformer encoder, connected after each sub-module of the encoder and decoder. The residual connection is used to ease multi-layer network training: the network only needs to attend to the current difference, which prevents network degradation and accelerates convergence. Normalization refers to Layer Normalization, which normalizes the activation values of each layer; α and β are trainable parameters, and μ and σ denote the mean and standard deviation. The encoder output is obtained through a linear transformation and a ReLU activation in the feed-forward network, as shown in the formulas:
LN(x_i) = α · (x_i − μ) / sqrt(σ² + ε) + β
FNN = max(0, xW_1 + b_1)W_2 + b_2
further, the semantic representation obtained by the BERT model is used as the input of the bidirectional GRU model. In a bidirectional GRU, each GRU unit contains two gate structures, an update gate, a reset gate, denoted r respectivelyt,ztSo as to maintain and update the state information and to transmit it. The update gate functions like the forget gate and the input gate of the LSTM, determining the extent to which the state information at the previous time is brought into the current state. To the extent that the state information at the last time in the reset gate control is ignored, a smaller value of the reset gate represents more ignorance. The time-sequence problem is processed through a bidirectional GRU model, information of the whole text sequence is utilized, including mutual relation information among all words, and the information is used for processing each word.
Further, the output of BERT is passed through the forward GRU to acquire the complete context information of all past time steps, and through the backward GRU to acquire the complete context of all future time steps. The calculation formulas are as follows:
→h_t = f(w_1·x_t + w_2·→h_(t−1) + →b)
←h_t = f(w_3·x_t + w_4·←h_(t+1) + ←b)
where w is the weight connecting the two layers, b is the bias vector, f is the activation function, and →h_t and ←h_t are the outputs of the forward GRU and the backward GRU, respectively.
The bidirectional GRU computes, for each input representation d_n on the hidden layers of the forward and backward directions, the hidden state h_t of d_n, and a concatenation strategy is adopted for the forward and backward GRUs:
h_t = [→h_t ; ←h_t]
The hidden states of the bidirectional GRU are then obtained as:
H = {h_1, h_2, ..., h_d}
Global average pooling is used instead of a fully connected layer; the global average pooling layer has no parameters and integrates global information. The multi-dimensional output features are globally average-pooled into a one-dimensional feature vector, which is fed into a softmax function to obtain the emotion category of the comment text, as shown in the formula:
TEXT_C = softmax(W_t·H + b_t)
where W_t denotes the weight parameter of the global average pooling layer and b_t denotes the bias value, and TEXT_C is the final output of the BERT-BiGRU model.
Further, in step three, after a plurality of differentiated base classifiers are obtained, their results are fused through a combination strategy so that the prediction effect of the ensemble model is the best. Several BERT, BERT-BiLSTM and BERT-BiGRU models are taken as base classifiers, the output categories of all classifiers are counted, and a majority voting strategy is adopted for decision making on the basis of the generated class probability distributions of emotion recognition. The classification results of all individual classifiers are given the same weight, each base classifier can cast only one vote, the minority-obeys-majority principle is adopted, and the class with the highest number of votes is taken as the final prediction result for the comment. The voting formula is as follows:
H(x) = c_(arg max_j Σ_{i=1}^{T} C_{i,j}(x))
where n and T represent the number of emotion classes and the number of base classifiers respectively, C_{i,j}(x) indicates that base classifier i predicts category j for test sample x, Σ_{i=1}^{T} C_{i,j}(x) represents the total number of votes for category j over all base classifiers, and the category with the most votes is taken as the final category result for the prediction sample x.
Another object of the invention is to provide a deep semantic recognition model for public safety event analysis constructed by applying the above construction method.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
obtaining context semantic feature representations of comment texts through a BERT pre-training model, and extracting deep nonlinear feature vectors by combining a bidirectional GRU (Gated Recurrent Unit), so as to achieve the best effect under a single model; training a plurality of emotion classifiers with excellent performance and mutual differentiation based on BERT-series pre-training models; and fully fusing the deep features of each model by an ensemble learning method of data perturbation and voting strategies.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
obtaining context semantic feature representations of comment texts through a BERT pre-training model, and extracting deep nonlinear feature vectors by combining a bidirectional GRU (Gated Recurrent Unit), so as to achieve the best effect under a single model; training a plurality of emotion classifiers with excellent performance and mutual differentiation based on BERT-series pre-training models; and fully fusing the deep features of each model by an ensemble learning method of data perturbation and voting strategies.
Another object of the present invention is to provide an information data processing terminal, which is used for implementing the deep semantic recognition model facing public safety event analysis.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows: the deep semantic recognition model for public safety event analysis uses the Bidirectional Encoder Representations from Transformers (BERT) model to dynamically adjust the semantic features of word vectors and adopts a Bidirectional Gated Recurrent Unit (BiGRU) for semantic encoding to enhance the semantic expression of the text, so as to mine deeper long-distance contextual emotional semantic information in network text. The invention fully exploits the differences among models and adopts ensemble learning and a voting strategy to fuse multiple models, aiming to train a stable emotion classification model with balanced performance in all aspects. Experimental results show that the BERT-BiGRU model achieves better emotion recognition than other traditional models on both public data sets (COV19 and ChnSenti).
In the method, a BERT pre-training model replaces the word embedding layer of traditional models to obtain implicit semantic word-vector representations of the comment text, and the deep semantic features of the context are extracted through a bidirectional GRU, improving the model's ability to extract the emotional semantics of comment text. Through ensemble learning with data perturbation and voting strategies, several excellent and differentiated emotion recognition models are fused into a stable and balanced emotion classifier, improving the generalization capability of the model. Multiple groups of comparison experiments on public data sets show that the deep semantic recognition model for public safety event analysis proposed by the invention can effectively recognize emotional tendencies and has a better emotion classification effect.
Centered on the research hotspot of emotion recognition for social network text, the invention proposes a deep emotional semantic recognition model based on BERT-BiGRU multi-model ensemble learning. First, the model adopts a BERT pre-training model to replace the word embedding layer of traditional models and obtains implicit semantic word-vector representations of the comment text, and then extracts deep contextual semantic features through a bidirectional GRU; this alleviates the poor ability of traditional language models to handle polysemous words in different contexts and to extract deep emotional semantics, achieving the best emotion recognition effect under a single model. To improve generalization, on the basis of a comprehensive analysis of model variance and bias, the idea of ensemble learning is applied to emotion recognition: the behaviors of different models under different parameters and data sets are observed, the data sets are trained by cross validation, and a voting strategy is combined with base classifiers built from several BERT pre-training models, giving the models the ability to correct each other's errors and yielding a better ensemble result. Finally, experiments designed on a three-class corpus and a two-class corpus show that the BERT-BiGRU model outperforms most existing emotion recognition models on multiple evaluation indexes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a representation of BERT inputs provided by an embodiment of the present invention.
Fig. 2 is a diagram of a GRU unit architecture provided by an embodiment of the present invention.
Fig. 3 is a diagram of a BERT-BiGRU model structure according to an embodiment of the present invention.
Fig. 4 is a flowchart of an algorithm provided by an embodiment of the present invention.
FIG. 5 is a model diagram of multimodal fusion emotion and semantic recognition under an ensemble learning framework provided by an embodiment of the present invention.
Fig. 6 is a flowchart of a method for constructing a deep semantic recognition model for public safety event analysis according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In traditional word2vec or GloVe word-vector pre-training models, the same word receives the same representation in different contexts, causing semantic deviation that the model finds difficult to resolve. When a traditional Convolutional Neural Network (CNN) is used for a text classification task, the semantic relevance between long-distance contexts is ignored. Most existing models are single models; a single model is subject to randomness, so it can only perform well in certain respects, and its generalization capability is insufficient.
In view of the problems in the prior art, the invention provides a deep semantic recognition model for public safety event analysis and a construction method thereof, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 6, the method for constructing a deep semantic recognition model for public safety event analysis according to the embodiment of the present invention includes the following steps:
s101, obtaining context semantic feature representation of the comment text through a BERT pre-training model;
s102, extracting deep nonlinear feature vectors by combining bidirectional GRUs to achieve the optimal effect under a single model;
s103, training based on a BERT series pre-training model to obtain a plurality of emotion classifiers; and fully fusing deep features of each model by using an integrated learning method of data disturbance and voting strategies.
The technical solution of the present invention is further described below with reference to specific examples.
At present, most of the classification models commonly used in the emotion recognition field are shallow structures, complex feature engineering is usually required, the semantic relation among words is ignored, and classification output of shallow features is focused. The invention provides a deep semantic recognition model for public safety event analysis. Firstly, the feature vectors of text comments are extracted through a BERT pre-training language model, the two-way context information of sentences can be effectively captured, word vectors are dynamically adjusted, and the limitation of a traditional language model is avoided. And then, the text feature vector extracted by the BERT is used as the input of a BiGRU network, and the front and back feature information is correlated by superposing GRUs in the positive and negative directions, so that the potential relation among all the emotion features is better mined, and the emotion tendentiousness of the comment text is obtained.
In addition, a single model has limited ability to learn features, which leaves room for improvement in accuracy and generalization. To address this, the invention constructs multiple classification models using the idea of ensemble learning and balances the variance and bias of the models through data perturbation and voting, so that the integrated classification model performs better.
The invention provides the algorithm flow of the deep semantic recognition model for public safety event analysis, which mainly comprises the following three steps. The first step is to preprocess the original data set and remove noise data. The second step is to construct a single emotion recognition model by concatenating the BERT pre-training language model and the BiGRU, giving a text classification model that outperforms other single models. The third step, to enhance the universality of the emotion recognition model, is to construct the integrated emotion recognition model: a plurality of base emotion classifiers are obtained through data perturbation and BERT-series pre-training models, and the classification result of step two is voted on together with the results output by the integration module, improving the overall classification effect and the generalization capability of the model.
BERT here refers to the Chinese BERT pre-training model, obtained by large-scale training on Chinese Wikipedia data. BERT-wwm-ext, relative to BERT, uses Chinese Wikipedia data plus general-domain (encyclopedia, news, question-and-answer) data and increases the number of training steps. BERT-BiLSTM uses a BERT pre-training model to obtain the feature vector of each text, which is fed into a bidirectional LSTM to mine deeper long-distance contextual emotional semantic information in network text.
The invention uses BERT-series pre-training language models to obtain the semantic representation of the input text, where n denotes the maximum input length of the model. If the text length is less than n, the semantic representation output by the BERT model is padded with m-dimensional zero vectors until the length of the output sequence is n; if the text length is greater than n, only the semantic representation of the first n tokens is output.
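As an illustration of this padding and truncation rule, the following is a minimal PyTorch sketch (not the patent's implementation), assuming the BERT output is a (sequence length × m) tensor:

```python
import torch

def pad_or_truncate(bert_output: torch.Tensor, n: int) -> torch.Tensor:
    """Force a BERT output of shape (seq_len, m) to the fixed length n:
    shorter sequences are padded with m-dimensional zero vectors,
    longer sequences keep only the first n token representations."""
    seq_len, m = bert_output.shape
    if seq_len < n:
        padding = torch.zeros(n - seq_len, m, dtype=bert_output.dtype)
        return torch.cat([bert_output, padding], dim=0)
    return bert_output[:n]

# Example: pad_or_truncate(torch.randn(97, 768), n=128) has shape (128, 768).
```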
In the embodiment, the deep semantic recognition model facing public safety event analysis is suitable for sentence-level emotion classification. FIG. 1 is a representation of the BERT input used in the present invention.
The BERT model adopts a Transformer encoder as its main model structure and mines the relations between words based on an attention mechanism, so the model can be trained in parallel and global information can be taken into account.
In the text classification task, text is usually represented by word vectors. Unlike language models such as Word2Vec and GloVe, the BERT model adds position vectors on top of word vectors and segment vectors, and preserves word order information through position embedding, so that characters or words at different positions are added with different vectors to distinguish them:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position index, d_model is the word vector dimension, and PE is the position encoding corresponding to position pos, generated by the sine (sin) and cosine (cos) functions and added to the word vector at the corresponding position. The input representation is shown in FIG. 1; the beginning of a sentence is marked with [CLS], and sentence separation and ending are marked with [SEP].
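For illustration, the position-embedding formulas above can be computed as in the following NumPy sketch (assuming an even word-vector dimension d_model; the function name is ours, not the patent's):

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]              # position index pos
    i = np.arange(0, d_model, 2)[None, :]          # even dimension index 2i
    angle = pos / np.power(10000.0, i / d_model)   # pos / 10000^(2i/d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# The encoding is added element-wise to the token and segment embeddings,
# e.g. input_repr = token_emb + segment_emb + positional_encoding(128, 768).
```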
After the input representation of the sentence is obtained, BERT applies a new unsupervised training method on massive corpora, jointly training with the Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM randomly masks some words in the text with [MASK] and lets the model predict them. Compared with the unidirectional prediction of traditional language models, the MLM task can predict the masked words from either direction, so the model learns more word-level domain knowledge. NSP randomly selects two sentences from the corpus, splices them, and predicts whether they come from the same text, further taking the learning of sentence-pair relations into account. The combination of the two lets the model recognize noisy data and understand the deep semantics of sentences more accurately.
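A simplified sketch of the MLM masking idea follows (BERT's actual procedure also sometimes substitutes random tokens or keeps the original; that refinement is omitted here):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly hide tokens with [MASK]; the model must predict the hidden originals."""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)      # word the model has to recover
        else:
            masked.append(tok)
            targets.append(None)     # position not predicted
    return masked, targets

# mask_tokens(["这", "辆", "车", "性价比", "真高"]) might return
# (["这", "辆", "[MASK]", "性价比", "真高"], [None, None, "车", None, None]).
```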
The core of the BERT model is the encoder of the Transformer model, as shown in fig. 2, of which Multi-Head Attention is an important component. First, the attention mechanism takes the similarity between the Query of a target word and the Keys of its context words as weights, and blends the Values of the context words into the Query of the target word to obtain an enhanced semantic vector representation of the target word. Second, to obtain enhanced vector representations in different spaces, Q (Query), K (Key) and V (Value) are projected through multiple linear transformations, and the different attention results are finally concatenated to form multi-head attention, so that the model learns relevant information in different representation sub-spaces and obtains enhanced semantic vectors of words in different semantic spaces.
MultiHead(Q, K, V) = Concat(head_1, ..., head_k)W^O
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
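The two formulas can be read as the following sketch of scaled dot-product multi-head attention (a generic PyTorch illustration; the projection matrices w_q, w_k, w_v and w_o correspond to W_i^Q, W_i^K, W_i^V and W^O and are assumed to be given):

```python
import torch
import torch.nn.functional as F

def multi_head_attention(Q, K, V, w_q, w_k, w_v, w_o, num_heads):
    """Concat(head_1, ..., head_k)W^O with head_i = Attention(QW^Q, KW^K, VW^V)."""
    batch, length, d_model = Q.shape
    d_k = d_model // num_heads

    def project(x, w):                              # linear projection, split into heads
        return (x @ w).view(batch, length, num_heads, d_k).transpose(1, 2)

    q, k, v = project(Q, w_q), project(K, w_k), project(V, w_v)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # similarity of Query and Key
    heads = F.softmax(scores, dim=-1) @ v           # weighted sum of Values
    concat = heads.transpose(1, 2).reshape(batch, length, d_model)
    return concat @ w_o                             # final projection W^O
```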
On top of this, the Transformer encoder adds a residual and normalization module, which follows each sub-module on the encoder side and the decoder side. The residual connection is generally used to ease multi-layer network training by letting the network attend only to the current difference, preventing network degradation and accelerating convergence. Normalization refers to Layer Normalization, which normalizes the activation values of each layer, as shown below; α and β are trainable parameters, and μ and σ denote the mean and standard deviation. Finally, the output of the encoder is obtained through a linear transformation and a ReLU activation in the Feedforward Neural Network, as shown in the formulas:
LN(x_i) = α · (x_i − μ) / sqrt(σ² + ε) + β
FNN = max(0, xW_1 + b_1)W_2 + b_2
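Taken together, one encoder block can be sketched as below; this is a hedged PyTorch illustration using standard modules, with dimensions following BERT-base as an assumption rather than values fixed by the patent:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Multi-head attention and a feed-forward network max(0, xW1 + b1)W2 + b2,
    each followed by a residual connection and Layer Normalization."""

    def __init__(self, d_model=768, num_heads=12, d_ff=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)   # learnable alpha (weight) and beta (bias)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)          # residual + layer normalization
        return self.norm2(x + self.ffn(x))    # residual + layer normalization
```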
fig. 2 is a GRU unit architecture diagram.
As shown in fig. 2, in order to further capture the intrinsic relations within the text, the invention uses the semantic representation obtained by the BERT model as the input of a bidirectional GRU model. In a bidirectional GRU, each GRU unit contains two gate structures, an update gate and a reset gate, denoted z_t and r_t respectively, which maintain, update and pass on state information, as shown in fig. 3. The update gate acts like the forget gate and input gate of the LSTM and determines the extent to which the state information of the previous time step is brought into the current state. The reset gate controls the extent to which the state information of the previous time step is ignored; a smaller reset gate value means more is ignored. The GRU involves fewer tensor operations than the LSTM, so it is also faster. The bidirectional GRU model is therefore commonly used to handle time-series problems and can make full use of the information of the whole text sequence, including the interrelations among all words, when processing each word.
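As an illustration of this step (an assumed PyTorch configuration with an arbitrarily chosen hidden size), the BERT token vectors can be fed to a bidirectional GRU as follows:

```python
import torch
import torch.nn as nn

# Each direction of the GRU maintains its own update gate z_t and reset gate r_t
# internally; forward and backward hidden states are produced for every position.
bigru = nn.GRU(input_size=768, hidden_size=256, batch_first=True, bidirectional=True)

bert_sequence = torch.randn(8, 128, 768)   # (batch, seq_len, hidden) from the BERT encoder
outputs, _ = bigru(bert_sequence)          # (8, 128, 512): both directions concatenated
```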
FIG. 3 is a BERT-BiGRU model structure.
The output of BERT is passed through the forward GRU to obtain the complete context information of all past time steps, and through the backward GRU to obtain the complete context of all future time steps. The calculation formulas are as follows:
→h_t = f(w_1·x_t + w_2·→h_(t−1) + →b)
←h_t = f(w_3·x_t + w_4·←h_(t+1) + ←b)
where w is the weight connecting the two layers, b is the bias vector, f is the activation function, and →h_t and ←h_t are the outputs of the forward GRU and the backward GRU, respectively.
The bidirectional GRU computes, for each input representation d_n on the hidden layers of the forward and backward directions, the hidden state h_t of d_n; the invention adopts a concatenation strategy for the forward and backward GRUs:
h_t = [→h_t ; ←h_t]
The hidden states of the bidirectional GRU are then obtained as:
H = {h_1, h_2, ..., h_d}
Finally, global average pooling is used instead of a fully connected layer; the global average pooling layer has no parameters, so overfitting can be avoided. At the same time it integrates global information: the multi-dimensional output features are globally average-pooled into a one-dimensional feature vector, which is fed into a softmax function to obtain the emotion category of the comment text, as shown in the formula:
TEXT_C = softmax(W_t·H + b_t)
where W_t denotes the weight parameter of the global average pooling layer and b_t denotes the bias value, and TEXT_C is the final output of the BERT-BiGRU model. The model structure is shown in fig. 3.
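Putting the pieces together, a minimal sketch of the BERT-BiGRU classifier could look like the following, assuming PyTorch and the HuggingFace transformers library; the checkpoint name "bert-base-chinese", the hidden size and the class count are illustrative assumptions, not values stated in the patent:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiGRU(nn.Module):
    """BERT token vectors -> bidirectional GRU -> global average pooling -> softmax."""

    def __init__(self, num_classes=3, bert_name="bert-base-chinese", hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.bigru = nn.GRU(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        states, _ = self.bigru(tokens)       # H = {h_1, ..., h_d}, both directions concatenated
        pooled = states.mean(dim=1)          # global average pooling over the sequence
        return torch.softmax(self.classifier(pooled), dim=-1)   # TEXT_C
```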
Fig. 4 is an algorithm flow chart.
As shown in FIG. 4, BERT here refers to the Chinese BERT pre-training model, obtained by large-scale training on Chinese Wikipedia data. BERT-wwm-ext, relative to BERT, uses Chinese Wikipedia data plus general-domain (encyclopedia, news, question-and-answer) data and increases the number of training steps. BERT-BiLSTM uses a BERT pre-training model to obtain the feature vector of each text, which is fed into a bidirectional LSTM to mine deeper long-distance contextual emotional semantic information in network text.
FIG. 5 is a multi-modal fusion emotion semantic recognition model under an ensemble learning framework.
To make the models differ, the invention adopts different Chinese BERT pre-training models, which are trained on large amounts of unlabeled text corpora with continuously improved hyper-parameters. For example, the BERT-base model uses Chinese Wikipedia data, while BERT-wwm-ext uses Chinese Wikipedia data plus general-domain (encyclopedia, news, question-and-answer) data and increases the number of training steps. The invention therefore uses the BERT pre-training model alone and in two variants, adding a BiLSTM or a BiGRU after the BERT pre-training model, to obtain different emotion recognition results. In addition, after analyzing the samples, different training parameters such as learning rate, training batch size and text truncation length are adopted respectively. Obtaining multiple base classifiers through such parallel training can effectively reduce variance and alleviate the overfitting problem.
After the differentiated base classifiers are obtained in this way, their results are fused through a combination strategy so that the prediction effect of the ensemble model is the best. First, several BERT, BERT-BiLSTM and BERT-BiGRU models are taken as base classifiers, the output categories of all classifiers are counted, and a majority voting strategy is adopted for decision making on the basis of the generated class probability distributions of emotion recognition. The classification results of all individual classifiers are given the same weight, each base classifier can cast only one vote, the minority-obeys-majority principle is adopted, and the class with the highest final vote count is taken as the final prediction result for the comment, as shown in fig. 5. The voting formula is as follows:
H(x) = c_(arg max_j Σ_{i=1}^{T} C_{i,j}(x))
where n and T represent the number of emotion classes and the number of base classifiers respectively, C_{i,j}(x) indicates that base classifier i predicts category j for test sample x, Σ_{i=1}^{T} C_{i,j}(x) represents the total number of votes for category j over all base classifiers, and the category with the most votes is taken as the final category result for the prediction sample x.
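A minimal sketch of the majority-voting rule (plain Python; the helper name is ours, and ties are broken by whichever class is counted first):

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: every base classifier casts one vote per sample and the
    class with the most votes becomes the final prediction."""
    fused = []
    for idx in range(len(predictions[0])):
        votes = Counter(p[idx] for p in predictions)   # total votes per class j
        fused.append(votes.most_common(1)[0][0])       # arg max over classes
    return fused

# Example with three base classifiers and three samples:
# majority_vote([[2, 0, 1], [2, 1, 1], [0, 1, 1]]) == [2, 1, 1]
```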
The technical solution of the present invention is further described below with reference to specific examples.
The invention adopts two groups of comparison experiments. The first experiment compares the BERT-BiGRU of the invention with traditional deep learning models, including classic models such as TextCNN, BiGRU, BiGRU-ATT and BERT.
(1) TextCNN. And (3) encoding the input text by using word2vec, sending the encoded input text into a convolutional neural network, extracting text features by using a plurality of convolutional kernels with different sizes, and finally classifying the text features through a full connection layer.
(2) BiGRU. And (3) using word2vec training word vectors, sending the word2vec training word vectors into a bidirectional GRU, associating front and back feature information through superposition of the GRUs in the positive and reverse directions, excavating potential connections among all the emotion features, and obtaining the emotion tendentiousness of the comment text.
(3) BiGRU-ATT. On the basis of extracting text features by using the bidirectional GRU, an attention mechanism is introduced to capture the degree of contribution of each word in the comment text to the emotion semantics, and weighted calculation is performed to obtain a final classification result.
(4) BERT. And (3) extracting features by using a depth bidirectional Transformer model and utilizing context information of words, and dynamically adjusting word vectors according to the context information at any time to obtain context semantic feature representation of the comment text.
(5) BERT-BiGRU. The method comprises the steps of firstly expressing the implicit semantics of a text through a BERT pre-training model, and then mining deep semantic information among text comments by adopting a bidirectional GRU model which is simpler in structure and higher in operation speed than bidirectional LSTM to obtain a final emotion recognition result.
TABLE 1 results of experiments on COV19 with several models
Table 1 compares the precision, recall and F1 values of the inventive model and the comparison models on the COV19 data set. The F1 value of the BERT model reaches 71.7%, an improvement of 8.3% and 5.6% over the 63.4% of the CNN model and the 66.1% of the BiGRU model respectively, verifying that the BERT pre-training model is clearly superior to traditional word-vector-based training models. The indexes of the word2vec-based word vector model are low, mainly because Chinese words are often polysemous: it is difficult to understand the meaning the same word expresses under different semantics, leading to inaccurate feature extraction. The BERT model can dynamically change word vectors according to the meaning of the context and can reflect the semantic information of sentences more accurately. The multi-head attention in the BERT model is also critical; it distributes weights over the input text to distinguish the contribution of each word to the emotional semantics. The BERT model is 5.3% better than BiGRU-ATT, which adds a single attention mechanism to BiGRU. Under the same word2vec word vectors, the BiGRU model is 2.7% higher than the CNN model, showing that BiGRU extracts text features better than CNN. Text is naturally long-sequence information, so recurrent neural networks are better at capturing long-range temporal features, whereas convolutional neural networks are good at learning spatial features, focus on local features and are weaker at long-distance modeling. Accordingly, attaching a BiGRU after BERT achieves the best classification effect, indicating that adding BiGRU to the BERT output can extract deeper emotional features in sentences and thereby improve classification accuracy.
The results of the first experiment show that the proposed model achieves a good classification effect for emotion recognition. However, the research objective of the invention is not only the best single-model effect but also good applicability of the model. To verify the universality of the ensemble idea in the emotion recognition field, the second experiment connects the BERT pre-training model to different networks and uses different training batches and training modes, exploiting the differences in training corpora and training steps to achieve differentiation. Finally, a voting strategy is used to achieve the ensemble effect, and experiments are carried out on the two-class and three-class emotion data sets respectively. The models are described as follows:
(1) BERT. Text features are extracted with the BERT-base pre-training model to obtain context semantic feature representations of the comment text; the training batch sizes are set to 32 and 16 and the maximum text truncation lengths to 128 and 140 respectively, and model 1 (M1) and model 2 (M2) are obtained after 3 rounds of training.
(2) BERT-wwm-ext. The BERT-wwm-ext pre-training model is used with a training batch size of 32 and a maximum text truncation length of 128, yielding model 3 (M3).
(3) BERT-BiLSTM. The BERT-base pre-training model provides the features of each text, and deeper long-distance contextual emotional semantic information in the network text is mined through a bidirectional LSTM. With 5-fold cross validation, a training batch size of 16 and a maximum text truncation length of 140, model 4 (M4) is obtained.
(4) BERT-BiLSTM. The training method is the same as in (3), with the batch size set to 48, yielding model 5 (M5).
(5) BERT-BiGRU. The BERT-base pre-training model provides the features of each text, which are passed through a bidirectional GRU; with 5-fold cross validation, training batch sizes of 16, 48 and 64 and maximum text truncation lengths of 140, 140 and 128 respectively, model 6 (M6), model 7 (M7) and model 8 (M8) are obtained.
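The eight settings above can be summarized as data; the checkpoint identifiers below are illustrative HuggingFace names assumed to correspond to the "BERT-base" and "BERT-wwm-ext" models mentioned in the text, not names given in the patent:

```python
base_configs = {
    "M1": {"pretrained": "bert-base-chinese",        "head": None,     "batch": 32, "max_len": 128},
    "M2": {"pretrained": "bert-base-chinese",        "head": None,     "batch": 16, "max_len": 140},
    "M3": {"pretrained": "hfl/chinese-bert-wwm-ext", "head": None,     "batch": 32, "max_len": 128},
    "M4": {"pretrained": "bert-base-chinese",        "head": "bilstm", "batch": 16, "max_len": 140, "cv_folds": 5},
    "M5": {"pretrained": "bert-base-chinese",        "head": "bilstm", "batch": 48, "max_len": 140, "cv_folds": 5},
    "M6": {"pretrained": "bert-base-chinese",        "head": "bigru",  "batch": 16, "max_len": 140, "cv_folds": 5},
    "M7": {"pretrained": "bert-base-chinese",        "head": "bigru",  "batch": 48, "max_len": 140, "cv_folds": 5},
    "M8": {"pretrained": "bert-base-chinese",        "head": "bigru",  "batch": 64, "max_len": 128, "cv_folds": 5},
}
```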
Table 2 results of experiments on the basis classifier at COV19
Experiments were conducted to compare the classification effect of each base classifier predicting individually on the COV19 data set. Different hyper-parameter sets were trained for different base classifiers, and the classification effect of each model on the test set was observed in order to select the best classifiers.
Table 2 shows the prediction results of each base classifier. As the table shows, the same network structure trained with different hyper-parameter sets yields very different classification results. For example, the difference between the BERT models reaches 1.0%, while BERT-BiLSTM and BERT-BiGRU with different hyper-parameters differ by 0.2% to 0.4%. Compared with a single traditional split of the data set, adding five-fold cross validation divides the training set multiple times into non-overlapping folds, which greatly reduces the randomness introduced by a single random split and enhances the stability of the models. Meanwhile, BERT-BiLSTM and BERT-BiGRU each reach the optimum of their model type at a batch size of 48 and a maximum text truncation length of 140.
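The five-fold cross-validation split can be sketched with scikit-learn as follows; the stratified variant and the random seed are our assumptions, and texts/labels are assumed to be parallel lists of comments and emotion tags:

```python
from sklearn.model_selection import StratifiedKFold

def five_fold_splits(texts, labels, seed=42):
    """Partition the training corpus into five non-overlapping folds,
    reducing the randomness of a single train/validation split."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return list(skf.split(texts, labels))   # list of (train_indices, valid_indices)
```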
In summary, ensemble learning has two preconditions: first, each base classifier must itself reach a certain level of quality, since a base classifier with too high an error rate will reduce the accuracy of the ensemble; second, the base classifiers need to differ, i.e. their prediction results need to be diverse, otherwise the ensemble result barely changes. Therefore, as shown in Table 2, the invention selects M2, M3, M5 and M7 as the base classifiers for ensemble learning and applies majority voting to their predictions on the test set to obtain the final classification result; the ensemble result is shown in Table 3.
TABLE 3 Final integration Experimental results of COV19
The ensemble result reaches 73.2%, an improvement of 0.3% over the best single model, BERT-BiGRU. To further verify the generalization capability of the proposed model, it was tested on the ChnSenti corpus, with the results shown in Table 4.
TABLE 4 results of the experiment of ensemble learning on ChnSenti
Table 4 shows the classification effect of the 4 single models and the ensemble model on the hotel corpus. As can be seen from the table, the evaluation indexes of BERT-BiGRU are all better than those of the other three models, with improvements of 1.4%, 2.8% and 0.8% respectively. This shows that BERT-BiGRU can mine deeper semantic features than the other models, and verifies the effectiveness and superiority of attaching the BiGRU model after the BERT output representation. The F1 value of every model is above 0.92, indicating that each single model already performs very well on the binary classification task; as a result the differences among the models are not strong enough and the F1 value after ensembling does not improve noticeably. The accuracy of the ensemble learning method used by the invention is only 0.1% higher than that of the best model, BERT-BiGRU, so models that are both excellent and clearly different need to be integrated to obtain an obvious improvement.
Centered on the research hotspot of emotion recognition for social network text, the invention proposes a deep emotional semantic recognition model based on BERT-BiGRU multi-model ensemble learning. First, the model adopts a BERT pre-training model to replace the word embedding layer of traditional models and obtains implicit semantic word-vector representations of the comment text, and then extracts deep contextual semantic features through a bidirectional GRU; this alleviates the poor ability of traditional language models to handle polysemous words in different contexts and to extract deep emotional semantics, achieving the best emotion recognition effect under a single model. To improve generalization, on the basis of a comprehensive analysis of model variance and bias, the idea of ensemble learning is applied to emotion recognition: the behaviors of different models under different parameters and data sets are observed, the data sets are trained by cross validation, and a voting strategy is combined with base classifiers built from several BERT pre-training models, giving the models the ability to correct each other's errors and yielding a better ensemble result. Finally, experiments designed on a three-class corpus and a two-class corpus show that the BERT-BiGRU model outperforms most existing emotion recognition models on multiple evaluation indexes.
The BERT-BiGRU model has been applied in a food safety big-data sentiment analysis system. The system analyzes, from multiple dimensions, the changes in public opinion heat, netizen sentiment polarity and topics of concern for a food safety event; it displays the risk knowledge graph and the full-chain food safety trace related to the event, and obtains the knowledge graph and spot-check information of the food involved. Through the BERT-BiGRU model, the system can accurately analyze the sentiment polarity of netizens across the country and obtain the textual tendency of each user's comment, visualizing the data so that the spatio-temporal evolution trend of public opinion can be seen and assisting decision makers in public opinion research and judgment. It also performs multi-dimensional statistical analysis of internet information and computes public opinion indexes such as region, emotion and hot words, providing support for public opinion research and judgment.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A construction method of a deep semantic recognition model for public safety event analysis, characterized by comprising: obtaining context semantic feature representations of comment texts through a BERT pre-training model, and extracting deep nonlinear feature vectors by combining a bidirectional GRU to achieve the best effect under a single model; training a plurality of emotion classifiers with excellent performance and mutual differentiation based on BERT-series pre-training models; and fusing the deep features of each model by an ensemble learning method of data perturbation and voting strategies.
2. The method for constructing the deep semantic recognition model for public safety event analysis according to claim 1, wherein the method for constructing the deep semantic recognition model for public safety event analysis comprises the following steps:
firstly, preprocessing an original data set to remove noise data;
step two, constructing a single emotion recognition model, splicing the BERT pre-training language model and the BiGRU to obtain a text classification model, and obtaining a classification result by using the text classification model;
and step three, constructing an integrated emotion recognition model: obtaining a plurality of base emotion classifiers through data perturbation and BERT-series pre-training models, and voting over the classification result obtained in step two together with the results output by the integration module.
3. The method for constructing a deep semantic recognition model for public safety event analysis according to claim 2, wherein the BERT model in step two adopts a Transformer encoder as its main model structure and mines the relations between words with an attention mechanism, which allows parallel training and takes global information into account;
in a text classification task the text is usually represented by word vectors; the BERT model adds position vectors to the word vectors and segment vectors, storing word-order information through position embeddings so that characters or words at different positions receive different vectors and can be distinguished:
$PE_{(pos,\,2i)} = \sin\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$

$PE_{(pos,\,2i+1)} = \cos\!\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$
where pos denotes the position index, d_model the word-vector dimension, and PE the position code at position pos, generated by the sine (sin) and cosine (cos) functions and added to the word vector at the corresponding position; the beginning of a sentence is marked with [CLS], and sentence separation and sentence end are marked with [SEP];
after obtaining the input representation of a sentence, BERT is jointly trained with the masked language model (MLM) and next sentence prediction (NSP); MLM randomly masks words in the text with [MASK] and has the model predict them; NSP randomly selects two sentences from the corpus, splices them together, and has the model predict whether they come from the same text;
the core of the BERT model is the encoder of the Transformer model, of which multi-head attention is an important component; the attention mechanism takes the similarity between the Query of a target word and the Keys of its context words as weights and blends the Values of the context words into the Query of the target word, yielding an enhanced semantic vector representation of the target word; Q (Query), K (Key) and V (Value) are projected through multiple linear transformations and the different attention results are finally spliced to form multi-head attention, so that the model learns relevant information in different representation sub-spaces and obtains enhanced semantic vectors of the word in different semantic spaces;
$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V$

$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,W^{O},\qquad \mathrm{head}_i = \mathrm{Attention}\!\left(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\right)$
a residual and normalization module is added to the Transformer encoder and connected behind every sub-module of the encoder and decoder; the residual connection addresses the difficulty of training multi-layer networks by letting the network attend only to the current residual part, preventing network degradation and accelerating convergence; normalization refers to Layer Normalization, which normalizes the activation values of each layer, where α and β are trainable parameters and μ and σ denote the mean and standard deviation; the output of the encoder is then obtained through a feed-forward neural network consisting of a linear transformation and a ReLU activation function, as in the following formulas:
$\mathrm{LN}(x_i) = \alpha \odot \dfrac{x_i - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta$

$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$
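The formulas of claim 3 can be made concrete with the following PyTorch sketch, which implements the sinusoidal position code and a single post-norm encoder block (multi-head attention, residual connection, Layer Normalization and the ReLU feed-forward network). The dimensions d_model=768, 12 heads and d_ff=3072 are the usual BERT-base values and are assumptions, not values fixed by the claim.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal position codes: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    pe = torch.zeros(max_len, d_model)
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # added to the word (and segment) embeddings of the input

class EncoderBlock(nn.Module):
    """One Transformer encoder layer: multi-head attention and a ReLU feed-forward
    network, each followed by a residual connection and Layer Normalization."""
    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))  # max(0, xW1 + b1) W2 + b2
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, _ = self.attn(x, x, x)           # Q, K, V are all linear projections of x
        x = self.norm1(x + a)               # residual connection + Layer Normalization
        return self.norm2(x + self.ffn(x))  # second residual + Layer Normalization

x = torch.randn(2, 16, 768) + positional_encoding(16, 768)
print(EncoderBlock()(x).shape)  # torch.Size([2, 16, 768])
```

Note that the sinusoidal code above mirrors the formula in the claim; a BERT implementation may equally use learned position embeddings of the same shape.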
4. the method for constructing a deep semantic recognition model for public safety event analysis according to claim 3, wherein semantic representations obtained through a BERT model are used as input of a bidirectional GRU model; in a bidirectional GRU, each GRU unit containsTwo gate structures, update gate and reset gate, denoted r respectivelyt,ztFor maintaining and updating status information and for communicating; the updating gate is similar to the forgetting gate and the input gate of the LSTM in function, and determines the degree of the state information brought into the current state at the previous moment; the degree of neglecting the state information of the reset gate at the last moment is controlled, and the smaller the value of the reset gate is, the more the neglect is represented; the time-sequence problem is processed through a bidirectional GRU model, information of the whole text sequence is utilized, including mutual relation information among all words, and the information is used for processing each word.
5. The method for constructing a deep semantic recognition model for public safety event analysis according to claim 3, wherein the output of BERT is passed through a forward GRU $\overrightarrow{GRU}$, which acquires the complete context information of all past time steps, and a backward GRU $\overleftarrow{GRU}$, which acquires the complete context of all future time steps, with the calculation formulas:

$\overrightarrow{h_t} = f\!\left(\overrightarrow{w}\,d_t + \overrightarrow{b}\right)$

$\overleftarrow{h_t} = f\!\left(\overleftarrow{w}\,d_t + \overleftarrow{b}\right)$

wherein w is the weight connecting the two layers, b is the offset vector, f is the activation function, and $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ are the outputs of the forward GRU and the backward GRU respectively;
the bidirectional GRU computes the input representation d_n on the hidden layers of the forward and backward directions to obtain the hidden state h_t of d_n, adopting a splicing strategy for the forward and backward GRU:

$h_t = \left[\overrightarrow{h_t};\, \overleftarrow{h_t}\right]$

yielding the hidden states of the bidirectional GRU, calculated as:

$H = \{h_1, h_2, \ldots, h_d\}$;
global average pooling is used in place of a fully connected layer; the global average pooling layer has no parameters and integrates global information; the multidimensional output features are globally average-pooled into a one-dimensional feature vector, which is fed into a softmax function to obtain the emotion category of the comment text, as shown in the formula:

$TEXT\_C = \mathrm{softmax}(W_t \cdot H + b_t)$

wherein $W_t$ represents the weight parameter of the global average pooling layer and $b_t$ represents the offset value, finally giving the output TEXT_C of the BERT-BiGRU model.
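A minimal PyTorch sketch of the BERT-BiGRU classifier described in claims 3 to 5 is given below, assuming the Hugging Face transformers library; the checkpoint name "bert-base-chinese", the GRU hidden size and the three emotion classes are illustrative assumptions, not parameters fixed by the claims.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiGRU(nn.Module):
    """BERT token vectors -> bidirectional GRU -> global average pooling -> softmax."""
    def __init__(self, bert_name="bert-base-chinese", gru_hidden=256, n_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.bigru = nn.GRU(self.bert.config.hidden_size, gru_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * gru_hidden, n_classes)

    def forward(self, input_ids, attention_mask):
        seq = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        h, _ = self.bigru(seq)            # H = {h_1, ..., h_d}, forward/backward states spliced
        pooled = h.mean(dim=1)            # global average pooling, no extra parameters
        return torch.softmax(self.classifier(pooled), dim=-1)  # TEXT_C
```

In training, the softmax would normally be folded into a cross-entropy loss on the logits; it is kept explicit here to mirror the formula in claim 5.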
6. The method for constructing a deep semantic recognition model for public safety event analysis according to claim 2, wherein in step three, after a plurality of mutually differentiated base classifiers are obtained, their results are fused by a combination strategy so that the prediction effect of the model after ensemble learning is the best; a plurality of BERT, BERT-BiLSTM and BERT-BiGRU models are taken as base classifiers, the output categories of all classifiers are counted, and a majority voting strategy is adopted for the decision on the basis of the generated category probability distributions of emotion recognition; the classification results of all individual classifiers are given the same weight, each base classifier casts exactly one vote, the principle that the minority yields to the majority is adopted, and the category with the highest number of votes is taken as the final prediction result for the comment, with the voting formula:
$H(x) = c_{\;\arg\max\limits_{j \in \{1,\ldots,n\}}\; \sum_{i=1}^{T} C_{i,j}(x)}$

wherein n and T respectively denote the number of emotion categories and the number of base classifiers, $C_{i,j}(x)$ indicates that base classifier i predicts category j for test sample x, and $\sum_{i=1}^{T} C_{i,j}(x)$ represents the total number of votes that category j receives over all base classifiers for test sample x; the category with the largest number of votes is taken as the final category of the prediction sample x.
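The majority-voting combination of claim 6 can be sketched as follows; the helper name majority_vote and the example class ids are illustrative assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions[i][k] is the class id that base classifier i assigns to sample k;
    every classifier casts one equally weighted vote and the most voted class wins."""
    n_samples = len(predictions[0])
    fused = []
    for k in range(n_samples):
        votes = Counter(clf[k] for clf in predictions)
        fused.append(votes.most_common(1)[0][0])
    return fused

# e.g. BERT, BERT-BiLSTM and BERT-BiGRU voting on four comments (emotion classes 0/1/2)
print(majority_vote([[0, 1, 2, 1], [0, 1, 1, 1], [2, 1, 2, 0]]))  # -> [0, 1, 2, 1]
```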
7. A deep semantic recognition model for public safety event analysis, constructed by applying the construction method of the deep semantic recognition model for public safety event analysis according to any one of claims 1 to 6.
8. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
obtaining a context semantic feature representation of the comment text through a BERT pre-training model, and extracting deep nonlinear feature vectors in combination with a bidirectional GRU (gated recurrent unit), so as to achieve the optimal effect under a single model; training a plurality of emotion classifiers that perform well and are mutually differentiated, based on BERT-series pre-training models; and fully fusing the deep features of each model by an ensemble learning method of data perturbation and a voting strategy.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
obtaining a context semantic feature representation of the comment text through a BERT pre-training model, and extracting deep nonlinear feature vectors in combination with a bidirectional GRU (gated recurrent unit), so as to achieve the optimal effect under a single model; training a plurality of emotion classifiers that perform well and are mutually differentiated, based on BERT-series pre-training models; and fully fusing the deep features of each model by an ensemble learning method of data perturbation and a voting strategy.
10. An information data processing terminal, characterized in that the information data processing terminal is configured to implement the deep semantic recognition model for public safety event analysis according to claim 7.
CN202210203781.9A 2022-03-02 2022-03-02 Deep semantic recognition model for public safety event analysis and construction method thereof Pending CN114564565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210203781.9A CN114564565A (en) 2022-03-02 2022-03-02 Deep semantic recognition model for public safety event analysis and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210203781.9A CN114564565A (en) 2022-03-02 2022-03-02 Deep semantic recognition model for public safety event analysis and construction method thereof

Publications (1)

Publication Number Publication Date
CN114564565A true CN114564565A (en) 2022-05-31

Family

ID=81716869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210203781.9A Pending CN114564565A (en) 2022-03-02 2022-03-02 Deep semantic recognition model for public safety event analysis and construction method thereof

Country Status (1)

Country Link
CN (1) CN114564565A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115345167A (en) * 2022-08-29 2022-11-15 华润数字科技有限公司 Multi-model text processing method and device, computer equipment and storage medium
CN115391570A (en) * 2022-10-28 2022-11-25 聊城大学 Method and device for constructing emotion knowledge graph based on aspects
CN115827880A (en) * 2023-02-10 2023-03-21 之江实验室 Service execution method and device based on emotion classification
CN115906863A (en) * 2022-10-25 2023-04-04 华南师范大学 Emotion analysis method, device and equipment based on comparative learning and storage medium
CN116205222A (en) * 2023-05-06 2023-06-02 南京邮电大学 Aspect-level emotion analysis system and method based on multichannel attention fusion
CN116682413A (en) * 2023-07-12 2023-09-01 内蒙古工业大学 Mongolian speech synthesis method based on Conformer and MelGAN
CN116719936A (en) * 2023-06-15 2023-09-08 湖北大学 Network unreliable information early detection method based on ensemble learning
CN117437522A (en) * 2023-12-19 2024-01-23 福建拓尔通软件有限公司 Face recognition model training method, face recognition method and device
WO2024016516A1 (en) * 2022-07-18 2024-01-25 浙大城市学院 Method and system for recognizing knowledge graph entity labeling error on literature data set
WO2024114382A1 (en) * 2022-11-28 2024-06-06 蚂蚁财富(上海)金融信息服务有限公司 Text analysis method and apparatus, and emotion classification model, medium, terminal and product

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024016516A1 (en) * 2022-07-18 2024-01-25 浙大城市学院 Method and system for recognizing knowledge graph entity labeling error on literature data set
CN115345167A (en) * 2022-08-29 2022-11-15 华润数字科技有限公司 Multi-model text processing method and device, computer equipment and storage medium
CN115345167B (en) * 2022-08-29 2023-11-10 华润数字科技有限公司 Multi-model text processing method and device, computer equipment and storage medium
CN115906863A (en) * 2022-10-25 2023-04-04 华南师范大学 Emotion analysis method, device and equipment based on comparative learning and storage medium
CN115906863B (en) * 2022-10-25 2023-09-12 华南师范大学 Emotion analysis method, device, equipment and storage medium based on contrast learning
CN115391570A (en) * 2022-10-28 2022-11-25 聊城大学 Method and device for constructing emotion knowledge graph based on aspects
WO2024114382A1 (en) * 2022-11-28 2024-06-06 蚂蚁财富(上海)金融信息服务有限公司 Text analysis method and apparatus, and emotion classification model, medium, terminal and product
CN115827880A (en) * 2023-02-10 2023-03-21 之江实验室 Service execution method and device based on emotion classification
CN116205222A (en) * 2023-05-06 2023-06-02 南京邮电大学 Aspect-level emotion analysis system and method based on multichannel attention fusion
CN116719936A (en) * 2023-06-15 2023-09-08 湖北大学 Network unreliable information early detection method based on ensemble learning
CN116719936B (en) * 2023-06-15 2023-12-26 湖北大学 Network unreliable information early detection method based on ensemble learning
CN116682413A (en) * 2023-07-12 2023-09-01 内蒙古工业大学 Mongolian speech synthesis method based on Conformer and MelGAN
CN117437522A (en) * 2023-12-19 2024-01-23 福建拓尔通软件有限公司 Face recognition model training method, face recognition method and device
CN117437522B (en) * 2023-12-19 2024-05-03 福建拓尔通软件有限公司 Face recognition model training method, face recognition method and device

Similar Documents

Publication Publication Date Title
CN114564565A (en) Deep semantic recognition model for public safety event analysis and construction method thereof
Adoma et al. Comparative analyses of bert, roberta, distilbert, and xlnet for text-based emotion recognition
CN111626063A (en) Text intention identification method and system based on projection gradient descent and label smoothing
CN109522548A (en) A kind of text emotion analysis method based on two-way interactive neural network
Yan et al. Research on public opinion sentiment classification based on attention parallel dual-channel deep learning hybrid model
Markou et al. Ex Machina Lex: Exploring the Limits of Legal Computability
Zhang et al. Exploring deep recurrent convolution neural networks for subjectivity classification
Pradhan et al. Analysis of personality traits using natural language processing and deep learning
Liu et al. Dual-feature-embeddings-based semi-supervised learning for cognitive engagement classification in online course discussions
Wang et al. Learning speaker-independent multimodal representation for sentiment analysis
CN110889505A (en) Cross-media comprehensive reasoning method and system for matching image-text sequences
Zeng et al. Emotion wheel attention-based emotion distribution learning
Phan et al. A Fuzzy Graph Convolutional Network Model for Sentence-Level Sentiment Analysis
CN117112786A (en) Rumor detection method based on graph attention network
CN114943216A (en) Case microblog attribute-level viewpoint mining method based on graph attention network
Zhang et al. Information block multi-head subspace based long short-term memory networks for sentiment analysis
Song Distilling knowledge from user information for document level sentiment classification
Jasim et al. Analyzing Social Media Sentiment: Twitter as a Case Study
Venkatramaphanikumar et al. Review on the Usage of Deep Learning Models in Multi-modal Sentiment Analysis
Ibrahiem et al. Convolutional Neural Network Multi-Emotion Classifiers
CN117708336B (en) Multi-strategy emotion analysis method based on theme enhancement and knowledge distillation
M'Charrak Deep learning for natural language processing (nlp) using variational autoencoders (vae)
Ibrahiem et al. Multi-Emotion Classification Evaluation via Twitter
Li et al. A semi-supervised paraphrase identification model based on multi-granularity interaction reasoning
Retnoningrum et al. Stance Analysis of Policies Related to Emission Test Obligations using Twitter Social Media Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination