CN110110323B - Text emotion classification method and device and computer readable storage medium - Google Patents

Text emotion classification method and device and computer readable storage medium

Info

Publication number
CN110110323B
Authority
CN
China
Prior art keywords
layer
embedding
word
text
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910285262.XA
Other languages
Chinese (zh)
Other versions
CN110110323A (en)
Inventor
齐云飞 (Qi Yunfei)
陈栋 (Chen Dong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co., Ltd.
Original Assignee
Beijing Mininglamp Software System Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co., Ltd.
Priority to CN201910285262.XA
Publication of CN110110323A
Application granted
Publication of CN110110323B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/088: Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a text emotion classification method and device and a computer readable storage medium. The method comprises: obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text; randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix; and performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle. By obtaining the sentence context of the text through the pre-trained language model and performing the attention function calculation between the angle embedding matrix and the sentence context, the method and device can handle multi-angle, multi-polarity sentiment analysis tasks without requiring a large amount of time for feature extraction.

Description

Text emotion classification method and device and computer readable storage medium
Technical Field
The present application relates to, but is not limited to, the field of Natural Language Processing (NLP) technology, and in particular to a text emotion classification method and apparatus and a computer-readable storage medium.
Background
Sentiment analysis, sometimes referred to as "sentiment mining", is an important task in NLP; angle-based (i.e., aspect-based) sentiment mining is a finer-grained form of sentiment analysis that can reveal deeper sentiment trends.
Most currently popular sentiment classification methods judge the sentiment polarity of a whole sentence or article and cannot judge, at a finer granularity, the sentiment polarity of each angle in a given piece of text. Some angle-based sentiment classification methods define rules through syntactic analysis, linguistic feature extraction, or manual effort; such methods require a large amount of time for feature extraction and require developers to have a solid linguistic background.
Disclosure of Invention
The application provides a text sentiment classification method and device and a computer readable storage medium, which can handle multi-angle, multi-polarity sentiment analysis tasks without requiring a large amount of time for feature extraction.
The application provides a text sentiment classification method, which comprises the following steps:
obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
and performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle.
In one exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
In one exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, said performing an attention function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
The present application further provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the text emotion classification method as described in any of the above.
The application also provides a text sentiment classification device, which comprises a processor and a memory, wherein: the processor is used for executing the program stored in the memory to realize the steps of the text emotion classification method according to any one of the above items.
The application also provides a text sentiment classification device, which comprises a context obtaining module, an attention calculation module, and a classification judgment module, wherein:
the context obtaining module is used for obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
the attention calculation module is used for randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
and the classification judgment module is used for performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle.
In one exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
In one exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, the attention calculation module performs attention function calculation on the initialized angle embedding matrix and the sentence context, including:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
Compared with the prior art, the text emotion classification method and device and the computer readable storage medium of the present application obtain the sentence context of a text through a pre-trained language model and perform attention function calculation between an angle embedding matrix and the sentence context; they can therefore handle multi-angle, multi-polarity sentiment analysis tasks without requiring a large amount of time for feature extraction.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the application may be realized and attained by the instrumentalities and combinations particularly pointed out in the written description and claims hereof, as well as the appended drawings.
Drawings
The drawings are intended to provide an understanding of the present disclosure, and are to be considered as forming a part of the specification, and are to be used together with the embodiments of the present disclosure to explain the present disclosure without limiting the present disclosure.
FIG. 1 is a flowchart illustrating a text sentiment classification method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a text emotion classification apparatus according to an embodiment of the present invention.
Detailed Description
The description herein describes embodiments but is intended to be exemplary rather than limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with, or instead of, any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented individually or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
Embodiment One: Text Sentiment Classification Method
As shown in FIG. 1, a text sentiment classification method according to an embodiment of the present invention includes the following steps:
step 101: obtaining a sentence context of a text through a pre-training language model, wherein the pre-training language model is used for predicting one or more randomly covered words in the text;
It should be noted that deep learning has very strong expressive power, but its inherent disadvantage is that it requires a large number of labeled training samples, and labeling a large number of high-quality training samples is very time-consuming and costly. If a neural network were trained from scratch, the annotation task would certainly be of the utmost importance. In practice, however, a large number of training samples are unlabeled, and there is neither the time nor the energy to label them manually, which greatly limits the learning capability of the model. The present application therefore borrows the idea of transfer learning: a pre-trained language model is used to obtain a high-dimensional feature representation of the text, modeling is then performed on top of this representation, and the whole network is fine-tuned.
In an exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
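By way of illustration, the following is a minimal PyTorch sketch of such an embedding layer; the class and parameter names are our own and the framework choice is an assumption, not part of the application.

```python
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    """Maps each word to a word vector, embeds a position vector, and outputs their sum."""
    def __init__(self, vocab_size: int, max_len: int, n: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, n)    # word vectors
        self.position_embedding = nn.Embedding(max_len, n)    # position vectors

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer word indices
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # sum the word vectors and the position vectors (broadcast over the batch)
        return self.token_embedding(token_ids) + self.position_embedding(positions)
```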
In an exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, the prediction layer comprises a first fully-connected layer and a first Softmax layer, wherein:
the first fully-connected layer is used for receiving the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer and outputting the result to the first Softmax layer;
the first Softmax layer is used for receiving the output result of the first fully-connected layer and predicting the word at the masked position in the input sentence.
Step 102: randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
in an exemplary embodiment, the performing attention function calculation on the initialized angle embedding matrix and the sentence context includes:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
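As a sketch of this step, assuming the Scaled Dot-Product form that the detailed description later names (the function and variable names are ours): Q is the a x n angle embedding matrix, K is the p x n sentence context, and the result is the a x n corrected angle embedding matrix.

```python
import math
import torch

def angle_attention(Q: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(n)) K.

    Q: (a, n) randomly initialized angle embedding matrix.
    K: (p, n) sentence context output by the pre-trained language model.
    Returns the (a, n) corrected angle embedding matrix.
    """
    n = Q.size(-1)                                    # word embedding dimension
    scores = Q @ K.transpose(-2, -1) / math.sqrt(n)   # (a, p) similarity of each angle to each word
    weights = torch.softmax(scores, dim=-1)           # normalize over the p words of the sentence
    return weights @ K                                # (a, n) corrected angle embedding matrix
```

Each row of the result is a context vector for one angle, weighted toward the words most relevant to that angle.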
Step 103: and carrying out classification and judgment according to the corrected angle embedding matrix to obtain the emotion of each angle of the text.
In an exemplary embodiment, step 103 specifically includes:
inputting the corrected angle embedding matrix into a second fully-connected layer L(n, t), wherein t is the number of polarities per angle, to obtain an a x t result matrix; the a x t result matrix is then input into the second Softmax layer to obtain the final prediction result.
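A minimal sketch of this discrimination step under assumed sizes (a = 5 angles, n = 768, t = 3 polarities; all names are ours):

```python
import torch
import torch.nn as nn

a, n, t = 5, 768, 3                     # number of angles, embedding dimension, polarities per angle

second_fc = nn.Linear(n, t)             # the second fully-connected layer L(n, t)
corrected = torch.randn(a, n)           # corrected angle embedding matrix from the attention step

logits = second_fc(corrected)           # (a, t) result matrix: one row of polarity scores per angle
probs = torch.softmax(logits, dim=-1)   # second Softmax layer over the t polarities
prediction = probs.argmax(dim=-1)       # predicted polarity index for each of the a angles
```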
Embodiment Two: Computer-Readable Storage Medium
Embodiments of the present invention also provide a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the text emotion classification method as described in any of the above.
Embodiment Three: Text Sentiment Classification Device
The embodiment of the invention also provides a text emotion classification device, which comprises a processor and a memory, wherein: the processor is configured to execute a program stored in the memory to implement the steps of the text sentiment classification method according to any one of the above.
Embodiment Four: Text Emotion Classification Device
As shown in FIG. 2, an embodiment of the present invention further provides a text emotion classification apparatus, which includes a context obtaining module 201, an attention calculation module 202, and a classification judgment module 203, wherein:
the context obtaining module 201 is configured to obtain the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used to predict one or more randomly masked words in the text;
the attention calculation module 202 is configured to randomly initialize an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and perform attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
the classification judgment module 203 is configured to perform classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle.
In an exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
In an exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, the attention calculation module 202 performs attention function calculation on the initialized angle embedding matrix and the sentence context, including:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
Embodiment Five: Text Sentiment Classification Method
After the training corpus is obtained, it first needs to be labeled. The specific labeling method is as follows:
(1) Determine the types and the number of angles. For example, taking automobile data as an example, we stipulate that the number of angles is 5 and that the angle types are: interior, power, cost performance, fuel consumption, and comfort.
(2) Label each training sample for every angle; that is, all required angles are labeled for every training sample, and if an angle is not mentioned in a training sample, the polarity of that angle defaults to general. For example:
a) Training sample 1: "Acceleration off the line is sluggish, and the shift from 2nd to 3rd gear drags slightly; the rear-row space is satisfactory, with no crowding for three people; the appearance and the interior are the most satisfying; power is average and fuel consumption is average."
b) Training sample 2: "The styling is great and the key feel is superb, but there is abnormal noise inside the car, the dual clutch is slow off the line, and the start-stop function cannot be permanently disabled and must be switched off manually every time, which is not comfortable; fuel consumption is slightly high, but the cost performance is still high; after all, at this price one cannot ask for too much."
For the above two training samples, the labeling results are shown in Table 1:

Training sample   | Interior | Power    | Cost performance | Fuel consumption | Comfort
Training sample 1 | Positive | Negative | General          | Negative         | General
Training sample 2 | Positive | Negative | Positive         | Negative         | General

TABLE 1
(3) Screen the training samples: remove the samples whose labels are general at all angles, to speed up model training, as in the sketch below.
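A minimal sketch of this screening step (the label encoding is our assumption): any sample whose polarity is "general" at every angle is dropped before training.

```python
# Each sample: (text, labels), where labels maps every angle to a polarity string.
samples = [
    ("sample mentioning power and fuel consumption",
     {"interior": "positive", "power": "negative", "cost performance": "general",
      "fuel consumption": "negative", "comfort": "general"}),
    ("sample mentioning no angle at all",
     {"interior": "general", "power": "general", "cost performance": "general",
      "fuel consumption": "general", "comfort": "general"}),
]

# Keep only samples in which at least one angle is not "general".
filtered = [(text, labels) for text, labels in samples
            if any(polarity != "general" for polarity in labels.values())]
```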
The specific pre-training method is as follows:
I) Unsupervised pre-training of the language model:
To give the language model stronger feature extraction ability, we randomly mask one or more words in each training sample and train the language model to predict them. For example:
Original training sample: there is abnormal noise in the car, the ceiling also has abnormal noise, and the dual clutch starts slowly.
Pre-training sample: there is abnormal noise in the car, the ceiling also has abnormal noise, and the dual [MASK] starts slowly.
Language model output: clutch
Note that every word may be masked, and the language model predicts word by word. The language model predicts the masked words by learning context representations from left to right (Left-to-Right) and from right to left (Right-to-Left).
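A sketch of this masking scheme (the tokenization and the number of masks per sample are our assumptions): one or more words are replaced by a [MASK] token, and the original words become the prediction targets.

```python
import random

MASK = "[MASK]"

def mask_sample(words, num_masks=1, rng=random):
    """Randomly replace num_masks words with [MASK]; return masked words and targets."""
    positions = rng.sample(range(len(words)), k=num_masks)
    targets = {pos: words[pos] for pos in positions}   # word to predict at each masked position
    masked = [MASK if i in targets else w for i, w in enumerate(words)]
    return masked, targets

# e.g. masking one word of a tokenized training sample:
words = ["abnormal", "noise", "in", "car", "dual", "clutch", "starts", "slowly"]
masked, targets = mask_sample(words, num_masks=1)
```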
II) Language model definition:
To predict the masked words correctly, the language model needs to learn context from both the left and the right side; the two parts are structurally identical but have different parameters, and the vectors learned by the two parts are then added. Correspondingly, the left part is called the forward representation network and the right part the backward representation network; the two networks have the same structure. Note that all words to the right of the mask are invisible to the forward representation network, and all words to the left of the mask are invisible to the backward representation network.
The forward representation network consists of two parts. The first part is a forward embedding layer, which maps each word in the text to the left of the masked position into a word vector (Token Embedding), embeds a position vector (Position Embedding) for each word vector, and takes the sum of the Token Embedding and the Position Embedding as the input of the second part. The second part is a feature extraction layer whose input is the output result of the first part; this layer can be a Transformer Encoder, a Convolutional Neural Network (CNN), or a Recurrent Neural Network (RNN) such as a Long Short-Term Memory network (LSTM) or a Gated Recurrent Unit (GRU), and multiple feature extraction layers can be stacked to improve the feature extraction capability (layers can be stacked as long as the input and output dimensions are the same, so that the n-dimensional output of the first layer can serve as the n-dimensional input of the second layer). In an exemplary embodiment, the second part of the present application adopts the Encoder part of the Transformer, with 8 layers.
It should be noted that Google's Transformer model was originally used for machine translation tasks. The Transformer remedies the most criticized problem of RNNs, namely slow training, and achieves fast parallelism through the Self-Attention mechanism. Moreover, the Transformer can be made very deep, fully exploiting the characteristics of Deep Neural Network (DNN) models and improving model accuracy.
The backward representation network likewise consists of two parts. The first part is a backward embedding layer, which maps each word in the text to the right of the masked position into a word vector, embeds a position vector for each word vector, and takes the sum of the Token Embedding and the Position Embedding as the input of the second part. The second part is a feature extraction layer whose input is the output result of the first part; this layer can be a Transformer Encoder, a CNN, or an RNN (LSTM, GRU), and multiple feature extraction layers can be stacked to improve the feature extraction capability. In an exemplary embodiment, the second part of the present application adopts the Encoder part of the Transformer, with 8 layers.
In the third part, after the left and right sub-models output their sentence contexts, the two sentence contexts are added and input into a fully-connected layer, and Softmax prediction is performed on the word at the [MASK] position in the input sentence.
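Putting the three parts together, the following sketch shows one way the forward and backward representation networks and the prediction part could be wired; it reuses the EmbeddingLayer sketched earlier, and the alignment of the two contexts at the masked position is our assumption, not the application's exact implementation.

```python
import torch
import torch.nn as nn

class MaskedLanguageModel(nn.Module):
    """Forward and backward representation networks with a shared prediction part."""
    def __init__(self, vocab_size: int, max_len: int, n: int = 768, num_layers: int = 8):
        super().__init__()
        def make_side():
            # Embedding layer + stacked Transformer Encoder (8 layers, as in the text).
            layer = nn.TransformerEncoderLayer(d_model=n, nhead=8, batch_first=True)
            return EmbeddingLayer(vocab_size, max_len, n), nn.TransformerEncoder(layer, num_layers)
        self.fwd_embed, self.fwd_encoder = make_side()   # sees only the text left of [MASK]
        self.bwd_embed, self.bwd_encoder = make_side()   # sees only the text right of [MASK]
        self.fc = nn.Linear(n, vocab_size)               # fully-connected layer of the prediction part

    def forward(self, left_ids: torch.Tensor, right_ids: torch.Tensor) -> torch.Tensor:
        # Same structure on both sides, but separate (different) parameters.
        left_ctx = self.fwd_encoder(self.fwd_embed(left_ids))     # (batch, left_len, n)
        right_ctx = self.bwd_encoder(self.bwd_embed(right_ids))   # (batch, right_len, n)
        # Add the two sentence contexts at the masked position: here the last forward
        # state and the first backward state (an assumed alignment).
        context = left_ctx[:, -1, :] + right_ctx[:, 0, :]         # (batch, n)
        return torch.softmax(self.fc(context), dim=-1)            # Softmax over the vocabulary
```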
III) Model training:
Training proceeds as for a normal neural network model, yielding the pre-trained language model.
On the basis of the pre-trained language model, the present application constructs an angle embedding matrix (Aspect Embedding) and models all outputs of the language model in combination with an Attention Mechanism. The detailed process is as follows:
An angle embedding matrix is constructed according to the number of angles. For example, the number of angles above is 5, so a 5 x n matrix V is constructed, where n is the word embedding dimension. The angle embedding matrix and all outputs of the second part of the pre-trained model (i.e., the sentence context) then undergo the attention function calculation:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Q is the angle embedding matrix, K is the sentence context, and n is the word embedding dimension.
The attention calculation adopted in the present application can be the Scaled Dot-Product form. Each row of the angle embedding matrix yields, according to the above equation, a row of the corrected angle embedding matrix, which is then input into a multi-label classifier; the classifier is a fully-connected layer whose output dimension is the number of angle polarities. For example, with three polarities output per angle, 5 angles output a 5 x 3 matrix in which each row represents the three polarities of one angle, after which the optimal polarity is selected by Softmax.
The pre-trained model is trained using cross entropy as a loss for each angle.
Specifically:
Assume the output of the pre-trained model is M, a p x n matrix, where p is the number of words in the sentence and n is the word embedding dimension. Assume the angle embedding matrix is A, an a x n matrix, where a is the number of angles. Let F denote the attention function (the Scaled Dot-Product form can be adopted); inputting M and A into F yields an a x n matrix, which is then passed through a fully-connected network L(n, t), where t is the number of polarities per angle, to obtain an a x t result matrix. In the present application t = 3, and the final prediction result is obtained through a Softmax layer.
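As a shape check of this pipeline (n = 768 is an assumed embedding dimension; angle_attention is the sketch given above):

```python
import torch
import torch.nn as nn

p, a, n, t = 20, 5, 768, 3                 # words in the sentence, angles, embedding dim, polarities

M = torch.randn(p, n)                      # pre-trained model output: sentence context, p x n
A = torch.randn(a, n, requires_grad=True)  # angle embedding matrix, randomly initialized, a x n

corrected = angle_attention(A, M)          # F(A, M): a x n corrected angle embedding matrix
L_fc = nn.Linear(n, t)                     # fully-connected network L(n, t)
result = L_fc(corrected)                   # a x t result matrix, one row per angle

# Cross entropy as the loss for each angle (dummy gold labels here):
labels = torch.randint(0, t, (a,))         # gold polarity index for each angle
loss = nn.functional.cross_entropy(result, labels)
final = torch.softmax(result, dim=-1).argmax(dim=-1)   # final prediction through the Softmax layer
```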
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data, as is well known to those skilled in the art. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media, as is known to those skilled in the art.

Claims (6)

1. A text sentiment classification method, characterized by comprising the following steps:
obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle;
the pre-trained language model comprises an embedding layer, a feature extraction layer, and a prediction layer, wherein: the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer; the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer; the prediction layer is used for predicting the words at the masked positions according to the received sentence context;
the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein: the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer; the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer; the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer; the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer; the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
2. The text emotion classification method according to claim 1, wherein said performing attention function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
3. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the text emotion classification method as recited in any one of claims 1 through 2.
4. A text emotion classification device, comprising a processor and a memory, wherein: the processor is used for executing the program stored in the memory to realize the steps of the text sentiment classification method according to any one of the claims 1-2.
5. A text sentiment classification device, characterized by comprising a context obtaining module, an attention calculation module, and a classification judgment module, wherein:
the context obtaining module is used for obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
the attention calculation module is used for randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
the classification judgment module is used for performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle;
the pre-trained language model comprises an embedding layer, a feature extraction layer, and a prediction layer, wherein: the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer; the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer; the prediction layer is used for predicting the words at the masked positions according to the received sentence context;
the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein: the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer; the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer; the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer; the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer; the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
6. The apparatus of claim 5, wherein the attention function calculation performed by the attention calculation module on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
CN201910285262.XA 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium Expired - Fee Related CN110110323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910285262.XA CN110110323B (en) 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910285262.XA CN110110323B (en) 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110110323A CN110110323A (en) 2019-08-09
CN110110323B true CN110110323B (en) 2022-11-11

Family

ID=67483800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910285262.XA Expired - Fee Related CN110110323B (en) 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110110323B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704622A (en) * 2019-09-27 2020-01-17 北京明略软件***有限公司 Text emotion classification method and device and electronic equipment
CN110795537B (en) * 2019-10-30 2022-10-25 秒针信息技术有限公司 Method, device, equipment and medium for determining improvement strategy of target commodity
CN110837733B (en) * 2019-10-31 2023-12-29 创新工场(广州)人工智能研究有限公司 Language model training method and system of self-reconstruction mode and electronic equipment
CN111241304B (en) * 2020-01-16 2024-02-06 平安科技(深圳)有限公司 Answer generation method based on deep learning, electronic device and readable storage medium
CN111274807B (en) * 2020-02-03 2022-05-10 华为技术有限公司 Text information processing method and device, computer equipment and readable storage medium
CN111506702A (en) * 2020-03-25 2020-08-07 北京万里红科技股份有限公司 Knowledge distillation-based language model training method, text classification method and device
CN111737994B (en) * 2020-05-29 2024-01-26 北京百度网讯科技有限公司 Method, device, equipment and storage medium for obtaining word vector based on language model
CN112214576B (en) * 2020-09-10 2024-02-06 深圳价值在线信息科技股份有限公司 Public opinion analysis method, public opinion analysis device, terminal equipment and computer readable storage medium
CN112214601B (en) * 2020-10-21 2022-06-10 厦门市美亚柏科信息股份有限公司 Social short text sentiment classification method and device and storage medium
CN113792143B (en) * 2021-09-13 2023-12-12 中国科学院新疆理化技术研究所 Multi-language emotion classification method, device, equipment and storage medium based on capsule network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2973138A1 (en) * 2014-01-10 2015-07-16 Cluep Inc. Systems, devices, and methods for automatic detection of feelings in text
DK201670552A1 (en) * 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
CN108170681A (en) * 2018-01-15 2018-06-15 中南大学 Text emotion analysis method, system and computer readable storage medium
CN109543039A (en) * 2018-11-23 2019-03-29 中山大学 A kind of natural language sentiment analysis method based on depth network

Also Published As

Publication number Publication date
CN110110323A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN107526785B (en) Text classification method and device
CN106650813B (en) A kind of image understanding method based on depth residual error network and LSTM
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
CN108009148B (en) Text emotion classification representation method based on deep learning
CN110765775B (en) Self-adaptive method for named entity recognition field fusing semantics and label differences
CN112100346B (en) Visual question-answering method based on fusion of fine-grained image features and external knowledge
CN107729311B (en) Chinese text feature extraction method fusing text moods
CN111061843A (en) Knowledge graph guided false news detection method
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN107909115B (en) Image Chinese subtitle generating method
CN110826338B (en) Fine-grained semantic similarity recognition method for single-selection gate and inter-class measurement
CN110232123B (en) Text emotion analysis method and device, computing device and readable medium
CN111400494B (en) Emotion analysis method based on GCN-Attention
Yan et al. Data augmentation for deep learning of judgment documents
CN110415071A (en) A kind of competing product control methods of automobile based on opining mining analysis
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
Sharma et al. Deep eigen space based ASL recognition system
CN112434686B (en) End-to-end misplaced text classification identifier for OCR (optical character) pictures
CN116152554A (en) Knowledge-guided small sample image recognition system
Sen et al. Face recognition using deep convolutional network and one-shot learning
CN113806543A (en) Residual jump connection-based text classification method for gated cyclic unit
Xu et al. CNN-based skip-gram method for improving classification accuracy of chinese text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20221111