CN110110323B - Text emotion classification method and device and computer readable storage medium - Google Patents

Text emotion classification method and device and computer readable storage medium

Info

Publication number
CN110110323B
Authority
CN
China
Prior art keywords
layer
embedding
word
text
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910285262.XA
Other languages
Chinese (zh)
Other versions
CN110110323A (en)
Inventor
齐云飞 (Qi Yunfei)
陈栋 (Chen Dong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co., Ltd.
Original Assignee
Beijing Mininglamp Software System Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co., Ltd.
Priority to CN201910285262.XA
Publication of CN110110323A
Application granted
Publication of CN110110323B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/088: Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a text emotion classification method and device and a computer readable storage medium. The method comprises: obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text; randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix; and performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle. By obtaining the sentence context of the text through the pre-trained language model and performing the attention function calculation between the angle embedding matrix and the sentence context, the method and device can handle multi-angle, multi-polarity sentiment analysis tasks without requiring a large amount of time for feature extraction.

Description

Text emotion classification method and device and computer readable storage medium
Technical Field
The present application relates to, but is not limited to, the field of Natural Language Processing (NLP) technology, and in particular to a text emotion classification method and apparatus and a computer-readable storage medium.
Background
Sentiment analysis, sometimes referred to as "sentiment mining", is an important task in NLP; angle-based (i.e., aspect-based) sentiment mining is a finer-grained form of sentiment analysis that can reveal deeper sentiment trends.
Most currently popular sentiment classification methods judge the sentiment polarity of a whole sentence or article and cannot judge, at a finer granularity, the sentiment polarity of each angle in a given piece of text. Some angle-based sentiment classification methods define rules through syntactic analysis, linguistic feature extraction, or manual effort; such methods require a large amount of time for feature extraction and require developers to have a solid linguistic background.
Disclosure of Invention
The application provides a text sentiment classification method and device and a computer readable storage medium, which can handle multi-angle, multi-polarity sentiment analysis tasks without requiring a large amount of time for feature extraction.
The application provides a text sentiment classification method, which comprises the following steps:
obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
and performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle.
In one exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
In one exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, said performing an attention function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
The present application further provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the text emotion classification method as described in any of the above.
The application also provides a text sentiment classification device, which comprises a processor and a memory, wherein: the processor is used for executing the program stored in the memory to realize the steps of the text emotion classification method according to any one of the above items.
The application also provides a text sentiment classification device, which comprises a context obtaining module, an attention calculation module, and a classification judgment module, wherein:
the context obtaining module is used for obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
the attention calculation module is used for randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
and the classification judgment module is used for performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle.
In one exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
In one exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, the attention calculation module performs attention function calculation on the initialized angle embedding matrix and the sentence context, including:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
Compared with the prior art, the text emotion classification method and device and the computer readable storage medium of the present application obtain the sentence context of a text through a pre-trained language model and perform attention function calculation between an angle embedding matrix and the sentence context; they can therefore handle multi-angle, multi-polarity sentiment analysis tasks without requiring a large amount of time for feature extraction.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the application may be realized and attained by the instrumentalities and combinations particularly pointed out in the written description and claims hereof, as well as the appended drawings.
Drawings
The drawings are intended to provide an understanding of the present disclosure, and are to be considered as forming a part of the specification, and are to be used together with the embodiments of the present disclosure to explain the present disclosure without limiting the present disclosure.
FIG. 1 is a flowchart illustrating a text sentiment classification method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a text emotion classification apparatus according to an embodiment of the present invention.
Detailed Description
The description herein describes embodiments but is intended to be exemplary rather than limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with, or instead of, any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented individually or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
Embodiment One: Text Sentiment Classification Method
As shown in FIG. 1, a text sentiment classification method according to an embodiment of the present invention includes the following steps:
step 101: obtaining a sentence context of a text through a pre-training language model, wherein the pre-training language model is used for predicting one or more randomly covered words in the text;
It should be noted that deep learning has very strong expressive power, but its inherent disadvantage is that it requires a large number of labeled training samples, and labeling a large number of high-quality training samples is very time-consuming and costly. If a neural network were trained from scratch, the annotation task would certainly be of the utmost importance. In practice, however, a large number of training samples are unlabeled, and there is neither the time nor the energy to label them manually, which greatly limits the learning capability of the model. The present application therefore borrows the idea of transfer learning: a pre-trained language model is used to obtain a high-dimensional feature representation of the text, modeling is then performed on top of this representation, and the whole network is fine-tuned.
In an exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
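By way of illustration, the following is a minimal PyTorch sketch of such an embedding layer; the class and parameter names are our own and the framework choice is an assumption, not part of the application.

```python
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    """Maps each word to a word vector, embeds a position vector, and outputs their sum."""
    def __init__(self, vocab_size: int, max_len: int, n: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, n)    # word vectors
        self.position_embedding = nn.Embedding(max_len, n)    # position vectors

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer word indices
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # sum the word vectors and the position vectors (broadcast over the batch)
        return self.token_embedding(token_ids) + self.position_embedding(positions)
```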
In an exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, the prediction layer comprises a first fully-connected layer and a first Softmax layer, wherein:
the first fully-connected layer is used for receiving the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer and outputting the result to the first Softmax layer;
the first Softmax layer is used for receiving the output result of the first fully-connected layer and predicting the word at the masked position in the input sentence.
Step 102: randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
in an exemplary embodiment, the performing attention function calculation on the initialized angle embedding matrix and the sentence context includes:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
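As a sketch of this step, assuming the Scaled Dot-Product form that the detailed description later names (the function and variable names are ours): Q is the a x n angle embedding matrix, K is the p x n sentence context, and the result is the a x n corrected angle embedding matrix.

```python
import math
import torch

def angle_attention(Q: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(n)) K.

    Q: (a, n) randomly initialized angle embedding matrix.
    K: (p, n) sentence context output by the pre-trained language model.
    Returns the (a, n) corrected angle embedding matrix.
    """
    n = Q.size(-1)                                    # word embedding dimension
    scores = Q @ K.transpose(-2, -1) / math.sqrt(n)   # (a, p) similarity of each angle to each word
    weights = torch.softmax(scores, dim=-1)           # normalize over the p words of the sentence
    return weights @ K                                # (a, n) corrected angle embedding matrix
```

Each row of the result is a context vector for one angle, weighted toward the words most relevant to that angle.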
Step 103: and carrying out classification and judgment according to the corrected angle embedding matrix to obtain the emotion of each angle of the text.
In an exemplary embodiment, step 103 specifically includes:
inputting the corrected angle embedding matrix into a second fully-connected layer L(n, t), wherein t is the number of polarities per angle, to obtain an a x t result matrix; the a x t result matrix is then input into the second Softmax layer to obtain the final prediction result.
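A minimal sketch of this discrimination step under assumed sizes (a = 5 angles, n = 768, t = 3 polarities; all names are ours):

```python
import torch
import torch.nn as nn

a, n, t = 5, 768, 3                     # number of angles, embedding dimension, polarities per angle

second_fc = nn.Linear(n, t)             # the second fully-connected layer L(n, t)
corrected = torch.randn(a, n)           # corrected angle embedding matrix from the attention step

logits = second_fc(corrected)           # (a, t) result matrix: one row of polarity scores per angle
probs = torch.softmax(logits, dim=-1)   # second Softmax layer over the t polarities
prediction = probs.argmax(dim=-1)       # predicted polarity index for each of the a angles
```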
Embodiment Two: Computer-Readable Storage Medium
Embodiments of the present invention also provide a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the text emotion classification method as described in any of the above.
Embodiment Three: Text Sentiment Classification Device
The embodiment of the invention also provides a text emotion classification device, which comprises a processor and a memory, wherein: the processor is configured to execute a program stored in the memory to implement the steps of the text sentiment classification method according to any one of the above.
Embodiment Four: Text Emotion Classification Device
As shown in FIG. 2, an embodiment of the present invention further provides a text emotion classification apparatus, which includes a context obtaining module 201, an attention calculation module 202, and a classification judgment module 203, wherein:
the context obtaining module 201 is configured to obtain the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used to predict one or more randomly masked words in the text;
the attention calculation module 202 is configured to randomly initialize an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and perform attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
the classification judgment module 203 is configured to perform classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle.
In an exemplary embodiment, the pre-trained language model includes an embedding layer, a feature extraction layer, and a prediction layer, wherein:
the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer;
the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer;
the prediction layer is used for predicting the words at the masked positions according to the received sentence context.
In an exemplary embodiment, the embedding layer includes a forward embedding layer and a backward embedding layer, the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein:
the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer;
the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer;
the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer;
the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer;
the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
In an exemplary embodiment, the attention calculation module 202 performs attention function calculation on the initialized angle embedding matrix and the sentence context, including:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
Embodiment Five: Text Sentiment Classification Method
After the training corpus is obtained, it first needs to be labeled. The specific labeling method is as follows:
(1) Determine the types and the number of angles. For example, taking automobile data as an example, we stipulate that the number of angles is 5 and that the angle types are: interior, power, cost performance, fuel consumption, and comfort.
(2) Label each training sample for every angle; that is, all required angles are labeled for every training sample, and if an angle is not mentioned in a training sample, the polarity of that angle defaults to general. For example:
a) Training sample 1: "Acceleration off the line is sluggish, and the shift from 2nd to 3rd gear drags slightly; the rear-row space is satisfactory, with no crowding for three people; the appearance and the interior are the most satisfying; power is average and fuel consumption is average."
b) Training sample 2: "The styling is great and the key feel is superb, but there is abnormal noise inside the car, the dual clutch is slow off the line, and the start-stop function cannot be permanently disabled and must be switched off manually every time, which is not comfortable; fuel consumption is slightly high, but the cost performance is still high; after all, at this price one cannot ask for too much."
For the above two training samples, the labeling results are shown in Table 1:

Training sample   | Interior | Power    | Cost performance | Fuel consumption | Comfort
Training sample 1 | Positive | Negative | General          | Negative         | General
Training sample 2 | Positive | Negative | Positive         | Negative         | General

TABLE 1
(3) Screen the training samples: remove the samples whose labels are general at all angles, to speed up model training, as in the sketch below.
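A minimal sketch of this screening step (the label encoding is our assumption): any sample whose polarity is "general" at every angle is dropped before training.

```python
# Each sample: (text, labels), where labels maps every angle to a polarity string.
samples = [
    ("sample mentioning power and fuel consumption",
     {"interior": "positive", "power": "negative", "cost performance": "general",
      "fuel consumption": "negative", "comfort": "general"}),
    ("sample mentioning no angle at all",
     {"interior": "general", "power": "general", "cost performance": "general",
      "fuel consumption": "general", "comfort": "general"}),
]

# Keep only samples in which at least one angle is not "general".
filtered = [(text, labels) for text, labels in samples
            if any(polarity != "general" for polarity in labels.values())]
```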
The specific pre-training method is as follows:
I) Unsupervised pre-training of the language model:
To give the language model stronger feature extraction ability, we randomly mask one or more words in each training sample and train the language model to predict them. For example:
Original training sample: there is abnormal noise in the car, the ceiling also has abnormal noise, and the dual clutch starts slowly.
Pre-training sample: there is abnormal noise in the car, the ceiling also has abnormal noise, and the dual [MASK] starts slowly.
Language model output: clutch
Note that every word may be masked, and the language model predicts word by word. The language model predicts the masked words by learning context representations from left to right (Left-to-Right) and from right to left (Right-to-Left).
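A sketch of this masking scheme (the tokenization and the number of masks per sample are our assumptions): one or more words are replaced by a [MASK] token, and the original words become the prediction targets.

```python
import random

MASK = "[MASK]"

def mask_sample(words, num_masks=1, rng=random):
    """Randomly replace num_masks words with [MASK]; return masked words and targets."""
    positions = rng.sample(range(len(words)), k=num_masks)
    targets = {pos: words[pos] for pos in positions}   # word to predict at each masked position
    masked = [MASK if i in targets else w for i, w in enumerate(words)]
    return masked, targets

# e.g. masking one word of a tokenized training sample:
words = ["abnormal", "noise", "in", "car", "dual", "clutch", "starts", "slowly"]
masked, targets = mask_sample(words, num_masks=1)
```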
II) Language model definition:
To predict the masked words correctly, the language model needs to learn context from both the left and the right side; the two parts are structurally identical but have different parameters, and the vectors learned by the two parts are then added. Correspondingly, the left part is called the forward representation network and the right part the backward representation network; the two networks have the same structure. Note that all words to the right of the mask are invisible to the forward representation network, and all words to the left of the mask are invisible to the backward representation network.
The forward representation network consists of two parts. The first part is a forward embedding layer, which maps each word in the text to the left of the masked position into a word vector (Token Embedding), embeds a position vector (Position Embedding) for each word vector, and takes the sum of the Token Embedding and the Position Embedding as the input of the second part. The second part is a feature extraction layer whose input is the output result of the first part; this layer can be a Transformer Encoder, a Convolutional Neural Network (CNN), or a Recurrent Neural Network (RNN) such as a Long Short-Term Memory network (LSTM) or a Gated Recurrent Unit (GRU), and multiple feature extraction layers can be stacked to improve the feature extraction capability (layers can be stacked as long as the input and output dimensions are the same, so that the n-dimensional output of the first layer can serve as the n-dimensional input of the second layer). In an exemplary embodiment, the second part of the present application adopts the Encoder part of the Transformer, with 8 layers.
It should be noted that Google's Transformer model was originally used for machine translation tasks. The Transformer remedies the most criticized problem of RNNs, namely slow training, and achieves fast parallelism through the Self-Attention mechanism. Moreover, the Transformer can be made very deep, fully exploiting the characteristics of Deep Neural Network (DNN) models and improving model accuracy.
The backward representation network likewise consists of two parts. The first part is a backward embedding layer, which maps each word in the text to the right of the masked position into a word vector, embeds a position vector for each word vector, and takes the sum of the Token Embedding and the Position Embedding as the input of the second part. The second part is a feature extraction layer whose input is the output result of the first part; this layer can be a Transformer Encoder, a CNN, or an RNN (LSTM, GRU), and multiple feature extraction layers can be stacked to improve the feature extraction capability. In an exemplary embodiment, the second part of the present application adopts the Encoder part of the Transformer, with 8 layers.
In the third part, after the left and right sub-models output their sentence contexts, the two sentence contexts are added and input into a fully-connected layer, and Softmax prediction is performed on the word at the [MASK] position in the input sentence.
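Putting the three parts together, the following sketch shows one way the forward and backward representation networks and the prediction part could be wired; it reuses the EmbeddingLayer sketched earlier, and the alignment of the two contexts at the masked position is our assumption, not the application's exact implementation.

```python
import torch
import torch.nn as nn

class MaskedLanguageModel(nn.Module):
    """Forward and backward representation networks with a shared prediction part."""
    def __init__(self, vocab_size: int, max_len: int, n: int = 768, num_layers: int = 8):
        super().__init__()
        def make_side():
            # Embedding layer + stacked Transformer Encoder (8 layers, as in the text).
            layer = nn.TransformerEncoderLayer(d_model=n, nhead=8, batch_first=True)
            return EmbeddingLayer(vocab_size, max_len, n), nn.TransformerEncoder(layer, num_layers)
        self.fwd_embed, self.fwd_encoder = make_side()   # sees only the text left of [MASK]
        self.bwd_embed, self.bwd_encoder = make_side()   # sees only the text right of [MASK]
        self.fc = nn.Linear(n, vocab_size)               # fully-connected layer of the prediction part

    def forward(self, left_ids: torch.Tensor, right_ids: torch.Tensor) -> torch.Tensor:
        # Same structure on both sides, but separate (different) parameters.
        left_ctx = self.fwd_encoder(self.fwd_embed(left_ids))     # (batch, left_len, n)
        right_ctx = self.bwd_encoder(self.bwd_embed(right_ids))   # (batch, right_len, n)
        # Add the two sentence contexts at the masked position: here the last forward
        # state and the first backward state (an assumed alignment).
        context = left_ctx[:, -1, :] + right_ctx[:, 0, :]         # (batch, n)
        return torch.softmax(self.fc(context), dim=-1)            # Softmax over the vocabulary
```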
III) Model training:
Training proceeds as for a normal neural network model, yielding the pre-trained language model.
On the basis of the pre-trained language model, the present application constructs an angle embedding matrix (Aspect Embedding) and models all outputs of the language model in combination with an Attention Mechanism. The detailed process is as follows:
An angle embedding matrix is constructed according to the number of angles. For example, the number of angles above is 5, so a 5 x n matrix V is constructed, where n is the word embedding dimension. The angle embedding matrix and all outputs of the second part of the pre-trained model (i.e., the sentence context) then undergo the attention function calculation:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Q is the angle embedding matrix, K is the sentence context, and n is the word embedding dimension.
The attention calculation adopted in the present application can be the Scaled Dot-Product form. Each row of the angle embedding matrix yields, according to the above equation, a row of the corrected angle embedding matrix, which is then input into a multi-label classifier; the classifier is a fully-connected layer whose output dimension is the number of angle polarities. For example, with three polarities output per angle, 5 angles output a 5 x 3 matrix in which each row represents the three polarities of one angle, after which the optimal polarity is selected by Softmax.
The pre-trained model is trained using cross entropy as a loss for each angle.
Specifically:
Assume the output of the pre-trained model is M, a p x n matrix, where p is the number of words in the sentence and n is the word embedding dimension. Assume the angle embedding matrix is A, an a x n matrix, where a is the number of angles. Let F denote the attention function (the Scaled Dot-Product form can be adopted); inputting M and A into F yields an a x n matrix, which is then passed through a fully-connected network L(n, t), where t is the number of polarities per angle, to obtain an a x t result matrix. In the present application t = 3, and the final prediction result is obtained through a Softmax layer.
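As a shape check of this pipeline (n = 768 is an assumed embedding dimension; angle_attention is the sketch given above):

```python
import torch
import torch.nn as nn

p, a, n, t = 20, 5, 768, 3                 # words in the sentence, angles, embedding dim, polarities

M = torch.randn(p, n)                      # pre-trained model output: sentence context, p x n
A = torch.randn(a, n, requires_grad=True)  # angle embedding matrix, randomly initialized, a x n

corrected = angle_attention(A, M)          # F(A, M): a x n corrected angle embedding matrix
L_fc = nn.Linear(n, t)                     # fully-connected network L(n, t)
result = L_fc(corrected)                   # a x t result matrix, one row per angle

# Cross entropy as the loss for each angle (dummy gold labels here):
labels = torch.randint(0, t, (a,))         # gold polarity index for each angle
loss = nn.functional.cross_entropy(result, labels)
final = torch.softmax(result, dim=-1).argmax(dim=-1)   # final prediction through the Softmax layer
```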
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data, as is well known to those skilled in the art. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media, as is known to those skilled in the art.

Claims (6)

1. A text sentiment classification method, characterized by comprising the following steps:
obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle;
the pre-trained language model comprises an embedding layer, a feature extraction layer, and a prediction layer, wherein: the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer; the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer; the prediction layer is used for predicting the words at the masked positions according to the received sentence context;
the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein: the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer; the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer; the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer; the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer; the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
2. The text emotion classification method according to claim 1, wherein said performing attention function calculation on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
3. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the text emotion classification method as recited in any one of claims 1 through 2.
4. A text emotion classification device, comprising a processor and a memory, wherein: the processor is used for executing the program stored in the memory to realize the steps of the text sentiment classification method according to any one of the claims 1-2.
5. A text sentiment classification device, characterized by comprising a context obtaining module, an attention calculation module, and a classification judgment module, wherein:
the context obtaining module is used for obtaining the sentence context of a text through a pre-trained language model, wherein the pre-trained language model is used for predicting one or more randomly masked words in the text;
the attention calculation module is used for randomly initializing an a x n angle embedding matrix, wherein a is the number of angles and n is the word embedding dimension, and performing attention function calculation on the initialized angle embedding matrix and the sentence context to obtain a corrected angle embedding matrix;
the classification judgment module is used for performing classification judgment according to the corrected angle embedding matrix to obtain the emotion of the text at each angle;
the pre-trained language model comprises an embedding layer, a feature extraction layer, and a prediction layer, wherein: the embedding layer is used for mapping each word in the text into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the feature extraction layer; the feature extraction layer is used for receiving the output result of the embedding layer, extracting the high-dimensional features of the text, and outputting the sentence context to the prediction layer; the prediction layer is used for predicting the words at the masked positions according to the received sentence context;
the embedding layer includes a forward embedding layer and a backward embedding layer, and the feature extraction layer includes a forward feature extraction layer and a backward feature extraction layer, wherein: the forward embedding layer is used for mapping each word in the text to the left of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the forward feature extraction layer; the forward feature extraction layer is used for receiving the output result of the forward embedding layer, extracting the high-dimensional features of the text to the left of the masked position, and outputting the sentence context to the left of the masked position to the prediction layer; the backward embedding layer is used for mapping each word in the text to the right of the masked position into a word vector, embedding a position vector for each word vector, summing the word vectors and the position vectors, and outputting the sum to the backward feature extraction layer; the backward feature extraction layer is used for receiving the output result of the backward embedding layer, extracting the high-dimensional features of the text to the right of the masked position, and outputting the sentence context to the right of the masked position to the prediction layer; the prediction layer is specifically configured to superimpose the sentence contexts output by the forward feature extraction layer and the backward feature extraction layer, and predict the word at the masked position according to the superimposed result.
6. The apparatus of claim 5, wherein the attention function calculation performed by the attention calculation module on the initialized angle embedding matrix and the sentence context comprises:
Attention(Q, K) = softmax(QK^T / √n) · K
wherein Attention() is the attention function, softmax() is the normalized exponential function, Q is the angle embedding matrix, K is the sentence context of sentence s, and n is the word embedding dimension.
CN201910285262.XA 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium Expired - Fee Related CN110110323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910285262.XA CN110110323B (en) 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910285262.XA CN110110323B (en) 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110110323A CN110110323A (en) 2019-08-09
CN110110323B true CN110110323B (en) 2022-11-11

Family

ID=67483800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910285262.XA Expired - Fee Related CN110110323B (en) 2019-04-10 2019-04-10 Text emotion classification method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110110323B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704622A (en) * 2019-09-27 2020-01-17 北京明略软件***有限公司 Text emotion classification method and device and electronic equipment
CN110795537B (en) * 2019-10-30 2022-10-25 秒针信息技术有限公司 Method, device, equipment and medium for determining improvement strategy of target commodity
CN110837733B (en) * 2019-10-31 2023-12-29 创新工场(广州)人工智能研究有限公司 Language model training method and system of self-reconstruction mode and electronic equipment
CN111241304B (en) * 2020-01-16 2024-02-06 平安科技(深圳)有限公司 Answer generation method based on deep learning, electronic device and readable storage medium
CN111274807B (en) * 2020-02-03 2022-05-10 华为技术有限公司 Text information processing method and device, computer equipment and readable storage medium
CN111506702A (en) * 2020-03-25 2020-08-07 北京万里红科技股份有限公司 Knowledge distillation-based language model training method, text classification method and device
CN111737994B (en) * 2020-05-29 2024-01-26 北京百度网讯科技有限公司 Method, device, equipment and storage medium for obtaining word vector based on language model
CN112214576B (en) * 2020-09-10 2024-02-06 深圳价值在线信息科技股份有限公司 Public opinion analysis method, public opinion analysis device, terminal equipment and computer readable storage medium
CN112214601B (en) * 2020-10-21 2022-06-10 厦门市美亚柏科信息股份有限公司 Social short text sentiment classification method and device and storage medium
CN113792143B (en) * 2021-09-13 2023-12-12 中国科学院新疆理化技术研究所 Multi-language emotion classification method, device, equipment and storage medium based on capsule network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2973138A1 (en) * 2014-01-10 2015-07-16 Cluep Inc. Systems, devices, and methods for automatic detection of feelings in text
DK201670552A1 (en) * 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
CN108170681A (en) * 2018-01-15 2018-06-15 中南大学 Text emotion analysis method, system and computer readable storage medium
CN109543039A (en) * 2018-11-23 2019-03-29 中山大学 A kind of natural language sentiment analysis method based on depth network

Also Published As

Publication number Publication date
CN110110323A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN107526785B (en) Text classification method and device
CN106650813B (en) A kind of image understanding method based on depth residual error network and LSTM
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
CN108009148B (en) Text emotion classification representation method based on deep learning
CN110765775B (en) Self-adaptive method for named entity recognition field fusing semantics and label differences
CN112100346B (en) Visual question-answering method based on fusion of fine-grained image features and external knowledge
CN107729311B (en) Chinese text feature extraction method fusing text moods
CN111061843A (en) Knowledge graph guided false news detection method
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN107909115B (en) Image Chinese subtitle generating method
CN110826338B (en) Fine-grained semantic similarity recognition method for single-selection gate and inter-class measurement
CN110232123B (en) Text emotion analysis method and device, computing device and readable medium
CN111400494B (en) Emotion analysis method based on GCN-Attention
Yan et al. Data augmentation for deep learning of judgment documents
CN110415071A (en) A kind of competing product control methods of automobile based on opining mining analysis
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
Sharma et al. Deep eigen space based ASL recognition system
CN112434686B (en) End-to-end misplaced text classification identifier for OCR (optical character) pictures
CN116152554A (en) Knowledge-guided small sample image recognition system
Sen et al. Face recognition using deep convolutional network and one-shot learning
CN113806543A (en) Residual jump connection-based text classification method for gated cyclic unit
Xu et al. CNN-based skip-gram method for improving classification accuracy of chinese text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20221111