CN113971201A - Text analysis method, text analysis device and storage medium - Google Patents

Text analysis method, text analysis device and storage medium

Info

Publication number
CN113971201A
CN113971201A CN202010712773.8A
Authority
CN
China
Prior art keywords
text
emotion
implicit
determining
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010712773.8A
Other languages
Chinese (zh)
Inventor
蒋忠强
赵冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010712773.8A
Publication of CN113971201A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present application provide a text analysis method, a text analysis device, and a storage medium. The method includes: acquiring a text to be analyzed; classifying the emotion categories contained in the text to be analyzed with a trained neural network to obtain an emotion category result, the trained neural network being obtained by training on turning-type texts and implicit emotion-type texts; and outputting the emotion category result. Because the network is trained on implicit emotion-type sentences together with turning-type texts, the emotion polarity of the text to be analyzed can be analyzed more accurately.

Description

Text analysis method, text analysis device and storage medium
Technical Field
The present application relates to the field of data analysis, and relates to, but is not limited to, a text analysis method, apparatus, and storage medium.
Background
In emotion analysis, whether text emotion analysis is implemented with statement rules or with a machine-learning neural network, a text that contains many positive emotion words but is actually neutral or negative often causes the network to misjudge; and when a text has no obvious emotion words, a machine-learning neural network can find it difficult to judge the emotion of the text at all. For these reasons, neural-network emotion analysis of text is not accurate enough.
Disclosure of Invention
In view of this, embodiments of the present application provide a text analysis method, apparatus, and storage medium, which at least alleviate the problem that emotion is difficult to determine when a text has no obvious emotion words.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a text analysis method, which comprises the following steps:
acquiring a text to be analyzed;
classifying the emotion classes contained in the text to be analyzed by adopting a trained neural network to obtain emotion class results; the trained neural network is obtained by training according to the turning text and the implicit emotion text;
and outputting the emotion classification result.
An embodiment of the present application provides a text analysis device, the device includes:
the first acquisition module is used for acquiring a text to be analyzed;
the first classification module is used for classifying the emotion types contained in the text to be analyzed by adopting a trained neural network to obtain emotion type results; the trained neural network is obtained by training according to a turning text and an implicit emotion text;
and the first output module is used for outputting the emotion classification result.
An embodiment of the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the text analysis method provided above.
Embodiments of the present application provide a text analysis method, a text analysis device, and a storage medium. For an acquired text to be analyzed, the emotion categories contained in the text are first classified with a trained neural network to obtain an emotion category result, the trained neural network being obtained by training on turning-type texts and implicit emotion-type texts; finally, the emotion category result is output. Because the implicit emotion-type sentences and the turning-type texts are input into the neural network together during training, the trained network can determine the emotion polarity of the text to be analyzed more accurately.
Drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. Like reference numerals having different letter suffixes may represent different examples of similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed herein.
FIG. 1A is a schematic diagram of an implementation flow of a text analysis method according to an embodiment of the present application;
FIG. 1B is a schematic diagram of a process for implementing text analysis network training according to an embodiment of the present disclosure;
FIG. 2A is a schematic flow chart illustrating another implementation of the text analysis method according to the embodiment of the present application;
FIG. 2B is a schematic flow chart illustrating another implementation of the text analysis method according to the embodiment of the present disclosure;
FIG. 3 is a system framework diagram illustrating a method for implementing text analysis in accordance with an exemplary embodiment of the present application;
FIG. 4 is a block diagram of an implementation of implicit sentiment analysis according to an embodiment of the present application;
FIG. 5A is another block diagram of the implicit sentiment analysis of the embodiment of the present application;
FIG. 5B is a block diagram of an implicit sentiment analysis in accordance with an embodiment of the present application;
FIG. 6 is a block diagram of implementing turning-type statement analysis according to an embodiment of the present application;
FIG. 7 is a block diagram of a text representation implemented by an embodiment of the present application;
FIG. 8 is a block diagram of an embodiment of the present application for implementing sentiment polarity analysis;
fig. 9 is a schematic structural diagram of a text analysis apparatus according to an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description of the present application and have no specific meaning by themselves. Thus, "module", "component", and "unit" may be used interchangeably.
The terminal may be implemented in various forms. For example, the terminal described in the present application may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and a fixed terminal such as a Digital TV, a desktop computer, and the like.
The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.
An embodiment of the present application provides a text analysis method, and fig. 1A is a schematic view of an implementation flow of the text analysis method in the embodiment of the present application, as shown in fig. 1A, the method includes the following steps:
and step S111, acquiring a text to be analyzed.
Here, the text to be analyzed may be any type of text; such as news-like text.
And step S112, classifying the emotion types contained in the text to be analyzed by adopting the trained neural network to obtain emotion type results.
The trained neural network is obtained by the following processes:
firstly, determining a turning text from an obtained sample text; then, determining an implicit emotion type text from the sample text; inputting the implicit emotion type text and the turning type text into a neural network to obtain a sample emotion type result to which the sample text belongs; and finally, adjusting the network parameters of the neural network by adopting the loss of the sample emotion category result so as to enable the loss of the adjusted emotion category result of the neural network to meet the convergence condition. The loss of the sample emotion classification result can be determined based on the sample emotion classification result and the labeled text in the sample text (e.g., the determined inflection text and the implicit emotion text) to characterize the difference between the sample emotion classification result and the labeled text in the sample text. In some possible implementations, the weights of the neural network are adjusted using the loss of the sample emotion classification result, so that the accuracy of the adjusted emotion classification result of the neural network is greater than or equal to a preset accuracy threshold.
And step S113, outputting the emotion classification result.
Here, the emotion classification result is an emotion classification to which the text to be analyzed belongs, which is obtained by analyzing the text to be analyzed.
In the embodiment of the application, the implicit emotion text sentence and the turning text are input into the neural network together, the neural network is trained, and the trained neural network is obtained, so that the emotion type to which the text to be analyzed belongs can be accurately identified, and the accuracy of emotion analysis on the text to be analyzed is improved.
The embodiment of the present application provides a text analysis method, wherein a training process of a trained neural network is shown in fig. 1B, fig. 1B is a schematic diagram of an implementation flow of the text analysis network training in the embodiment of the present application, and the following description is performed in combination with the steps shown in fig. 1B:
step S101, determining a turning text from the acquired sample text.
Here, the sample text may be any type of text, such as news-type text, sports-type text, entertainment-type text, or chat-type text in a social application. A turning-type text containing turning words is determined from the sample text. In some possible implementations, the sample text is searched for turning-type text matching the turning words in a turning-word library: for example, the sample text is segmented into two-character words to obtain a text set, the turning words contained in the turning-word library are then looked up in the text set, and the sentence in which a turning word appears is taken as a turning-type text. For example, if the sample text contains 5 sentences and the fourth sentence contains a turning word, the fourth sentence is determined to be a turning-type text.
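The turning-word lookup described above can be sketched as follows. This is a minimal illustration only: the English transition lexicon and the period-based sentence splitting are assumptions, since the patent describes two-character segmentation of Chinese text.

```python
# Hypothetical sketch: mark as turning-type the sentences that contain a
# word from a small turning-word library. The lexicon contents and the
# splitting rule are illustrative assumptions, not the patent's procedure.
TURNING_WORDS = {"but", "however", "although", "yet", "nevertheless"}

def find_turning_sentences(sample_text):
    """Return the sentences that contain a word from the turning-word library."""
    sentences = [s.strip() for s in sample_text.split(".") if s.strip()]
    turning = []
    for sentence in sentences:
        words = {w.strip(",;").lower() for w in sentence.split()}
        if words & TURNING_WORDS:  # sentence contains a turning word
            turning.append(sentence)
    return turning
```

For instance, in a two-sentence sample where only the second sentence starts with "But", only that sentence would be returned as turning-type text.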
And step S102, determining the implicit emotion type text from the sample text.
Here, an implicit emotion-type text may be understood as a text containing implicit emotion words, or a text whose emotion polarity expressed in context is inconsistent with the emotion polarity of the words in the text. For example, in a sentence such as "he really is 'smart'", although the emotion polarity of "smart" is positive, in this context it is ironic, so the emotion polarity of the text is negative. In some possible implementations, the sample text is divided into sentences and each sentence is represented by a vector; the vector of each sentence is then input into a trained classification model to classify the sentences, thereby obtaining the implicit emotion-type texts.
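The "each sentence is represented by a vector" step can be illustrated with a toy bag-of-words encoder. A real system would use learned embeddings; the fixed vocabulary here is purely an assumption for illustration.

```python
# Toy sketch of one-vector-per-sentence representation: a bag-of-words
# count vector over a fixed vocabulary. The vocabulary is an assumption;
# the patent does not specify a concrete vectorization scheme.
def sentence_vectors(sentences, vocab):
    """Return one count vector per sentence, ordered like `vocab`."""
    vectors = []
    for sentence in sentences:
        words = sentence.lower().split()
        vectors.append([words.count(term) for term in vocab])
    return vectors
```

Each resulting vector could then be fed to the trained classification model mentioned above.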
And S103, training the neural network to be trained according to the implicit emotion text and the turning text to obtain the trained neural network.
In some possible implementations, step S103 may be implemented by the following process:
firstly, the implicit emotion type text and the turning type text are input into a neural network to obtain a sample emotion type result to which the sample text belongs.
Vectors corresponding to the unmarked texts in the sample text are adjusted with the vectorized representations of the implicit emotion-type text and the turning-type text, and the adjusted vectors are input into the neural network together with the sample text to obtain a preliminary sample emotion category result. The sample emotion category result is the probability that the sample text belongs to each emotion category.
And then, adjusting the network parameters of the neural network by adopting the loss of the sample emotion category result so as to enable the loss of the adjusted emotion category result of the neural network to meet a convergence condition.
Here, the loss of the sample emotion classification result is determined based on the sample emotion classification result and the labeled text in the sample text (e.g., the determined turning type text and the implicit emotion type text). In some possible implementations, the weights of the neural network are adjusted using the loss of the sample emotion classification result, so that the accuracy of the adjusted emotion classification result of the neural network is greater than or equal to a preset accuracy threshold.
In the embodiment of the application, the implicit emotion text sentence and the turning text are input into the neural network together, and the neural network is trained, so that the trained neural network can more accurately analyze the emotion polarity of the input text, and the emotion analysis accuracy is obviously improved.
In some embodiments, in order to be able to accurately mark the implicit emotion-like text in the sample text, the step S102 may be implemented by the following steps, as shown in fig. 2A, fig. 2A is a schematic flow chart of another implementation of the text analysis method according to the embodiment of the present application, and the following description is made with reference to fig. 1B:
step S201, extracting features of the sample text to obtain a feature set.
Here, a neural network can be used to perform feature extraction on the sample text to obtain a feature set, where the feature set is composed of a plurality of vectors, each characterizing one sentence.
Step S202, determining the category to which each feature in the feature set belongs to obtain a category set.
The feature set is input into a trained classifier to obtain the category of each feature. The categories include an implicit emotion category and an explicit emotion category, and at least include: happy, sad, angry, calm, excited, etc. In one specific example, a set of vectors each characterizing a sentence is input into a trained classifier to determine whether each sentence is an implicit emotion-type sentence or an explicit emotion-type sentence.
Step S203, determining, from the category set, the features whose category is the implicit emotion category, to obtain a first feature subset.
Here, the features classified as the implicit emotion category are selected from the category set to form the first feature subset.
Step S204, determining the text to which the first feature subset belongs as the implicit emotion type text.
Here, the first feature subset includes a vector of a plurality of sentences, and the plurality of sentences are determined as the implicit emotion-like text.
Step S205, inputting the implicit emotion type text and the turning type text into a neural network to obtain a sample emotion type result to which the sample text belongs.
And step S206, adjusting the network parameters of the neural network by adopting the loss of the sample emotion category result so as to enable the loss of the adjusted sample emotion category result of the neural network to meet a convergence condition.
In the embodiments of the present application, a classifier is used to classify the vectorized sample texts, so that the implicit emotion-type texts are determined and the sample texts are accurately divided into implicit emotion-type texts and explicit emotion-type texts.
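Steps S201 to S204 can be sketched with a toy linear classifier over sentence feature vectors. The weights below are placeholders rather than a real trained classifier, and the two-way implicit/explicit decision is a simplification of the category set described above.

```python
# Toy sketch of steps S201-S204: each sentence is a feature vector and a
# "trained" classifier labels it implicit vs. explicit emotion. The linear
# weights are placeholder assumptions, not values from the patent.
def classify_sentences(features, weights, bias=0.0):
    """Return an 'implicit' or 'explicit' label per sentence vector."""
    labels = []
    for vec in features:
        score = sum(w * x for w, x in zip(weights, vec)) + bias
        labels.append("explicit" if score >= 0 else "implicit")
    return labels

def implicit_subset(sentences, labels):
    # Steps S203-S204: keep the sentences whose feature was classified implicit.
    return [s for s, label in zip(sentences, labels) if label == "implicit"]
```

With placeholder weights, the sentences whose scores fall below the decision boundary form the implicit emotion-type texts of step S204.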
In some embodiments, the emotion polarity of the implicit emotion class text is determined based on the emotion polarity of the context of the implicit emotion class text, after step S204, the method further includes the following steps, as shown in fig. 2B, where fig. 2B is a schematic flow chart of another implementation of the text analysis method according to the embodiment of the present application, and the following description is made with reference to fig. 2A:
step S221, determining a second characteristic subset of which the category is an explicit emotion category in the category set.
Here, the feature of which the category is the explicit emotion category may be directly determined from the category set, so as to obtain a second feature subset; stored in this second feature subset is a vectorized representation of a sentence of the explicit emotion class.
Step S222, determining, from the second feature subset, target features of the explicit emotion category whose corresponding texts are separated from the texts of the implicit emotion category by fewer than a preset number of texts.
Here, the target features are the features in the second feature subset whose corresponding texts are close to the implicit emotion-type texts, that is, separated by fewer than the preset number of texts. For example, from the sentences of the explicit emotion category, the target sentences with a small distance to the sentences of the implicit emotion category are determined, and their vectorized representations are the target features.
Step S223, determining the emotion polarity of the implicit emotion type text according to the characteristics of the implicit emotion type and the target characteristics.
Here, the emotion polarity of the text corresponding to the features of the implicit emotion category may be the same as, or opposite to, the emotion polarity of the text corresponding to the target features; which case applies is determined as follows.
In some possible implementation manners, first, according to the features of the implicit emotion category and the target features, a logical relationship between the implicit emotion category text and the target text corresponding to the target features is determined.
Here, the logical relationships include a sequential relationship and a neutral relationship, where the neutral relationship covers both contradictory and unrelated cases. For example, if the features of the implicit emotion category are contained in the target features, the logical relationship between the implicit emotion-type text and the target text is determined to be a sequential relationship. In a specific example, the features of the implicit emotion category being contained in the target features can be understood as the semantics of the text sentence corresponding to the target features including the semantics of the implicit emotion-type text (for example, the implicit emotion-type text is "a dog playing with a flying disc in the snow" and the semantics of the target sentence are "an animal playing with a plastic toy outdoors in cold weather"). In this case, no turning occurs between the target sentence and the implicit emotion-type text, that is, the two are coherent, so their emotion polarities are consistent.
In addition, if the characteristics of the implicit emotion type are not contained in the target characteristics, determining that the logic relationship between the implicit emotion type text and the target text is a neutral relationship; in a specific example, it can be understood that the feature of the implicit emotion category is not included in the target feature, and the semantic corresponding to the text sentence corresponding to the target feature cannot include the semantic of the implicit emotion type text, for example, the semantic corresponding to the text sentence corresponding to the target feature is contradictory to the semantic of the implicit emotion type text (for example, the implicit emotion type text is "one dog plays on snow" and the semantic corresponding to the text sentence corresponding to the target feature is "one cat plays on snow"), or the semantic corresponding to the text sentence corresponding to the target feature is unrelated to the semantic of the implicit emotion type text (for example, the implicit emotion type text is "one dog plays on snow" and the semantic corresponding to the text sentence corresponding to the target feature is "little friend eats").
And then, determining the emotion polarity of the implicit emotion type text according to the logical relationship.
Here, if the logical relationship indicates that no transition occurs between the target text and the implicit emotion type text, and the target text and the implicit emotion type text are coherent and logical, it indicates that the emotion polarity of the implicit emotion type text is the same as that of the target text; if the logical relationship indicates that the turning occurs between the target text and the implicit emotion type text, or the target text and the implicit emotion type text are not coherent, or the logic is not smooth, the emotion polarity of the implicit emotion type text is different from that of the target text. In some possible implementation manners, if the logical relationship is a sequential relationship, determining the emotion polarity of the target text as the emotion polarity of the implicit emotion type text; the sequential relationship can be understood that no transition occurs between the target text and the implicit emotion text, and the target text and the implicit emotion text are coherent and logical; if the logic relation is a neutral relation, determining the emotion polarity of the implicit emotion type text according to the characteristics of the implicit emotion type text and the target characteristics; the neutral relation can be understood as that a transition occurs between the target text and the implicit emotion text, or the target text and the implicit emotion text are not coherent, or the logic is not smooth; in this case, the emotion polarity of the implicit emotion class text is analyzed by analyzing the meaning of the feature representation of the implicit emotion class text (e.g., the meaning of a sentence representation) and the meaning of the feature representation of the context of the implicit emotion class text (i.e., the target feature).
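The polarity decision described above can be sketched as follows. Note that flipping the polarity in the neutral case is a crude stand-in for the deeper feature analysis the text describes, not the patent's actual procedure; the relation names are taken from the text.

```python
# Hedged sketch of steps S221-S223: an implicit emotion sentence inherits
# the polarity of a nearby explicit sentence when their logical relationship
# is sequential (coherent, no turning). For the neutral relation, flipping
# the polarity is a simplifying assumption standing in for the feature-level
# analysis described in the patent.
def resolve_implicit_polarity(relation, explicit_polarity):
    """relation: 'sequential' or 'neutral'; explicit_polarity: 'positive'/'negative'."""
    if relation == "sequential":
        return explicit_polarity  # same polarity as the explicit context
    # neutral relation (turning, contradiction, or unrelated context)
    return "negative" if explicit_polarity == "positive" else "positive"
```

For example, an implicit sentence coherently following a positive explicit sentence would be labeled positive, while one separated from it by a turn would be labeled negative under this simplification.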
Step S224, inputting the emotion polarity of the implicit emotion type text, the implicit emotion type text and the turning type text into a neural network to obtain a sample emotion type result to which the sample text belongs.
The vectorized representation of the implicit emotion-type text and the vectorized representation of the turning-type text are input into the neural network to adjust the vectorized representation of the unmarked text, which enriches the diversity of the input samples while ensuring their accuracy.
In some possible implementations, first, unlabeled text in the sample text is determined except for the implicit emotion type text and the inflection type text.
Here, because the implicit emotion type text and the turning type text have been found in the sample text and the two types of texts are annotated with marks; in this way, unmarked text can be determined from the sample text.
Secondly, a first vector corresponding to the unmarked text is determined.
Here, vectorization represents the unlabeled text, and the first vector includes a plurality of vectors, in which one sentence is represented by one vector. For example, if 10 sentences are included in the unlabeled text, 10 vectors may be included in the first vector.
And thirdly, determining a second vector corresponding to the implicit emotion type text.
Here, vectorization represents the implicit emotion-like text, and the second vector includes a plurality of vectors, where one sentence is represented by one vector. For example, if 10 sentences are included in the implicit emotion class text, 10 vectors may be included in the second vector.
And fourthly, determining a third vector corresponding to the turning text.
Here, vectorization represents the turning text, and the third vector includes a plurality of vectors, where one sentence is represented by one vector. For example, if 10 sentences are included in the turning-type text, then 10 vectors may be included in the third vector.
And adjusting the first vector according to the second vector and the third vector to obtain an updated first vector.
Here, since the second vector and the third vector are both marked vectors, the unmarked first vector is adjusted with the marked vectors so that the updated first vector carries a pseudo label; this enriches the sample types in the first vector and improves the accuracy of the updated first vector.
And finally, inputting the updated first vector and the emotion polarity of the implicit emotion type text into the neural network to obtain a sample emotion type result to which the sample text belongs.
The updated first vector and the emotion polarity of the implicit emotion-type text are taken as training samples to train the neural network and obtain the sample emotion category result. Then, the loss between the sample emotion category result and the pseudo label of the updated first vector, the label of the second vector, and the label of the third vector is determined. Finally, network parameters such as the weights and the number of channels of the neural network are adjusted based on this loss, so that the accuracy of the emotion category results output by the trained neural network is greater than a preset accuracy threshold, that is, the convergence condition is met. In this way, the implicit emotion-type sentences and the turning-type texts are input into the neural network together during training, so that the trained neural network can analyze the emotion polarity of the input text more accurately.
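The pseudo-labelling step can be sketched as assigning each unlabelled sentence vector the label of its most similar marked vector. The cosine-similarity rule is an assumption: the patent does not fix a concrete formula for how the marked vectors adjust the unmarked ones.

```python
# Hedged sketch of the pseudo-labelling idea: each unmarked sentence vector
# receives the label of its nearest marked vector by cosine similarity.
# The similarity rule is an illustrative assumption, not the patent's method.
def pseudo_label(unlabeled, labeled):
    """labeled: list of (vector, label) pairs; returns one label per unlabeled vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def cosine(a, b):
        na, nb = dot(a, a) ** 0.5, dot(b, b) ** 0.5
        return dot(a, b) / (na * nb) if na and nb else 0.0
    return [max(labeled, key=lambda pair: cosine(vec, pair[0]))[1]
            for vec in unlabeled]
```

The pseudo-labelled vectors could then join the marked turning-type and implicit emotion-type vectors as training samples.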
In some embodiments, emotion analysis may be performed as follows: based on a set of manually specified rules, the polarity of phrases, symbols, and the like in the text is judged with methods such as an emotion dictionary, and the judgments are finally combined into a final emotion result.
In one specific example, two polarity word lists are first defined, with a score for each word (e.g., a positive word list such as excellent 3 and nice 4, and a negative word list such as difficult -4 and bad -5); then, for a given text, the numbers of positive and negative words and the corresponding scores are calculated; if the final score is greater than a threshold, the emotion is positive, otherwise it is negative.
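The lexicon-scoring baseline in this example can be sketched directly; the word lists reuse the example scores from the text (excellent 3, nice 4, difficult -4, bad -5).

```python
# Sketch of the rule-based baseline described above; the scores come from
# the example in the text, and the default threshold of 0 is an assumption.
POSITIVE = {"excellent": 3, "nice": 4}
NEGATIVE = {"difficult": -4, "bad": -5}

def lexicon_score(text, threshold=0):
    """Sum the word scores; above the threshold means positive emotion."""
    score = sum(POSITIVE.get(word, 0) + NEGATIVE.get(word, 0)
                for word in text.lower().split())
    return "positive" if score > threshold else "negative"
```

This baseline illustrates the weakness discussed next: a sentence with no lexicon words, or with emotion words on both sides of a turn, is scored blindly.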
In this way, when an implicit emotional sentence appears in the text, the rule-based method is very likely to judge the emotion polarity of the text incorrectly, because an implicit emotional sentence contains no obvious emotion words and its emotion depends on the context. For sentences with an emotional transition, rules can be defined to avoid misjudgment to a certain extent, but misjudgment can still occur when no emotion words appear after the transition word, or when the difference between the numbers of emotion words before and after the transition word is too large.
In some embodiments, emotion analysis may be implemented by an automatic-learning-based method, which does not depend on manually formulated rules but instead lets an algorithmic model learn to judge the emotion polarity of the text from data.
In a specific example, first, during training, data and corresponding labels are input into a model so that the model learns the intrinsic regularities of the data; then, during prediction, only text data is input into the model, and the model predicts the corresponding label.
The goal of the automatic-learning approach is to let the model understand text as a human does; this is difficult to achieve in practice, and such methods still do not fully solve the problems of the rule-based method.
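The train-on-labeled-data, predict-on-text workflow described above can be sketched with a toy learning-based classifier. This is a multinomial Naive Bayes over word counts, chosen only as a minimal illustration of "learning polarity from data"; it is not the patent's model, and the training sentences are invented examples.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Tiny multinomial Naive Bayes text classifier with Laplace smoothing."""

    def fit(self, texts, labels):
        # Count words per label and texts per label.
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        best, best_lp = None, float("-inf")
        total_docs = sum(self.label_counts.values())
        for label, n in self.label_counts.items():
            lp = math.log(n / total_docs)  # log prior
            total = sum(self.word_counts[label].values())
            for w in text.lower().split():
                # Laplace-smoothed per-word likelihood.
                lp += math.log((self.word_counts[label][w] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

Training supplies (text, label) pairs; prediction takes only text, mirroring the two phases described in the example above.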
Based on this, an embodiment of the present application provides a text analysis method, as shown in fig. 3, fig. 3 is a system framework diagram for implementing the text analysis method according to the embodiment of the present application, and the system includes the following modules:
a news text module 301 for inputting original news text and inputting the news text to a text processing module 302.
The text processing module 302 is configured to perform data cleaning on an input original news text, and input the cleaned text into the implicit emotion processing module 303 and the turning processing module 304.
And the implicit emotion processing module 303 is configured to perform implicit emotion sentence recognition by using a text implication technology.
In some possible implementation manners, the implicit emotion processing module 303 is mainly used to identify implicit emotional sentences and analyze their influence on the overall emotion of the article, so as to assist the model's judgment. The flow of the whole implicit emotional sentence processing module is shown in fig. 4; fig. 4 is a frame diagram for implementing implicit emotion analysis in the embodiment of the present application, and the process of performing implicit emotion analysis on input text data is as follows:
first, the text data 401 (i.e., the sample text) is converted into vectorized form to obtain the text representation 402.
Next, the vector after the text representation is input into the classification module 403, and in the classification module 403, the implicit emotional sentence in the text data 401 is identified, and the implicit emotional sentence 404 (i.e., the implicit emotion-type text) is output.
Here, the implicit emotional sentence is identified on the basis of a Bidirectional Encoder Representations from Transformers (BERT) encoder. Fig. 5A is another frame diagram of the implicit emotion analysis according to the embodiment of the present application; as shown in fig. 5A, the input sentence 501 is a single sentence, and a classification label is output through the intermediate neural network 502. Tok 1 to Tok N are the tokens of the input sentence; E1 to EN denote the input embeddings; T1 to TN denote the context representations of the different tokens; C is a vector used as the output class, and CLS is a special symbol used for classifying the output.
Thirdly, the implicit emotional sentence 404 is input into the text implication model 405, in the text implication model 405, logical reasoning is performed on the relation between the input implicit emotional sentence 404 and the context sentence to determine a classification label to which the logical relation belongs, and text vectors of the type and the implicit emotional sentence are input into the deep learning model 406, so that the deep learning model 406 outputs the probability that the input text belongs to each category.
The text is vectorized and then input into the classification module to identify the implicit emotional sentences; the identified implicit emotional sentences are input into the text implication model for logical reasoning. If an implicit emotional sentence has a logical (entailment) relation with an explicit emotional sentence in its context, their emotion polarities are consistent to a great extent. Finally, the obtained implicit emotional sentences are vectorized and input into the deep learning model.
In some embodiments, the text implication model 405 is implemented on the basis of BERT. As shown in fig. 5B, fig. 5B is another frame diagram of the implicit emotion analysis in the embodiment of the present application, in which the inputs 511 and 512 are two sentences, and the classification label for the sentence pair is output through the intermediate neural network 513. Tok 1 to Tok N are the tokens of the input sentence 511, and Tok 11 to Tok 1N are the tokens of the input sentence 512; E1 to EN denote the input embeddings of the tokens in each sentence; T1 to TN denote the context representations of the different tokens in each sentence; C is a vector used as the output class, and CLS is a special symbol used for classifying the output. SEP is a special symbol used to separate the two non-contiguous token sequences. The output labels of the text implication model 405 are entailment, contradiction and neutral. The text implication model helps the model reason about the preceding and following text, and enhances the emotion analysis capability of the model.
First, implicit emotional sentences are recognized over the entire text by using the model of fig. 5A, and the implicit emotional sentences are extracted. The number and the length of the implicit emotional sentences in each text can be counted, and the maximum value, or a set threshold, is taken as the maximum number and length of implicit emotional sentences to be identified per article; the article is then fed into the model for recognition. If the number reaches the threshold, recognition stops; if it falls short, the vectors are filled up with padding. Then, the structure of fig. 5B is used to perform text implication processing between each identified implicit emotional sentence and its preceding and following sentences. The text implication model takes two sentences as input, and their lengths must also be fixed: the length of the implicit sentence is determined by the implicit-sentence recognition model, and the length of the context sentence can be set to the counted maximum value or to a threshold. Since the input lengths are fixed, the length of the output vector is fixed as well. Finally, all vectors T are input into the deep learning model 406.
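The fixed-count, fixed-length padding step above can be sketched as follows. The zero-padding scheme is an illustrative assumption, since the text only says vectors are "filled up" to the threshold.

```python
def pad_sentences(vectors, max_sentences, dim):
    """Truncate to max_sentences sentence vectors, pad each to dim values,
    then pad with all-zero vectors up to max_sentences."""
    padded = [v[:dim] + [0.0] * (dim - len(v[:dim])) for v in vectors[:max_sentences]]
    while len(padded) < max_sentences:
        padded.append([0.0] * dim)
    return padded
```

With the counts and lengths fixed this way, every article yields an input of identical shape, which is what allows the downstream model's input size to be fixed.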
And the turning processing module 304 is used for identifying turning sentences and analyzing their influence on the overall emotion of the article, so as to help the model judgment.
Here, as shown in fig. 6, fig. 6 is a frame diagram for implementing the break-over sentence analysis according to the embodiment of the present application, and a process of performing the break-over analysis on the input text data is as follows:
first, text data 601 is compared with a turning word dictionary 602 to identify a turning sentence 603 (i.e., turning-type text).
Then, the text representation 604 is performed on the recognized turning sentence 603, and the vector after the text representation is input to the deep learning model 605.
In the embodiment of the application, turning sentence analysis mainly identifies sentences containing turning words by means of the turning-word dictionary. BERT is then used to produce a text representation of each turning sentence, which is input into the deep learning module. As in the implicit processing module, the input length of a turning sentence is fixed by a set threshold or the counted maximum value: where the length is insufficient, the sentence is filled with padding; where it exceeds the limit, the excess part is truncated. The vector of the turning sentence is input into the BERT model to obtain a text representation vector, which is spliced into the input of the final deep learning model.
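The dictionary lookup in this step can be sketched as follows. The transition-word list here is a small illustrative sample in English, not the patent's turning-word dictionary.

```python
import re

# Illustrative turning-word dictionary (assumed entries).
TURNING_WORDS = {"but", "however", "although", "yet", "nevertheless"}

def find_turning_sentences(text):
    """Split on sentence-ending punctuation and keep sentences containing a turning word."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    return [s for s in sentences if TURNING_WORDS & set(s.lower().split())]
```

Each matched sentence would then be passed to BERT for its text representation, per the flow above.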
The text representation module 305, which converts words into vector forms that the model can understand, is an important factor that affects the performance of the system.
Here, as shown in fig. 7, fig. 7 is a block diagram for implementing text representation in the embodiment of the present application, and a process of performing text representation on input text data is as follows:
first, the text data 701 for pre-training is input into the BERT model 702 to obtain a preliminary text vector; then, based on the text vector of the implicit emotional sentence (i.e., the second vector) obtained by the implicit emotion module 703 and the text vector of the turning sentence (i.e., the third vector) obtained by the turning processing module 704, the preliminary text vector is adjusted to obtain the final text vector representation 705.
In the embodiment of the application, the text representation module performs pre-training on a large amount of text data, and then performs fine adjustment by using the implicit emotion sentences and the turning sentences to obtain the final text representation vector. The vector fine tuning mode is to add two pre-training tasks (an implicit emotion sentence task and a turning sentence task) on the basis of BERT.
Wherein, the implicit emotional sentence task: during training, the same number of sentences as the implicit emotional sentences are taken from the training text for training, for example:
input for this battery
Label (Label) ═ implicit (IsImplicit)
Input (Input) today Haoka [ MASK ]
Label (Label) ═ non-implicit (NotImplicit)
Turning sentence tasks: during training, sentences with the same number as turning sentences are taken from the training text for training.
Input (Input) although he learns seriously, it does not ideally result in MASK
Label (Label) ═ turn (IsTurn)
Input (Input) is considered as a difficult one by you to have no [ MASK ] sample
Label (Label) ═ non-turn (NotTurn)
Thus, the model can learn the expression modes of the implicit emotional sentences and the turning sentences.
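Constructing labeled pairs for the two auxiliary pre-training tasks can be sketched as follows. The function and its inputs are hypothetical; only the label names follow the examples above.

```python
def build_task_samples(sentences, implicit_ids, turning_ids):
    """Label each training sentence for the two auxiliary pre-training tasks.

    implicit_ids / turning_ids are the index sets of sentences known to be
    implicit emotional sentences / turning sentences.
    """
    implicit_task, turning_task = [], []
    for i, sent in enumerate(sentences):
        implicit_task.append((sent, "IsImplicit" if i in implicit_ids else "NotImplicit"))
        turning_task.append((sent, "IsTurn" if i in turning_ids else "NotTurn"))
    return implicit_task, turning_task
```

The resulting (input, label) pairs would feed the two extra classification heads added on top of BERT during fine-tuning.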
And a deep learning module 306, configured to perform comprehensive emotion polarity analysis on the text by using BiLSTM + Attention, and output the obtained emotion analysis result (i.e., the classification category in the emotion analysis result output module 307).
Here, as shown in fig. 8, fig. 8 is a block diagram for implementing emotion polarity analysis according to an embodiment of the present application, and a process of text representation of input text data is as follows:
first, a text X is input at the embedding layer, denoted X = (x_1, x_2, …, x_T), where x_i represents a text vector.
Next, the text vectors are input into the bidirectional LSTM layer 801, which outputs the hidden state h_i corresponding to any text vector x_i.
Here, in the bidirectional LSTM layer 801, the hidden state h_i is the concatenation of the forward and backward LSTM states at position i.
Again, the bidirectional LSTM layer 801 feeds its output into the bidirectional LSTM layer 802 to update the hidden state h_i.
Then, the hidden states output by the bidirectional LSTM layer 802 are input into an attention layer 803, which weights each hidden state as shown in the following equations: e_i = tanh(W_h·h_i + b_h), with e_i ∈ [-1, 1]; the attention weight of each hidden state is a_i = exp(e_i) / Σ_t exp(e_t); and the weighted representation is r = Σ_i a_i·h_i, where W_h and b_h denote a weight vector and a bias.
Finally, the weighted vectors output by the attention layer 803 are input to a normalization (softmax) layer 804, where the input vectors are classified to output probabilities that each input text sentence belongs to different categories.
Here, the text vectors are input into the BiLSTM to obtain the hidden vectors h_i; a vector r is obtained after passing through the attention layer; and finally the probability of each category is obtained by softmax. The embedding layer of the deep learning model is formed by concatenating the vector from the implicit processing module, the vector from the turning processing module and the vector of the whole text; the length of the whole-text vector is likewise set to the counted maximum or a threshold. The three vectors are spliced along the sequence length, and since each vector's length is bounded as described above, the length of the final input vector is fixed.
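The attention weighting step can be sketched in isolation as follows. This is a minimal numeric illustration over toy hidden states, in which W_h is reduced to a single vector and b_h to a scalar.

```python
import math

def attention(hidden_states, w, b):
    """e_i = tanh(w·h_i + b); a_i = softmax over e; r = sum_i a_i * h_i."""
    e = [math.tanh(sum(wj * hj for wj, hj in zip(w, h)) + b) for h in hidden_states]
    z = sum(math.exp(ei) for ei in e)
    a = [math.exp(ei) / z for ei in e]  # attention weights, sum to 1
    dim = len(hidden_states[0])
    r = [sum(a[i] * hidden_states[i][d] for i in range(len(hidden_states)))
         for d in range(dim)]
    return a, r
```

The weighted vector r would then be passed to the softmax layer, which produces the per-category probabilities.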
The text analysis method provided in the embodiment of the application can be completed in the following two stages. The first stage is the system pre-training process: first, before the whole system runs, the pre-training task of the BERT model is completed; then, large-scale unlabeled text data is used for pre-training to obtain general text representation vectors, where the text data can be crawled and cleaned from Baidu Encyclopedia, news websites and the like; finally, the text representation is fine-tuned with small-scale labeled implicit emotional sentences and turning sentences, where the small-scale labeled data requires manual labeling. The second stage is the system training process: during system training, the input text enters the text representation module, the implicit emotion processing module and the turning processing module; the turning sentences are input into the text representation module to obtain their vector representations; the implicit emotional sentences directly feed their obtained vectors into the BiLSTM; and finally these vectors are input into the model as the embedding layer of the BiLSTM to obtain the result.
In the embodiment of the application, the deep learning module uses BiLSTM + attention as the model that finally performs text polarity analysis, to which an implicit emotion processing module, a turning processing module and a text representation module are added. The implicit emotion processing module uses a text implication technique to recognize implicit emotional sentences, the turning processing module uses a turning-word dictionary to recognize turning sentences, and the text representation module adds two pre-training subtasks on the basis of BERT and fine-tunes the text representation vectors. All three modules enable the final deep learning model to learn the expression patterns of implicit emotional sentences and turning sentences, thereby improving the accuracy of emotion analysis.
An embodiment of the present application provides a text analysis apparatus, fig. 9 is a schematic structural diagram of the text analysis apparatus in the embodiment of the present application, and as shown in fig. 9, the apparatus 900 includes:
a first obtaining module 901, configured to obtain a text to be analyzed;
a first classification module 902, configured to classify, by using a trained neural network, emotion categories included in the text to be analyzed, so as to obtain emotion category results; the trained neural network is obtained by training according to the turning text and the implicit emotion text;
and a first output module 903, configured to output the emotion classification result.
In the above apparatus, the apparatus further comprises:
the first determining module is used for determining a turning text from the obtained sample text;
a second determining module, configured to determine an implicit emotion type text from the sample text;
and the first input module is used for training a neural network to be trained according to the implicit emotion type text and the turning type text to obtain the trained neural network.
In the above apparatus, the first determining module is configured to:
and searching for turning type texts matched with turning words in a turning word library from the sample texts.
In the above apparatus, the second determining module includes:
the first extraction submodule is used for extracting the characteristics of the sample text to obtain a characteristic set;
the first determining submodule is used for determining the category to which each feature in the feature set belongs to obtain a category set;
the second determining submodule is used for determining that the category in the category set is a first characteristic subset of the implicit emotion category;
and the third determining submodule is used for determining the text to which the first characteristic subset belongs as the implicit emotion type text.
In the above apparatus, the apparatus further comprises:
a third determining module, configured to determine a second feature subset of the category set, where the category is an explicit emotion category;
a fourth determining module, configured to determine, from the second feature subset, target features of the explicit emotion class that are spaced from the features of the implicit emotion class by fewer than a preset number of texts;
a fifth determining module, configured to determine, according to the features of the implicit emotion category and the target features, an emotion polarity of the implicit emotion category text;
and the second input module is used for inputting the emotion polarity of the implicit emotion type text, the implicit emotion type text and the turning type text into a neural network so as to obtain a sample emotion type result to which the sample text belongs.
In the above apparatus, the fifth determining module includes:
a fourth determining submodule, configured to determine, according to the features of the implicit emotion category and the target features, a logical relationship between the implicit emotion category text and the target text corresponding to the target features;
and the fifth determining submodule is used for determining the emotion polarity of the implicit emotion type text according to the logic relationship.
In the above apparatus, the fourth determination sub-module includes:
a first determining unit, configured to determine that the logical relationship between the implicit emotion class text and the target text is a sequential relationship if the feature of the implicit emotion class is included in the target feature;
and a second determining unit, configured to determine that the logical relationship between the implicit emotion class text and the target text is a neutral relationship if the feature of the implicit emotion class is not included in the target feature.
In the above apparatus, the fifth determining sub-module includes:
a third determining unit, configured to determine, if the logical relationship is a sequential relationship, an emotion polarity of the target text as an emotion polarity of the implicit emotion type text;
and a fourth determining unit, configured to determine, if the logical relationship is a neutral relationship, an emotion polarity of the implicit emotion type text according to the feature of the implicit emotion type text and the target feature.
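The decision rule implemented by these determining units can be sketched as follows. The function and label names are illustrative, and the neutral-case fallback is left abstract because the text does not specify it beyond "according to the features".

```python
def polarity_from_relation(relation, target_polarity, fallback):
    """Sequential (entailment) relation: adopt the target sentence's polarity.
    Neutral relation: fall back to classifying from the features themselves."""
    if relation == "sequential":
        return target_polarity
    return fallback()
```

Here `fallback` stands in for whatever classifier judges the implicit sentence's polarity from its own features and the target features when no entailment holds.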
In the above apparatus, the first input module includes:
a sixth determining submodule, configured to determine an unmarked text in the sample text, except for the implicit emotion-like text and the turning-like text;
a seventh determining submodule, configured to determine a first vector corresponding to the unmarked text;
the eighth determining submodule is used for determining a second vector corresponding to the implicit emotion type text;
a ninth determining submodule, configured to determine a third vector corresponding to the turning text;
the first adjusting submodule is used for adjusting the first vector according to the second vector and the third vector to obtain an updated first vector;
and the first input submodule is used for inputting the updated first vector and the emotion polarity of the implicit emotion type text into the neural network so as to obtain a sample emotion type result to which the sample text belongs.
The text analysis apparatus provided by the embodiment of the application comprises the above modules, sub-modules and units, which can be realized by a processor in a terminal; of course, they can also be realized by specific logic circuits. In implementation, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
It should be noted that the above description of the embodiment of the apparatus, similar to the above description of the embodiment of the method, has similar beneficial effects as the embodiment of the method. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
Accordingly, embodiments of the present application provide a computer storage medium having stored therein computer-executable instructions configured to perform the text analysis method provided in other embodiments of the present application, or to cause a processor, when executing them, to implement the text analysis method.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, etc.) to execute the method described in the embodiments of the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (11)

1. A text analysis method, characterized in that the method comprises:
Acquiring a text to be analyzed;
classifying the emotion classes contained in the text to be analyzed by adopting a trained neural network to obtain emotion class results; the trained neural network is obtained by training according to the turning text and the implicit emotion text;
and outputting the emotion classification result.
2. The method of claim 1, wherein the training process of the trained neural network comprises:
determining a turning text from the obtained sample text;
determining implicit emotion type text from the sample text;
and training a neural network to be trained according to the implicit emotion text and the turning text to obtain the trained neural network.
3. The method as recited in claim 2, wherein said determining a turning type text from the obtained sample text comprises:
and searching for turning type texts matched with turning words in a turning word library from the sample texts.
4. The method of claim 3, wherein said determining implicit emotion type text from said sample text comprises:
extracting features of the sample text to obtain a feature set;
determining the category to which each feature in the feature set belongs to obtain a category set;
determining that a category in the category set is a first feature subset of an implicit emotion category;
and determining the text to which the first feature subset belongs as the implicit emotion class text.
5. The method as recited in claim 4, wherein after said determining an implicit emotion type text from said sample text, said method further comprises:
determining a second feature subset of the category set, wherein the category is an explicit emotion category;
determining, from the second feature subset, target features of the explicit emotion class that are spaced from the features of the implicit emotion class by fewer than a preset number of texts;
determining the emotion polarity of the implicit emotion type text according to the characteristics of the implicit emotion type and the target characteristics;
and inputting the emotion polarity of the implicit emotion type text, the implicit emotion type text and the turning type text into a neural network to obtain a sample emotion type result to which the sample text belongs.
6. The method of claim 5, wherein said determining the emotion polarity of the implicit emotion class text from the features of the implicit emotion class and the target features comprises:
determining the logic relation between the implicit emotion type text and a target text corresponding to the target feature according to the features of the implicit emotion type and the target feature;
and determining the emotion polarity of the implicit emotion type text according to the logic relationship.
7. The method of claim 6, wherein the determining the logical relationship between the implicit emotion class text and the text corresponding to the target feature according to the features of the implicit emotion class and the target feature comprises:
if the characteristics of the implicit emotion category are contained in the target characteristics, determining that the logic relation between the implicit emotion category text and the target text is a sequential relation;
and if the characteristics of the implicit emotion category are not contained in the target characteristics, determining that the logic relationship between the implicit emotion category text and the target text is a neutral relationship.
8. The method of claim 7, wherein said determining an emotion polarity for said implicit emotion class text based on said logical relationship comprises:
if the logic relation is a sequential relation, determining the emotion polarity of the target text as the emotion polarity of the implicit emotion type text;
and if the logical relationship is a neutral relationship, determining the emotion polarity of the implicit emotion type text according to the characteristics of the implicit emotion type text and the target characteristics.
9. The method of claim 5, wherein before said training a neural network to be trained based on said implicit emotion-like text and said transition-like text to obtain said trained neural network, said method further comprises:
determining unmarked text in the sample text except the implicit emotion type text and the turning type text;
determining a first vector corresponding to the unmarked text;
determining a second vector corresponding to the implicit emotion type text;
determining a third vector corresponding to the turning text;
adjusting the first vector according to the second vector and the third vector to obtain an updated first vector;
inputting the updated first vector and the emotion polarity of the implicit emotion type text into the neural network to obtain a sample emotion type result to which the sample text belongs;
and training the neural network to be trained according to the sample emotion classification result to obtain the trained neural network.
10. A text analysis apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a text to be analyzed;
the first classification module is used for classifying the emotion types contained in the text to be analyzed by adopting a trained neural network to obtain emotion type results; the trained neural network is obtained by training according to a turning text and an implicit emotion text;
and the first output module is used for outputting the emotion classification result.
11. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, cause a processor to implement the method of any one of claims 1 to 9.
CN202010712773.8A 2020-07-22 2020-07-22 Text analysis method, text analysis device and storage medium Pending CN113971201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010712773.8A CN113971201A (en) 2020-07-22 2020-07-22 Text analysis method, text analysis device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010712773.8A CN113971201A (en) 2020-07-22 2020-07-22 Text analysis method, text analysis device and storage medium

Publications (1)

Publication Number Publication Date
CN113971201A true CN113971201A (en) 2022-01-25

Family

ID=79585110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010712773.8A Pending CN113971201A (en) 2020-07-22 2020-07-22 Text analysis method, text analysis device and storage medium

Country Status (1)

Country Link
CN (1) CN113971201A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492456A (en) * 2022-01-26 2022-05-13 北京百度网讯科技有限公司 Text generation method, model training method, device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
CN113094578B (en) Deep learning-based content recommendation method, device, equipment and storage medium
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN111177326A (en) Key information extraction method and device based on fine labeling text and storage medium
CN112364170B (en) Data emotion analysis method and device, electronic equipment and medium
CN112906397B (en) Short text entity disambiguation method
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
CN114756681B (en) Evaluation and education text fine granularity suggestion mining method based on multi-attention fusion
CN112188312A (en) Method and apparatus for determining video material of news
CN115080750B (en) Weak supervision text classification method, system and device based on fusion prompt sequence
CN111639185B (en) Relation information extraction method, device, electronic equipment and readable storage medium
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN114722832A (en) Abstract extraction method, device, equipment and storage medium
CN114064901B (en) Book comment text classification method based on knowledge graph word meaning disambiguation
Laxmi et al. Cyberbullying detection on Indonesian twitter using doc2vec and convolutional neural network
CN111737475B (en) Unsupervised network public opinion spam long text recognition method
CN113971201A (en) Text analysis method, text analysis device and storage medium
CN110888983B (en) Positive and negative emotion analysis method, terminal equipment and storage medium
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN111985223A (en) Emotion calculation method based on combination of long and short memory networks and emotion dictionaries
CN116522165A (en) Public opinion text matching system and method based on twin structure
CN115878847A (en) Video guide method, system, equipment and storage medium based on natural language
CN113361615B (en) Text classification method based on semantic relevance
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination