CN110046248B - Model training method for text analysis, text classification method and device

Info

Publication number
CN110046248B
Authority
CN
China
Prior art keywords
word
vector
sentence
model
training
Prior art date
Legal status
Active
Application number
CN201910176632.6A
Other languages
Chinese (zh)
Other versions
CN110046248A (en)
Inventor
蒋亮
张家兴
温祖杰
梁忠平
Current Assignee
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201910176632.6A
Publication of CN110046248A
Application granted
Publication of CN110046248B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of this specification provide a model training method for text analysis, a text classification method, and corresponding apparatuses. The method comprises the following steps: first, using a first bidirectional converter model, a forward vector is obtained for each word in a first training sentence based on the initial word vector of the word and the preceding-context information of the word; then, using the first bidirectional converter model, a reverse vector is obtained for each word in the first training sentence based on the initial word vector of the word and the following-context information of the word; next, for each position in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; finally, a first language model is applied to the target word vector corresponding to each position, and the first bidirectional converter model and the first language model are trained accordingly. The computation is thus fast and the robustness of the model is ensured.

Description

Model training method for text analysis, text classification method and device
Technical Field
One or more embodiments of the present specification relate to the field of computers, and more particularly, to a model training method, a text classification method, and an apparatus for text analysis.
Background
The converter (Transformer) model is a neural network model proposed by Ashish Vaswani et al. of Google in 2017. It can be used for deep modeling of sequence data, can replace the long short-term memory (LSTM) model, and is characterized by fast computation.
However, the Transformer model processes a sequence in only one direction: for each position it considers only the information of the preceding positions and ignores the information of the following positions, which greatly limits the robustness of the model.
Therefore, an improved scheme is desired that, when deep modeling is performed on sequence data, exploits the fast computation of the Transformer model while ensuring the robustness of the model.
Disclosure of Invention
One or more embodiments of this specification describe a model training method for text analysis, a text classification method, and corresponding apparatuses, which exploit the fast computation of the Transformer model while ensuring the robustness of the model when deep modeling is performed on sequence data.
In a first aspect, a model training method for text analysis is provided, the method comprising:
obtaining, with a first bidirectional converter model, for each word in a first training sentence, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the first training sentence;
obtaining, with the first bidirectional converter model, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the first training sentence;
for each position in the first training sentence, splicing the forward vector of the word preceding the position and the reverse vector of the word following the position as the target word vector corresponding to the position;
predicting, with a first language model and the target word vector corresponding to each position in the first training sentence, a first probability of the word corresponding to that position;
training the first bi-directional converter model and the first language model by minimizing a first loss function associated with the first probability to obtain a trained second bi-directional converter model and a second language model.
In a possible implementation, the obtaining, with the first bidirectional converter model, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the first training sentence includes:
extracting, with the first bidirectional converter model, for each word in the first training sentence, a plurality of pieces of important information from different angles using a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the first training sentence;
and splicing the vectors corresponding to each of the plurality of pieces of important information to obtain the reverse vector corresponding to the word.
In a second aspect, a model training method for text analysis is provided, the method comprising:
obtaining, with the second bidirectional converter model trained by the method according to the first aspect, for each word in a second training sentence, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the second training sentence;
obtaining, with the second bidirectional converter model, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the second training sentence;
for each position in the second training sentence, splicing the forward vector of the word preceding the position and the reverse vector of the word following the position as the target word vector corresponding to the position;
predicting, with the second language model trained by the method according to the first aspect and the target word vector corresponding to each position in the second training sentence, a first probability of the word corresponding to that position; and generating a representation vector of the sentence corresponding to the second training sentence according to the target word vectors corresponding to the positions in the second training sentence;
predicting a second probability of a label corresponding to the second training sentence based on the representation vector of the sentence corresponding to the second training sentence by using a multi-classification model;
training the second bidirectional converter model, the second language model and the multi-classification model by minimizing the sum of the first loss function and the second loss function to obtain a third bidirectional converter model, a third language model and a second multi-classification model; wherein the first loss function is associated with the first probability and the second loss function is associated with the second probability.
In a possible implementation, the obtaining, with the second bidirectional converter model, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the second training sentence includes:
extracting, with the second bidirectional converter model, for each word in the second training sentence, a plurality of pieces of important information from different angles using a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the second training sentence;
and splicing the vectors corresponding to each of the plurality of pieces of important information to obtain the reverse vector corresponding to the word.
In a possible implementation manner, the generating a representation vector of a sentence corresponding to the second training sentence according to the target word vector corresponding to each position in the second training sentence includes:
taking the average value of the target word vectors corresponding to each position in the second training sentence, and using the average value as the representation vector of the sentence corresponding to the second training sentence.
In one possible implementation, the training the second bi-directional converter model, the second language model, and the multi-classification model by minimizing a sum of the first loss function and the second loss function includes:
Minimizing the sum of the first and second loss functions by a gradient descent method to determine model parameters of the second bi-directional converter model, the second language model, and the multi-classification model.
In a third aspect, a text classification method is provided, the method comprising:
obtaining, with the third bidirectional converter model trained by the method according to the second aspect, for each word in a sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the sentence to be classified;
obtaining, with the third bidirectional converter model, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the sentence to be classified;
according to the position of each word in the sentence to be classified, the forward vector of the word before the position and the reverse vector of the word after the position are spliced together to be used as a target word vector corresponding to the position;
generating a representation vector of a sentence corresponding to the sentence to be classified according to the target word vector corresponding to each position in the sentence to be classified;
And performing text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified by using the second multi-classification model trained by the method according to the second aspect.
In a fourth aspect, there is provided a model training apparatus for text analysis, the apparatus comprising:
a forward vector generation unit, configured to obtain, with a first bidirectional converter model, for each word in a first training sentence, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the first training sentence;
a reverse vector generation unit, configured to obtain, with the first bidirectional converter model, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the first training sentence;
the word vector generation unit is used for splicing the forward vector of the word before the position obtained by the forward vector generation unit and the reverse vector of the word after the position obtained by the reverse vector generation unit according to the position of each word in the first training sentence to be used as a target word vector corresponding to the position;
a prediction unit, configured to predict, with a first language model and the target word vector corresponding to each position in the first training sentence, a first probability of the word corresponding to that position;
and the model training unit is used for training the first bidirectional converter model and the first language model by minimizing a first loss function related to the first probability obtained by the prediction unit to obtain a trained second bidirectional converter model and a trained second language model.
In a fifth aspect, there is provided a model training apparatus for text analysis, the apparatus comprising:
a forward vector generation unit, configured to obtain, with the second bidirectional converter model trained by the method according to the first aspect, for each word in a second training sentence, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the second training sentence;
a reverse vector generation unit, configured to obtain, with the second bidirectional converter model, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the second training sentence;
The word vector generation unit is used for splicing the forward vector of the word before the position obtained by the forward vector generation unit and the reverse vector of the word after the position obtained by the reverse vector generation unit according to the position of each word in the second training sentence to be used as a target word vector corresponding to the position;
a first prediction unit, configured to predict, using the second language model trained by the method according to the first aspect, a first probability of a word corresponding to each position in the second training sentence, for a target word vector corresponding to the position;
a sentence vector generating unit, configured to generate a representation vector of a sentence corresponding to the second training sentence according to the target word vector corresponding to each position in the second training sentence obtained by the word vector generating unit;
a second prediction unit, configured to predict, using a multi-classification model, a second probability of a label corresponding to the second training sentence based on the representation vector of the sentence corresponding to the second training sentence obtained by the sentence vector generation unit;
the model training unit is used for training the second bidirectional converter model, the second language model and the multi-classification model by minimizing the sum of the first loss function and the second loss function to obtain a third bidirectional converter model, a third language model and a second multi-classification model; wherein the first loss function is associated with the first probability and the second loss function is associated with the second probability.
In a sixth aspect, there is provided a text classification apparatus, the apparatus comprising:
a forward vector generation unit, configured to obtain, with the third bidirectional converter model trained by the method according to the second aspect, for each word in a sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the sentence to be classified;
a reverse vector generation unit, configured to obtain, with the third bidirectional converter model, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the sentence to be classified;
a word vector generation unit, configured to splice, for each position in the sentence to be classified, the forward vector of the word preceding the position obtained by the forward vector generation unit and the reverse vector of the word following the position obtained by the reverse vector generation unit, as the target word vector corresponding to the position;
a sentence vector generating unit, configured to generate a representation vector of a sentence corresponding to the sentence to be classified according to the target word vector corresponding to each position in the sentence to be classified obtained by the word vector generating unit;
And a text classification unit, configured to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified obtained by the sentence vector generating unit, using the second multi-classification model trained by the method according to the second aspect.
In a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second or third aspect.
In an eighth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of the first or second or third aspect.
According to the method and apparatus provided by the embodiments of this specification, on the one hand, a first bidirectional converter model is used to obtain, for each word in a first training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the first training sentence; the first bidirectional converter model is then used to obtain, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word; next, for each position in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; a first language model is then used to predict, from the target word vector corresponding to each position, a first probability of the word corresponding to that position; finally, the first bidirectional converter model and the first language model are trained by minimizing a first loss function related to the first probability, yielding a trained second bidirectional converter model and second language model. In the embodiments of this specification, unlike an ordinary unidirectional Transformer model, the bidirectional Transformer model fully considers both the preceding and the following context of each word rather than only the preceding context, so that when sequence data is deeply modeled, the fast computation of the Transformer model is exploited while the robustness of the model is ensured.
On the other hand, the second bidirectional converter model trained by the method according to the first aspect is first used to obtain, for each word in a second training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the second training sentence; the second bidirectional converter model is then used to obtain, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word; next, for each position in the second training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; the second language model trained by the method according to the first aspect is then used to predict, from the target word vector corresponding to each position in the second training sentence, a first probability of the word corresponding to that position, and a representation vector of the sentence corresponding to the second training sentence is generated from the target word vectors corresponding to the positions in the second training sentence; a multi-classification model is then used to predict, based on the representation vector of the sentence corresponding to the second training sentence, a second probability of the label corresponding to the second training sentence; finally, the second bidirectional converter model, the second language model and the multi-classification model are trained by minimizing the sum of the first loss function and the second loss function, yielding a third bidirectional converter model, a third language model and a second multi-classification model, where the first loss function is related to the first probability and the second loss function is related to the second probability. In the embodiments of this specification, the fast computation of the Transformer model is exploited when sequence data is deeply modeled, while the robustness of the model is ensured; moreover, on the basis of the training of the bidirectional converter model and the language model in the first aspect, the bidirectional converter model, the language model and the multi-classification model are further trained jointly, achieving a better training effect.
In yet another aspect, the third bidirectional converter model trained by the method according to the second aspect is first used to obtain, for each word in a sentence to be classified, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the sentence to be classified; the third bidirectional converter model is then used to obtain, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word; next, for each position in the sentence to be classified, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; a representation vector of the sentence corresponding to the sentence to be classified is generated from the target word vectors corresponding to the positions in the sentence to be classified; finally, the second multi-classification model trained by the method according to the second aspect is used to perform text classification on the sentence to be classified based on the representation vector of the sentence. In the embodiments of this specification, the fast computation of the Transformer model is exploited when sequence data is deeply modeled, the robustness of the model is ensured, and the bidirectional converter model and the multi-classification model obtained after the two training stages help to obtain a better text classification result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a model training method flow diagram for text analysis, according to one embodiment;
FIG. 3 illustrates a model training method flow diagram for text analysis according to another embodiment;
FIG. 4 shows a flow chart of a text classification method according to another embodiment;
FIG. 5 is a schematic diagram of the internal structure of a unidirectional Transformer model according to an embodiment of the present disclosure;
FIG. 6 shows a schematic block diagram of a model training apparatus for text analysis, according to one embodiment;
FIG. 7 shows a schematic block diagram of a model training apparatus for text analysis according to another embodiment;
FIG. 8 shows a schematic block diagram of a text classification apparatus according to another embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The implementation scenario involves text classification and the training of models for text analysis. Referring to fig. 1, this implementation scenario involves three types of models: bidirectional converter models (also referred to as bidirectional Transformer models), language models, and multi-classification models (also referred to as multi-classifiers). Multiple models can be trained jointly to form multi-task learning when the models are trained.
Text classification: the task of assigning a text entered by a user to one or more of several predefined classes.
Language model: a language model judges whether a sentence is correct natural language by calculating the probability of the sentence occurring in the natural language, and plays an important role in tasks such as information retrieval, machine translation, and speech recognition. A neural language model is a language model that uses a neural network to model the probability of each sentence occurring. By learning from a large corpus, a neural language model can learn the internal rules and knowledge of the language.
Multi-task learning: a field of machine-learning research whose aim is to place several related tasks into the same model or framework for joint learning, so that each task is improved through knowledge transfer among the tasks.
As shown in fig. 1, training of the model is divided into two phases: a pre-training stage and a fine tuning stage.
In the pre-training phase, for a sentence S = {w_1, w_2, …, w_N} composed of N words, the bidirectional Transformer model first converts S into N vectors {v_1, v_2, …, v_N}, each being the output vector of one word and generated with the context of that word taken into account. The language model then uses the output vector v_i of each word to predict the word w_i at the current position, and the bidirectional Transformer model and the language model are trained based on the prediction results.
In the fine-tuning stage, labeled text data is used. A sentence S = {w_1, w_2, …, w_N} is likewise converted into vectors {v_1, v_2, …, v_N} by the bidirectional Transformer model, and the mean of the output vectors {v_1, v_2, …, v_N} of all the words is taken as the representation vector of the sentence. The representation vector of the sentence is fed to a multi-classifier, which performs the text classification; at the same time, the output vector of each word is fed to the language model, which predicts the current word. The classification task of the fine-tuning stage and the prediction task of the language model thus form multi-task learning, which can improve the generalization capability of the multi-classification model.
In the prediction stage, for a sentence input by the user, the representation vector of the sentence is obtained by averaging the vectors output by the bidirectional Transformer, and this representation vector is input to the multi-classifier for classification.
In the embodiments of this specification, the bidirectional Transformer model and the language model are first pre-trained; the bidirectional Transformer model fully considers both the preceding and the following context of each word, not just the preceding context. Fine-tuning is then performed on the text classification task using the pre-trained Transformer model, which improves the robustness of the model.
FIG. 2 illustrates a model training method flow diagram for text analysis according to one embodiment, which may correspond to the pre-training phase mentioned in the application scenario illustrated in FIG. 1. As shown in fig. 2, the model training method for text analysis in this embodiment includes the steps of:
First, in step 21, for each word in the first training sentence, a forward vector corresponding to the word is obtained with the first bidirectional converter model, based on the initial word vector of the word and the preceding-context information of the word in the first training sentence. It will be appreciated that in step 21 the process of obtaining the forward vector of each word is similar to that of the unidirectional Transformer model.
Next, in step 22, for each word in the first training sentence, a reverse vector corresponding to the word is obtained with the first bidirectional converter model, based on the initial word vector of the word and the following-context information of the word in the first training sentence. It will be appreciated that in step 22 the following-context information of each word is used in obtaining the reverse vector corresponding to that word.
In one example, the first bidirectional converter model extracts, for each word in the first training sentence, a plurality of pieces of important information from different angles using a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the first training sentence; the vectors corresponding to these pieces of important information are spliced to obtain the reverse vector corresponding to the word.
Then, in step 23, for each position in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position. It can be understood that the target word vector corresponding to each position reflects both the context before the position and the context after the position, which gives good robustness.
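As an illustration of step 23, the following sketch (not part of the patent text; the function name and the zero-padding at the sentence boundaries are assumptions) concatenates, for each position, the forward vector of the preceding word with the reverse vector of the following word:

```python
import torch

def build_target_vectors(fwd, bwd):
    """fwd, bwd: (N, d) forward / reverse vectors of the N words of a sentence."""
    n, d = fwd.shape
    pad = torch.zeros(1, d)                        # boundary positions have no neighboring word (assumed zero vector)
    prev_fwd = torch.cat([pad, fwd[:-1]], dim=0)   # forward vector of the word preceding each position
    next_bwd = torch.cat([bwd[1:], pad], dim=0)    # reverse vector of the word following each position
    return torch.cat([prev_fwd, next_bwd], dim=-1) # (N, 2d) target word vectors

v = build_target_vectors(torch.randn(6, 128), torch.randn(6, 128))
print(v.shape)  # torch.Size([6, 256])
```

Because both pieces come from the neighboring positions rather than from the word itself, the target word vector carries only contextual information, which is what allows the language model in step 24 to use it for predicting the word at that position.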
Then, in step 24, the first language model is used to predict, from the target word vector corresponding to each position in the first training sentence, a first probability of the word corresponding to that position. It can be understood that the target word vector corresponding to each position is obtained with the bidirectional converter model, and the language model then predicts from this target word vector the probability of the word corresponding to the position.
Finally, in step 25, the first bidirectional converter model and the first language model are trained by minimizing a first loss function related to the first probability, yielding a trained second bidirectional converter model and second language model. It can be understood that the training sentences used here need no manual labeling, so the model can conveniently be trained on a large unlabeled corpus.
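A minimal sketch of one pre-training step (steps 21 to 25), assuming a hypothetical `encoder` that stands for the first bidirectional converter model together with the splicing of step 23 (it returns one target word vector per position) and a hypothetical `lm_head` standing for the first language model:

```python
import torch
import torch.nn.functional as F

def pretrain_step(encoder, lm_head, token_ids, optimizer):
    """token_ids: LongTensor of shape (N,) -- the words of the first training sentence."""
    v = encoder(token_ids)                     # (N, d) target word vectors, one per position
    logits = lm_head(v)                        # (N, vocab_size) scores for the word at each position
    loss = F.cross_entropy(logits, token_ids)  # first loss function, related to the first probability
    optimizer.zero_grad()
    loss.backward()                            # train encoder and lm_head jointly
    optimizer.step()
    return loss.item()
```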
According to the method provided by the embodiments of this specification, a first bidirectional converter model is first used to obtain, for each word in a first training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word; the first bidirectional converter model is then used to obtain, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word; next, for each position in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; a first language model is then used to predict, from the target word vector corresponding to each position, a first probability of the word corresponding to that position; finally, the first bidirectional converter model and the first language model are trained by minimizing a first loss function related to the first probability, yielding a trained second bidirectional converter model and second language model. In the embodiments of this specification, unlike an ordinary unidirectional Transformer model, the bidirectional Transformer model fully considers both the preceding and the following context of each word rather than only the preceding context, so that when sequence data is deeply modeled, the fast computation of the Transformer model is exploited while the robustness of the model is ensured.
Fig. 3 shows a flow chart of a model training method for text analysis according to another embodiment, which may correspond to the fine tuning phase mentioned in the application scenario shown in fig. 1. As shown in fig. 3, the model training method in this embodiment includes the steps of:
First, in step 31, the second bidirectional converter model trained by the method described in fig. 2 is used to obtain, for each word in the second training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the second training sentence. It will be appreciated that in step 31 the process of obtaining the forward vector of each word is similar to that of the unidirectional Transformer model.
Next, in step 32, for each word in the second training sentence, a reverse vector corresponding to the word is obtained with the second bidirectional converter model, based on the initial word vector of the word and the following-context information of the word in the second training sentence. It will be appreciated that in step 32 the following-context information of each word is used in obtaining the reverse vector corresponding to that word.
In one example, the second bidirectional converter model extracts, for each word in the second training sentence, a plurality of pieces of important information from different angles using a self-attention mechanism, based on the initial word vector of the word and the following-context information of the word in the second training sentence; the vectors corresponding to these pieces of important information are spliced to obtain the reverse vector corresponding to the word.
Then, in step 33, for each position in the second training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position. It can be understood that the target word vector corresponding to each position reflects both the context before the position and the context after the position, which gives good robustness.
Then, in step 34, the second language model trained by the method described in fig. 2 is used to predict, from the target word vector corresponding to each position in the second training sentence, a first probability of the word corresponding to that position. It can be understood that the target word vector corresponding to each position is obtained with the bidirectional converter model, and the language model then predicts from this target word vector the probability of the word corresponding to the position.
In step 35, a representation vector of the sentence corresponding to the second training sentence is generated according to the target word vector corresponding to each position in the second training sentence. It may be appreciated that the generation of the representation vector of the sentence corresponding to the second training sentence incorporates the target word vectors corresponding to the plurality of positions, rather than just the target word vector corresponding to one of the positions.
In one example, the target word vector corresponding to each position in the second training sentence is averaged, and the average is used as the representation vector of the sentence corresponding to the second training sentence.
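A minimal sketch of this example (tensor sizes are hypothetical): the representation vector of the sentence is simply the mean of the target word vectors over all positions.

```python
import torch

target_word_vectors = torch.randn(6, 256)           # one target word vector per position of the sentence
sentence_vector = target_word_vectors.mean(dim=0)   # representation vector of the second training sentence
print(sentence_vector.shape)                        # torch.Size([256])
```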
In step 36, a second probability of the label corresponding to the second training sentence is predicted based on the representation vector of the sentence corresponding to the second training sentence using the multi-classification model. It will be appreciated that the tags are pre-labeled categories of text classification.
Finally, in step 37, the second bidirectional converter model, the second language model and the multi-classification model are trained by minimizing the sum of the first loss function and the second loss function, yielding a third bidirectional converter model, a third language model and a second multi-classification model; the first loss function is related to the first probability and the second loss function is related to the second probability. It can be understood that the training sentences used here need to be manually labeled, so the model is further trained with a labeled corpus.
In one example, the sum of the first and second loss functions is minimized by a gradient descent method to determine model parameters of the second bi-directional converter model, the second language model, and the multi-classification model.
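A hedged sketch of this joint optimization, assuming the same hypothetical `encoder` and `lm_head` stand-ins as in the earlier pre-training sketch plus a hypothetical `classifier` standing for the multi-classification model:

```python
import torch
import torch.nn.functional as F

def finetune_step(encoder, lm_head, classifier, token_ids, label, optimizer):
    """token_ids: (N,) words of the second training sentence; label: (1,) its class label."""
    v = encoder(token_ids)                                        # (N, d) target word vectors
    lm_loss = F.cross_entropy(lm_head(v), token_ids)              # first loss function (language model)
    sentence_vec = v.mean(dim=0, keepdim=True)                    # representation vector of the sentence
    cls_loss = F.cross_entropy(classifier(sentence_vec), label)   # second loss function (classifier)
    loss = lm_loss + cls_loss                                     # minimize the sum of the two loss functions
    optimizer.zero_grad()
    loss.backward()                                               # gradient descent over all three models' parameters
    optimizer.step()
    return loss.item()
```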
According to the method provided by the embodiments of this specification, the second bidirectional converter model trained by the method shown in fig. 2 is first used to obtain, for each word in a second training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the second training sentence; the second bidirectional converter model is then used to obtain, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word; next, for each position in the second training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; the second language model trained by the method shown in fig. 2 is then used to predict, from the target word vector corresponding to each position, a first probability of the word corresponding to that position, and a representation vector of the sentence corresponding to the second training sentence is generated from the target word vectors corresponding to the positions; a multi-classification model is then used to predict, based on the representation vector of the sentence, a second probability of the label corresponding to the second training sentence; finally, the second bidirectional converter model, the second language model and the multi-classification model are trained by minimizing the sum of the first loss function and the second loss function, yielding a third bidirectional converter model, a third language model and a second multi-classification model, where the first loss function is related to the first probability and the second loss function is related to the second probability. In the embodiments of this specification, the fast computation of the Transformer model is exploited when sequence data is deeply modeled, while the robustness of the model is ensured; moreover, on the basis of the training of the bidirectional converter model and the language model in the first aspect, the bidirectional converter model, the language model and the multi-classification model are further trained jointly, achieving a better training effect.
Fig. 4 shows a flow chart of a text classification method according to another embodiment, which may correspond to the prediction phase mentioned in the application scenario shown in fig. 1. As shown in fig. 4, the text classification method in this embodiment includes the steps of:
First, in step 41, the third bidirectional converter model trained by the method described in fig. 3 is used to obtain, for each word in the sentence to be classified, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the sentence to be classified. It will be appreciated that in step 41 the process of obtaining the forward vector of each word is similar to that of the unidirectional Transformer model.
Next, in step 42, for each word in the sentence to be classified, a reverse vector corresponding to the word is obtained with the third bidirectional converter model, based on the initial word vector of the word and the following-context information of the word in the sentence to be classified. It will be appreciated that in step 42 the following-context information of each word is used in obtaining the reverse vector corresponding to that word.
Then, in step 43, for each position in the sentence to be classified, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position. It can be understood that the target word vector corresponding to each position reflects both the context before the position and the context after the position, which gives good robustness.
In step 44, a representation vector of the sentence corresponding to the sentence to be classified is generated according to the target word vector corresponding to each position in the sentence to be classified. It can be understood that the generation of the representation vector of the sentence corresponding to the sentence to be classified combines the target word vectors corresponding to the plurality of positions, not just the target word vector corresponding to one of the positions.
Finally, in step 45, the second multi-classification model trained by the method described in fig. 3 is used to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified. It can be understood that, by using the multi-classification model, based on the representation vector of the sentence corresponding to the sentence to be classified, the probability of each category corresponding to the sentence to be classified is predicted, and the category with the highest probability is taken as the text classification result.
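A minimal sketch of steps 41 to 45 at prediction time, with the same hypothetical `encoder` and `classifier` stand-ins as above; the returned index is the category with the highest predicted probability:

```python
import torch

def classify_sentence(encoder, classifier, token_ids):
    v = encoder(token_ids)                        # target word vectors of the sentence to be classified
    sentence_vec = v.mean(dim=0, keepdim=True)    # representation vector of the sentence
    probs = torch.softmax(classifier(sentence_vec), dim=-1)
    return int(probs.argmax(dim=-1))              # category with the maximum probability
```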
According to the method provided by the embodiments of this specification, the third bidirectional converter model trained by the method shown in fig. 3 is first used to obtain, for each word in a sentence to be classified, a forward vector corresponding to the word based on the initial word vector of the word and the preceding-context information of the word in the sentence to be classified; the third bidirectional converter model is then used to obtain, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word; next, for each position in the sentence to be classified, the forward vector of the word preceding the position and the reverse vector of the word following the position are spliced together as the target word vector corresponding to the position; a representation vector of the sentence corresponding to the sentence to be classified is generated from the target word vectors corresponding to the positions; finally, the second multi-classification model trained by the method shown in fig. 3 is used to perform text classification on the sentence to be classified based on the representation vector of the sentence. In the embodiments of this specification, the fast computation of the Transformer model is exploited when sequence data is deeply modeled, the robustness of the model is ensured, and the bidirectional converter model and the multi-classification model obtained after the two training stages help to obtain a better text classification result.
The three models involved in the foregoing embodiments are described in detail below: the bidirectional Transformer model (bidirectional Transformer for short), the language model, and the multi-classification model (also called the multi-classifier).
Bidirectional Transformer:
The working principle of the conventional Transformer (i.e., the unidirectional Transformer) is described first and then extended to the bidirectional Transformer.
1. Unidirectional Transformer
The Transformer, proposed by Ashish Vaswani et al. of Google, converts a text sequence into vectors. It overcomes the drawback of LSTM, which must process text word by word, by using an attention mechanism to gather the preceding-context information for each word. In this process, the output vectors of all words can be computed in parallel.
Fig. 5 is a schematic diagram of the internal structure of a unidirectional Transformer model according to an embodiment of this specification. Referring to fig. 5:
The input to the Transformer block is a vector sequence X = {x_1, x_2, …, x_N}, where x_i is the representation vector of the i-th position. X first passes through a multi-head self-attention module, in which every word interacts with the other words of the text, so that important information is added to each word. The multi-head self-attention module consists of several self-attention modules with the same structure, and each self-attention module is computed as follows:
First, a fully connected layer (feed forward) converts each word x_i into two vectors k_i and t_i:
k_i = tanh(W_q x_i + b)
t_i = tanh(W_v x_i + b)
W_q and W_v are trainable parameters of the model. k_i is used to compute the importance of all the preceding words {x_1, …, x_{i-1}} to x_i, and t_i stores the information of x_i that is made available to the other words. The attention weights obtained from these vectors are used to take a weighted sum of the t vectors of the preceding words, and the resulting vector c_i is the important information about x_i extracted from the preceding context. The multi-head self-attention mechanism uses several such attention modules so that, for each word x_i, important information is extracted from the preceding words {x_1, …, x_{i-1}} from different angles. Finally, the vectors extracted by all the attention modules for each word are concatenated into d_i, the output vector of the multi-head self-attention module for that word.
The output vector d_i of each word then passes through a normalization layer (layer normalization), a fully connected layer (feed forward), and another normalization layer to give the output vector l_i, i.e. the vector of x_i after conversion by the Transformer block, computed as follows:
l_i = LayerNorm(LayerNorm(x_i + d_i) + W·LayerNorm(x_i + d_i))
where W is a trainable parameter and LayerNorm normalizes one layer of the neural network so that the flow of information between layers is more stable. LayerNorm is computed as
LayerNorm(h) = (h − μ) / σ
where μ is the mean of all neurons in the layer and σ is the standard deviation of all neurons in the layer.
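A small sketch of the two formulas above (tensor shapes are hypothetical; the eps term is added for numerical stability and is not part of the formula in the text):

```python
import torch

def layer_norm(h, eps=1e-6):
    mu = h.mean(dim=-1, keepdim=True)     # mean of all neurons in the layer
    sigma = h.std(dim=-1, keepdim=True)   # standard deviation of all neurons in the layer
    return (h - mu) / (sigma + eps)

def transformer_block_output(x, d, W):
    """x: input vectors, d: multi-head self-attention output, W: feed-forward weight matrix."""
    h = layer_norm(x + d)
    return layer_norm(h + h @ W.T)        # l_i = LayerNorm(LayerNorm(x_i + d_i) + W * LayerNorm(x_i + d_i))
```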
Transformer blocks can be stacked in multiple layers, with the output of one layer serving as the input of the next layer, thereby forming a multi-layer Transformer network. The computation of the Transformer network can be expressed as {l_1, …, l_N} = Transformer({x_1, …, x_N}).
2. Bidirectional Transformer
The bidirectional Transformer is an extension of the unidirectional Transformer: the unidirectional Transformer considers only the preceding-context information in its attention mechanism and ignores the following-context information, yet the information following each word is also useful to it. The bidirectional Transformer therefore models a sentence from both directions of the context, which increases the expressive capacity of the model. The computation proceeds as follows:
For each word x_i, a forward multi-head self-attention output d_i^→ is extracted from the preceding words of x_i and a backward output d_i^← is extracted from the following words of x_i. After d_i^→ and d_i^← are obtained, they pass, as in the unidirectional Transformer, through the normalization layers and the fully connected layer respectively, giving the forward output vector l_i^→ and the backward output vector l_i^← of the bidirectional Transformer for x_i. After the multi-layer bidirectional Transformer, a sentence S = {w_1, w_2, …, w_N} is finally converted into two sets of vectors {l_1^→, …, l_N^→} and {l_1^←, …, l_N^←}, and the computation of the bidirectional Transformer can be expressed as ({l_1^→, …, l_N^→}, {l_1^←, …, l_N^←}) = BiTransformer(X).
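A hedged sketch of the bidirectional computation. The text only specifies that the two directions attend to the preceding and the following words respectively; realizing the backward direction by running a second unidirectional network over the reversed sequence, as below, is an assumption.

```python
import torch

def bidirectional_transformer(x, forward_net, backward_net):
    """x: (N, d) initial word vectors; forward_net / backward_net: stacked unidirectional Transformer networks."""
    fwd = forward_net(x)                         # forward vectors: each position sees only the preceding words
    bwd = backward_net(torch.flip(x, dims=[0]))  # reversing the sequence realizes the backward direction
    bwd = torch.flip(bwd, dims=[0])              # re-align the reverse vectors to the original word order
    return fwd, bwd                              # the two sets of vectors for the sentence
```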
Language model:
A sentence S = {w_1, w_2, …, w_N} is converted, after the multi-layer bidirectional Transformer, into the two sets of vectors {l_1^→, …, l_N^→} and {l_1^←, …, l_N^←}. The model can be pre-trained with the language-model task; since the data for this task needs no labeling, a large amount of data is easy to obtain for fully pre-training the model. The purpose of the language-model task is to predict each word x_i from its context {x_1, …, x_{i-1}, x_{i+1}, …, x_N}: if a model can correctly predict every word from its context, it has learned the internal rules of the natural language well. The bidirectional Transformer language model is computed as follows:
First, the forward vector of the (i-1)-th word and the reverse vector of the (i+1)-th word are spliced together:
v_i = [l_{i-1}^→ ; l_{i+1}^←]
Then v_i is used to predict the probability of the i-th word w_i:
p(w_i = j | v_i) = exp(W_j^LM · v_i) / Σ_k exp(W_k^LM · v_i)
where W^LM is a trainable parameter of the language model and W_j^LM denotes the j-th row of W^LM.
The loss function of the language model is the mean of the cross-entropy loss functions over all words in the sentence:
L_LM = -(1/N) Σ_{i=1}^{N} log p(w_i | v_i)
The goal of the language model is to minimize L_LM. Denote the set of all trainable parameters of the bidirectional Transformer by W and the set of all trainable parameters of the language model by W_LM. W and W_LM are optimized iteratively by gradient descent:
W ← W − γ_1 ∂L_LM/∂W,  W_LM ← W_LM − γ_1 ∂L_LM/∂W_LM
During pre-training, the model is optimized iteratively until L_LM falls below a set threshold β (usually 0.1, 0.01, or similar); the model is then trained and has learned the internal rules of natural language. γ_1 is usually a real number on the order of 0.0001.
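A hedged sketch of the language-model objective above (vocabulary and vector sizes are hypothetical): v_i is mapped to a distribution over the vocabulary through W_LM, and the mean cross entropy L_LM is reduced by a gradient-descent step with a learning rate on the order of 0.0001.

```python
import torch
import torch.nn.functional as F

vocab_size, dim = 30000, 256                          # hypothetical sizes
W_LM = torch.randn(vocab_size, dim, requires_grad=True)

def lm_loss(v, word_ids):
    """v: (N, dim) spliced vectors v_i; word_ids: (N,) the true words w_i."""
    logits = v @ W_LM.T                               # row j of W_LM scores word j, as in the formula above
    return F.cross_entropy(logits, word_ids)          # mean cross entropy L_LM over the sentence

loss = lm_loss(torch.randn(6, dim), torch.randint(0, vocab_size, (6,)))
loss.backward()
with torch.no_grad():
    W_LM -= 1e-4 * W_LM.grad                          # one gradient-descent step with gamma_1 ~ 0.0001
    W_LM.grad.zero_()
```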
Wherein, multiple classifier:
in the trimming stage, the pre-trained bi-directional transducer model is trimmed using data with real labels. Because the bidirectional transducer is pre-trained and has knowledge of natural language, compared with a random initialization model, the bidirectional transducer is directly trained on labeled data, and better effect can be achieved by pre-training and classification fine tuning.
The fine tuning process includes twoPart, one part is the same language model part as the pre-training process, and the other part is the input sentence (s= { w 1 ,w 2 ,…,w N -l) classification, wherein l is the label classification process of the sentence as follows:
first, s= { w is converted by bidirectional conversion 1 ,w 2 ,…,w N Conversion to vector v 1 ,…,v N ]Wherein each vector is a representation vector of a corresponding position word. The vector averaged over the representative vectors of all words is then used as the representative vector for the entire sentence.
Then a Softmax classifier is applied to v̄ to calculate the probability that S belongs to each label:
p_C(l_k | S) = exp(W_k^c · v̄) / Σ_j exp(W_j^c · v̄)
where W^c is the set of trainable parameters of the multi-classifier and W_k^c denotes the k-th row of W^c. The loss function of the classifier is the cross entropy of each sample with respect to its true label l, i.e.
L_C = -log p_C(l | S)
In the fine-tuning process, the goal of the model is to minimize the sum of the loss function L_LM of the language model and the loss function L_C of the classifier, L = L_LM + L_C. All parameters of the model are optimized iteratively by gradient descent with learning rate γ_2:
W ← W − γ_2 ∂L/∂W,  W_LM ← W_LM − γ_2 ∂L/∂W_LM,  W^c ← W^c − γ_2 ∂L/∂W^c
γ_2 is generally an order of magnitude smaller than γ_1, about 0.00001.
In the prediction phase, only the bidirectional Transformer is needed to convert the sentence S into its representation vector v̄; the multi-classifier then calculates the probability that S belongs to each label l_k, and finally the label with the maximum probability is output:
l = argmax_k p_C(l_k | S)
In this way, the bidirectional Transformer is used to improve the text classification model.
It should be noted that the multi-classifier is not limited to the softmax classifier; any model capable of classification, such as a support vector machine, logistic regression, or a multi-layer neural network, can be used as the multi-classifier.
According to an embodiment of another aspect, there is further provided a model training apparatus for text analysis, for performing the model training method for text analysis provided in the embodiment of the present specification, for example, the model training method for text analysis shown in fig. 2. FIG. 6 shows a schematic block diagram of a model training apparatus for text analysis according to one embodiment. As shown in fig. 6, the apparatus 600 includes:
A forward vector generation unit 61, configured to obtain, with the first bidirectional converter model, for each word in the first training sentence, a forward vector corresponding to the word based on an initial word vector of the word and the preceding-context information of the word in the first training sentence;
a reverse vector generation unit 62, configured to obtain, with the first bidirectional converter model, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the following-context information of the word in the first training sentence;
a word vector generating unit 63, configured to splice, according to the position of each word in the first training sentence, the forward vector of the word preceding the position obtained by the forward vector generating unit 61 and the reverse vector of the word following the position obtained by the reverse vector generating unit 62, as a target word vector corresponding to the position;
a prediction unit 64, configured to predict, by using a first language model and for the target word vector corresponding to each position in the first training sentence, a first probability of the word corresponding to the position;
a model training unit 65, configured to train the first bidirectional converter model and the first language model by minimizing a first loss function related to the first probability obtained by the prediction unit 64, so as to obtain a trained second bidirectional converter model and a trained second language model.
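To make the data flow of the apparatus concrete, a simplified sketch of the pre-training step follows. The two encoder layers, the causal masks and all dimensions are assumptions for illustration; they are not the exact construction of the first bidirectional converter model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim, vocab, N = 128, 10000, 10

    embed = nn.Embedding(vocab, dim)
    forward_enc = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)  # sees only the text above
    reverse_enc = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)  # sees only the text below
    lm_head = nn.Linear(2 * dim, vocab)            # first language model: target word vector -> word probabilities

    word_ids = torch.randint(0, vocab, (1, N))     # a first training sentence
    x = embed(word_ids)                            # initial word vectors

    causal = torch.triu(torch.ones(N, N, dtype=torch.bool), diagonal=1)
    fwd = forward_enc(x, src_mask=causal)          # forward vector of each word (above information only)
    rev = reverse_enc(x, src_mask=causal.T)        # reverse vector of each word (below information only)

    # Target word vector at position i: splice of the forward vector of the word before i
    # and the reverse vector of the word after i (positions 1 .. N-2 here).
    target = torch.cat([fwd[:, :-2], rev[:, 2:]], dim=-1)
    logits = lm_head(target)                       # predict the word at each such position
    first_loss = F.cross_entropy(logits.reshape(-1, vocab), word_ids[:, 1:-1].reshape(-1))
    # Minimizing first_loss by gradient descent trains the converter and the language model.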
Alternatively, as an embodiment, the reverse vector generating unit 62 is specifically configured to:
extracting, with the first bi-directional converter model, for each word in the first training sentence, a plurality of important information from different angles using a self-attention mechanism based on an initial word vector for the word and the contextual information of the word in the first training sentence;
and splicing vectors corresponding to each piece of important information in the plurality of pieces of important information to obtain reverse vectors corresponding to the word.
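The extraction of several pieces of important information from different angles corresponds to what is commonly implemented as multi-head self-attention, where the per-head outputs are spliced for each word. A small sketch using PyTorch's built-in module, with an illustrative head count and dimension:

    import torch
    import torch.nn as nn

    dim, heads = 128, 4                       # each head attends to the sentence from a different angle
    attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

    x = torch.randn(1, 10, dim)               # initial word vectors of a 10-word sentence
    out, _ = attn(x, x, x)                    # self-attention: queries, keys and values are all the words
    # For every word, `out` contains the concatenation of the per-head outputs
    # (each of size dim // heads), followed by a linear projection back to dim.
    print(out.shape)                          # torch.Size([1, 10, 128])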
With the apparatus provided in the embodiment of the present specification, first, the forward vector generating unit 61 obtains, for each word in the first training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the above information of the word in the first training sentence, using the first bidirectional converter model; then, the reverse vector generating unit 62 obtains, for each word in the first training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the context information of the word in the first training sentence, using the first bidirectional converter model; next, the word vector generating unit 63 splices, according to the position of each word in the first training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position as the target word vector corresponding to the position; the prediction unit 64 then uses the first language model to predict, for the target word vector corresponding to each position in the first training sentence, a first probability of the word corresponding to that position; finally, the model training unit 65 trains the first bidirectional converter model and the first language model by minimizing a first loss function related to the first probability, obtaining a trained second bidirectional converter model and a trained second language model. In the embodiment of the present disclosure, unlike a general unidirectional Transformer model, the bidirectional converter model fully considers the context information of each word, i.e. both the preceding and the following text, rather than only the preceding text; thus, when deep modeling is performed on the sequence data, the fast running speed of the Transformer model can be utilized while the robustness of the model is ensured.
According to an embodiment of another aspect, there is further provided a model training apparatus for text analysis, for performing the model training method for text analysis provided in the embodiment of the present specification, for example, the model training method for text analysis shown in fig. 3. FIG. 7 shows a schematic block diagram of a model training apparatus for text analysis, according to one embodiment. As shown in fig. 7, the apparatus 700 includes:
a forward vector generating unit 71, configured to obtain, for each word in the second training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the above information of the word in the second training sentence, using the second bidirectional converter model trained by the method described in fig. 2;
a reverse vector generating unit 72, configured to obtain, for each word in the second training sentence, a reverse vector corresponding to the word based on an initial word vector of the word and context information of the word in the second training sentence, using the second bidirectional converter model;
a word vector generating unit 73, configured to splice, according to the position of each word in the second training sentence, the forward vector of the word preceding the position obtained by the forward vector generating unit 71 and the reverse vector of the word following the position obtained by the reverse vector generating unit 72 together, as a target word vector corresponding to the position;
A first prediction unit 74, configured to predict, for each target word vector corresponding to a position in the second training sentence, a first probability of a word corresponding to the position by using the second language model trained by the method described in fig. 2;
a sentence vector generating unit 75, configured to generate a representation vector of a sentence corresponding to the second training sentence according to the target word vector corresponding to each position in the second training sentence obtained by the word vector generating unit 73;
a second prediction unit 76 configured to predict a second probability of a label corresponding to the second training sentence based on the representation vector of the sentence corresponding to the second training sentence obtained by the sentence vector generation unit 75, using a multi-classification model;
a model training unit 77 for training the second bidirectional converter model, the second language model, and the multi-class model by minimizing the sum of the first loss function and the second loss function, to obtain a third bidirectional converter model, a third language model, and a second multi-class model; wherein the first loss function is associated with the first probability and the second loss function is associated with the second probability.
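A compact sketch of this joint training stage is given below; the three modules are simplified stand-ins for the second bidirectional converter model, the second language model and the multi-classification model, and the loss computation is deliberately reduced to its overall shape:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim, vocab, K = 64, 1000, 5
    converter = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, dim))  # stand-in, pre-trained in stage one
    lm_head = nn.Linear(dim, vocab)                                           # second language model (stand-in)
    classifier = nn.Linear(dim, K)                                            # multi-classification model (stand-in)

    params = list(converter.parameters()) + list(lm_head.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.SGD(params, lr=1e-5)

    word_ids = torch.randint(0, vocab, (8, 12))       # a batch of second training sentences
    labels = torch.randint(0, K, (8,))                # their real labels

    v = converter(word_ids)                           # target word vectors (simplified: the real method
                                                      # excludes a word itself from its own target vector)
    first_loss = F.cross_entropy(lm_head(v).reshape(-1, vocab), word_ids.reshape(-1))
    second_loss = F.cross_entropy(classifier(v.mean(dim=1)), labels)

    loss = first_loss + second_loss                   # minimize the sum of the two loss functions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  # jointly updates converter, language model and classifier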
Alternatively, as an embodiment, the reverse vector generating unit 72 is specifically configured to:
extracting, for each word in the second training sentence, a plurality of important information from different angles using a self-attention mechanism based on an initial word vector of the word and the context information of the word in the second training sentence using the second bi-directional converter model;
and splicing vectors corresponding to each piece of important information in the plurality of pieces of important information to obtain reverse vectors corresponding to the word.
Optionally, as an embodiment, the sentence vector generating unit 75 is specifically configured to average the target word vector corresponding to each position in the second training sentence, and use the average as the representation vector of the sentence corresponding to the second training sentence.
Optionally, as an embodiment, the model training unit 77 is specifically configured to minimize a sum of the first loss function and the second loss function by a gradient descent method to determine model parameters of the second bidirectional converter model, the second language model and the multi-classification model.
In the apparatus provided in the embodiment of the present disclosure, first, the forward vector generating unit 71 uses the second bidirectional converter model trained by the method described in fig. 2 to obtain, for each word in the second training sentence, a forward vector corresponding to the word based on the initial word vector of the word and the above information of the word in the second training sentence; then, the reverse vector generating unit 72 obtains, for each word in the second training sentence, a reverse vector corresponding to the word based on the initial word vector of the word and the context information of the word in the second training sentence, using the second bidirectional converter model; next, the word vector generating unit 73 splices, according to the position of each word in the second training sentence, the forward vector of the word preceding the position and the reverse vector of the word following the position as the target word vector corresponding to the position; the first prediction unit 74 then uses the second language model trained by the method shown in fig. 2 to predict, for the target word vector corresponding to each position in the second training sentence, a first probability of the word corresponding to that position; the sentence vector generating unit 75 generates a representation vector of the sentence corresponding to the second training sentence according to the target word vectors corresponding to the positions in the second training sentence; next, the second prediction unit 76 predicts, using the multi-classification model, a second probability of the label corresponding to the second training sentence based on the representation vector of the sentence corresponding to the second training sentence; finally, the model training unit 77 trains the second bidirectional converter model, the second language model and the multi-classification model by minimizing the sum of the first loss function and the second loss function, obtaining a third bidirectional converter model, a third language model and a second multi-classification model, wherein the first loss function is related to the first probability and the second loss function is related to the second probability. In the embodiment of the specification, when deep modeling is performed on the sequence data, the fast running speed of the Transformer model can be utilized while the robustness of the model is ensured; moreover, on the basis of the model training of the bidirectional converter model and the language model in the first aspect, the bidirectional converter model, the language model and the multi-classification model are further jointly trained, so that a better model training effect is achieved.
According to an embodiment of another aspect, there is further provided a text classification apparatus for performing the text classification method provided in the embodiment of the present specification, for example, the text classification method shown in fig. 4. Fig. 8 shows a schematic block diagram of a text classification apparatus according to an embodiment. As shown in fig. 8, the apparatus 800 includes:
a forward vector generating unit 81, configured to obtain, for each word in the sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and the above information of the word in the sentence to be classified, using the third bidirectional converter model trained by the method described in fig. 3;
a reverse vector generating unit 82, configured to obtain, for each word in the sentence to be classified, a reverse vector corresponding to the word based on an initial word vector of the word and context information of the word in the sentence to be classified, using the third bidirectional converter model;
a word vector generating unit 83, configured to splice, according to the position of each word in the sentence to be classified, the forward vector of the word preceding the position obtained by the forward vector generating unit 81 and the reverse vector of the word following the position obtained by the reverse vector generating unit 82, as a target word vector corresponding to the position;
A sentence vector generating unit 84, configured to generate a representation vector of a sentence corresponding to the sentence to be classified according to the target word vector corresponding to each position in the sentence to be classified obtained by the word vector generating unit 83;
a text classification unit 85, configured to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified obtained by the sentence vector generating unit 84 by using the second multi-classification model trained by the method described in fig. 3.
According to the device provided in the embodiment of the present disclosure, first, the forward vector generating unit 81 uses the third bidirectional converter model trained by the method described in fig. 3 to obtain, for each word in a sentence to be classified, a forward vector corresponding to the word based on the initial word vector of the word and the above information of the word in the sentence to be classified; then, the reverse vector generating unit 82 obtains, for each word in the sentence to be classified, a reverse vector corresponding to the word based on the initial word vector of the word and the context information of the word in the sentence to be classified, using the third bidirectional converter model; next, the word vector generating unit 83 splices, according to the position of each word in the sentence to be classified, the forward vector of the word preceding the position and the reverse vector of the word following the position as the target word vector corresponding to the position; the sentence vector generating unit 84 generates a representation vector of the sentence corresponding to the sentence to be classified according to the target word vectors corresponding to the positions in the sentence to be classified; finally, the text classification unit 85 performs text classification on the sentence to be classified based on the representation vector of the corresponding sentence, using the second multi-classification model trained by the method described in fig. 3. In the embodiment of the specification, when deep modeling is performed on the sequence data, the fast running speed of the Transformer model can be utilized and the robustness of the model is ensured, and the bidirectional converter model and the multi-classification model obtained after the two-stage training help to obtain a better text classification result.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 to 4.
According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 2-4.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing embodiments further illustrate the objects, technical solutions and advantages of the present invention in detail. They are not intended to limit the scope of protection of the invention; any modification, equivalent replacement, improvement, etc. made on the basis of the teachings of the invention shall fall within the scope of protection of the invention.

Claims (12)

1. A model training method for text analysis, the method comprising:
obtaining a forward vector corresponding to each word in a first training sentence by using a first bidirectional converter model based on an initial word vector of the word and the above information of the word in the first training sentence;
obtaining a reverse vector corresponding to each word in the first training sentence by using the first bidirectional converter model based on an initial word vector of the word and the context information of the word in the first training sentence;
according to the position of each word in the first training sentence, the forward vector of the word before the position and the reverse vector of the word after the position are spliced together to be used as a target word vector corresponding to the position;
predicting a target word vector corresponding to each position in the first training sentence by using a first language model to obtain a first probability of a word corresponding to the position;
training the first bi-directional converter model and the first language model by minimizing a first loss function associated with the first probability to obtain a trained second bi-directional converter model and a second language model;
Obtaining a forward vector corresponding to each word in a second training sentence by using the second bidirectional converter model based on the initial word vector of the word and the above information of the word in the second training sentence;
obtaining, by using the second bidirectional converter model, for each word in the second training sentence, a reverse vector corresponding to the word based on an initial word vector of the word and context information of the word in the second training sentence;
according to the position of each word in the second training sentence, the forward vector of the word before the position and the reverse vector of the word after the position are spliced together to be used as a target word vector corresponding to the position;
predicting a first probability of a word corresponding to each position in the second training sentence by using the second language model according to the target word vector corresponding to the position; generating a representation vector of a sentence corresponding to the second training sentence according to the target word vector corresponding to each position in the second training sentence;
predicting a second probability of a label corresponding to the second training sentence based on the representation vector of the sentence corresponding to the second training sentence by using a multi-classification model;
Training the second bidirectional converter model, the second language model and the multi-classification model by minimizing the sum of the first loss function and the second loss function to obtain a third bidirectional converter model, a third language model and a second multi-classification model; wherein the first loss function is associated with the first probability and the second loss function is associated with the second probability.
2. The method of claim 1, wherein the obtaining, for each word in the second training sentence, a reverse vector corresponding to the word based on an initial word vector of the word and context information of the word in the second training sentence using the second bi-directional converter model, comprises:
extracting, for each word in the second training sentence, a plurality of important information from different angles using a self-attention mechanism based on an initial word vector of the word and the context information of the word in the second training sentence using the second bi-directional converter model;
and splicing vectors corresponding to each piece of important information in the plurality of pieces of important information to obtain reverse vectors corresponding to the word.
3. The method of claim 1, wherein the generating the representation vector of the sentence corresponding to the second training sentence according to the target word vector corresponding to each position in the second training sentence comprises:
And taking an average value of the target word vectors corresponding to each position in the second training sentences, and taking the average value as the representation vector of the sentences corresponding to the second training sentences.
4. The method of claim 1, wherein the training the second bi-directional converter model, the second language model, and the multi-classification model by minimizing a sum of the first and second loss functions comprises:
minimizing the sum of the first and second loss functions by a gradient descent method to determine model parameters of the second bi-directional converter model, the second language model, and the multi-classification model.
5. A method of text classification, the method comprising:
obtaining, by using the third bidirectional converter model trained by the method according to claim 1, for each word in a sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and the above information of the word in the sentence to be classified;
obtaining a reverse vector corresponding to each word in the sentence to be classified based on the initial word vector of the word and the context information of the word in the sentence to be classified by using the third bidirectional converter model;
According to the position of each word in the sentence to be classified, the forward vector of the word before the position and the reverse vector of the word after the position are spliced together to be used as a target word vector corresponding to the position;
generating a representation vector of a sentence corresponding to the sentence to be classified according to the target word vector corresponding to each position in the sentence to be classified;
text classification is performed on the sentences to be classified based on the representation vectors of the sentences corresponding to the sentences to be classified by using the second multi-classification model trained by the method according to claim 1.
6. A model training apparatus for text analysis, the apparatus comprising:
the forward vector generation unit is used for obtaining a forward vector corresponding to each word in the first training sentence by utilizing the first bidirectional converter model based on the initial word vector of the word and the above information of the word in the first training sentence;
the reverse vector generation unit is used for obtaining a reverse vector corresponding to each word in the first training sentence by utilizing the first bidirectional converter model based on the initial word vector of the word and the context information of the word in the first training sentence;
The word vector generation unit is used for splicing the forward vector of the word before the position obtained by the forward vector generation unit and the reverse vector of the word after the position obtained by the reverse vector generation unit according to the position of each word in the first training sentence to be used as a target word vector corresponding to the position;
the prediction unit is used for predicting and obtaining a first probability of a word corresponding to each position in the first training sentence by using a first language model;
the model training unit is used for training the first bidirectional converter model and the first language model by minimizing a first loss function related to the first probability obtained by the prediction unit to obtain a trained second bidirectional converter model and a trained second language model;
the forward vector generation unit is further configured to obtain, for each word in the second training sentence, a forward vector corresponding to the word based on an initial word vector of the word and the above information of the word in the second training sentence by using the second bidirectional converter model obtained by the model training unit;
The reverse vector generation unit is further configured to obtain, for each word in the second training sentence, a reverse vector corresponding to the word based on an initial word vector of the word and context information of the word in the second training sentence by using the second bidirectional converter model;
the word vector generating unit is further configured to splice, according to the position of each word in the second training sentence, the forward vector of the word preceding the position obtained by the forward vector generating unit and the reverse vector of the word following the position obtained by the reverse vector generating unit, to be a target word vector corresponding to the position;
the first prediction unit is used for predicting and obtaining a first probability of a word corresponding to each position in the second training statement according to the target word vector corresponding to the position by using the second language model obtained by the model training unit;
a sentence vector generating unit, configured to generate a representation vector of a sentence corresponding to the second training sentence according to the target word vector corresponding to each position in the second training sentence obtained by the word vector generating unit;
a second prediction unit, configured to predict, using a multi-classification model, a second probability of a label corresponding to the second training sentence based on the representation vector of the sentence corresponding to the second training sentence obtained by the sentence vector generation unit;
The model training unit is further configured to train the second bidirectional converter model, the second language model and the multi-classification model by minimizing a sum of the first loss function and the second loss function, to obtain a third bidirectional converter model, a third language model and a second multi-classification model; wherein the first loss function is associated with the first probability and the second loss function is associated with the second probability.
7. The apparatus of claim 6, wherein the reverse vector generation unit is specifically configured to:
extracting, for each word in the second training sentence, a plurality of important information from different angles using a self-attention mechanism based on an initial word vector of the word and the context information of the word in the second training sentence using the second bi-directional converter model;
and splicing vectors corresponding to each piece of important information in the plurality of pieces of important information to obtain reverse vectors corresponding to the word.
8. The apparatus of claim 6, wherein the sentence vector generating unit is specifically configured to average a target word vector corresponding to each position in the second training sentence, and use the average as a representation vector of a sentence corresponding to the second training sentence.
9. The apparatus of claim 6, wherein the model training unit is configured to minimize a sum of the first and second loss functions by a gradient descent method to determine model parameters of the second bi-directional converter model, the second language model, and the multi-classification model.
10. A text classification apparatus, the apparatus comprising:
a forward vector generating unit, configured to obtain, for each word in a sentence to be classified, a forward vector corresponding to the word based on an initial word vector of the word and the above information of the word in the sentence to be classified, using the third bidirectional converter model trained by the method according to claim 1;
the reverse vector generation unit is used for obtaining a reverse vector corresponding to each word in the sentence to be classified based on the initial word vector of the word and the context information of the word in the sentence to be classified by using the third bidirectional converter model;
the word vector generation unit is used for splicing the forward vector of the word before the position obtained by the forward vector generation unit and the reverse vector of the word after the position obtained by the reverse vector generation unit according to the position of each word in the sentence to be classified, and the forward vector and the reverse vector are used as target word vectors corresponding to the position;
A sentence vector generating unit, configured to generate a representation vector of a sentence corresponding to the sentence to be classified according to the target word vector corresponding to each position in the sentence to be classified obtained by the word vector generating unit;
a text classification unit, configured to perform text classification on the sentence to be classified based on the representation vector of the sentence corresponding to the sentence to be classified obtained by the sentence vector generating unit, using the second multi-classification model trained by the method according to claim 1.
11. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-5.
12. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-5.
CN201910176632.6A 2019-03-08 2019-03-08 Model training method for text analysis, text classification method and device Active CN110046248B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910176632.6A CN110046248B (en) 2019-03-08 2019-03-08 Model training method for text analysis, text classification method and device

Publications (2)

Publication Number Publication Date
CN110046248A CN110046248A (en) 2019-07-23
CN110046248B true CN110046248B (en) 2023-08-25

Family

ID=67274609

Country Status (1)

Country Link
CN (1) CN110046248B (en)

Non-Patent Citations (1)

Title
Jacob Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", arXiv:1810.04805, 2018, pp. 1-14

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant
GR01 Patent grant