CN114429143A - Cross-language attribute level emotion classification method based on enhanced distillation - Google Patents

Cross-language attribute level emotion classification method based on enhanced distillation

Info

Publication number
CN114429143A
Authority
CN
China
Prior art keywords
attribute
sequence
target
network
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210044125.9A
Other languages
Chinese (zh)
Inventor
吴含前
王志可
吴国威
李露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210044125.9A
Publication of CN114429143A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/42 - Data-driven translation
    • G06F 40/44 - Statistical methods, e.g. probability models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a cross-language attribute-level emotion classification method based on enhanced distillation. A teacher network is trained on source-language corpora, and the attribute emotion information in those corpora is migrated to a target classifier through a knowledge distillation framework. A sequence selector selects attribute-emotion-related information from the target translation sentence sequence according to a specific attribute and provides a denoised sentence sequence representation for the target classifier. A target classifier based on cross-language distillation is constructed with a self-attention layer and models fine-grained interaction between the attribute sequence and the denoised target translation sentence sequence. The method alleviates the domain deviation between the translated corpus and the real corpus, and the target classifier has stronger generalization capability. It can fully utilize the effective attribute emotion information in both the source language and the target translation, while better modeling fine-grained interaction between sentences and attributes. Compared with baseline methods, the emotion classification performance of the method improves to a certain extent.

Description

Cross-language attribute level emotion classification method based on enhanced distillation
Technical Field
The invention relates to a cross-language attribute-level emotion classification method based on enhanced distillation, and belongs to the technical field of cross-language attribute-level emotion classification in semantic recognition.
Background
Most current machine translation tools are trained on parallel corpora from the general domain, so translation noise words are inevitably introduced when highly domain-specific texts are translated; such noise, including abnormal words and translation ambiguities, tends to degrade the quality of the translated corpus. Meanwhile, the exposure-bias problem in the decoding process of translation models produces obvious expression differences between translated language and natural language. These problems are the main causes of the domain gap between the translated corpus and the real corpus.
Unlike traditional cross-language text classification or cross-language sentence-level emotion classification, attribute-level emotion classification does not rely on the overall semantics of the sentence but on a set of words or phrases related to the attribute, which requires modeling fine-grained interactions between the attribute and the sentence. Noise words in machine translation output increase the difficulty of this fine-grained modeling: attention mechanisms such as self-attention are "soft" information selection mechanisms, and redundant noise words scatter the attention weights or even cause them to be assigned incorrectly. On the other hand, since attribute-level emotion classification depends only on the attribute-related words and not on the overall sentence semantics, the attention weights need to concentrate only on the emotion keywords that determine the attribute's sentiment. In addition, attribute emotion polarity depends on the semantics of the descriptive language related to the attribute rather than on the specific linguistic form, so training a model only on the translated target-language corpus while ignoring the rich attribute emotion information in the source-language corpus wastes resources. Intuitively, as a resource-rich and high-quality corpus, the source-language corpus carries attribute emotion information that is more accurate and reliable than the target translation corpus; if this information can assist the training of the target-language classifier, the performance of cross-language attribute-level emotion classification can be further improved.
Disclosure of Invention
To solve these problems, the invention provides a cross-language attribute-level emotion classification method based on enhanced distillation: combining the characteristics of cross-language attribute-level emotion classification, and addressing the problems of corpus migration methods based on machine translation, it proposes a reinforced self-attention mechanism for modeling attribute-level representations.
In order to achieve the purpose, the invention provides the following technical scheme:
a cross-language attribute level emotion classification method based on enhanced distillation comprises the following steps:
Step one, training a teacher network on source-language corpora, and migrating the attribute emotion information in those corpora to the target classifier of the student network through a knowledge distillation framework;
step two, using an attribute-sensitive sequence selector to select attribute-emotion-related information from the target translation sentence sequence according to a specific attribute and, as an intermediate module of the model, to provide a denoised sentence sequence representation for the target classifier;
and step three, constructing a target classifier based on cross-language distillation with a self-attention layer, and modeling fine-grained interaction between the attribute sequence and the denoised target translation sentence sequence.
Further, the sequence selector uses an LSTM network to model the policy network p_π and learns an optimal policy π(a_{1:n}) with a policy-gradient algorithm; the policy network p_π learns the optimal policy by defining a reward and decides with probability p_π(a_i | s_i; θ_r) whether to select x_i.
Further, the state, action and reward of the policy network are defined as follows:
state: the state of the ith time step is defined as si(ii) a Depending on the given attributes, the state needs to provide enough information to decide whether to select x or notiThus state siThe device consists of the following three parts:
Figure BDA0003471461210000021
wherein h isiIs a hidden state representation of the i-th time step of LSTM, viIs the ith word xiVector representation of vAIs an attribute vector representation;
the actions: policy network pπWith probability pπ(ai|si;θr) Performing action aiE {0,1}, and this probability is calculated using a logistic function:
a=[a1,a2,...,an]~pπ(A|S;θr)
wherein theta isrFor strategic network parameters, represent a sampling operation, S represents a state, A represents an action, wrAnd brRepresenting trainable parameters;
return: defining an attribute-sensitive reward R that integrates attribute emotion classification loss and cross-language distillation loss for a training sample<xs,xt,y>The payback is defined as follows:
Figure BDA0003471461210000022
wherein theta issrcAs a teacher's network parameter, θtgtFor student network parameters, γ N'/N is a penalty term to prevent overfitting.
Further, the second step and the third step specifically include the following processes:
for target translated sentence representation
Figure BDA0003471461210000023
Attribute representation
Figure BDA0003471461210000024
Obtaining denoised sentence representation H by a sequence selectorDNamely:
a=[a1,a2,…,aN]=RATS(HS,υA)
HD=HS~a
wherein RATS represents a sequence selector, generates an action sequence a, and represents a sequence from HSIn all the positions aiExtracting and splicing the vectors of 1 into a new sentence sequence representation;
then, fine-grained interaction between the attributes and the denoised sentence representation is modeled by means of the self-attention layer in a cross-language distillation-based target classifier:
H=SelfAttention(HA,HD)
finally, the average pooling layer and the full-link layer are used to calculate the non-normalized probability for each class, i.e., q ═ q1,q2,...,qK]Wherein K represents the number of categories; the probabilities are normalized by softening the softmax layer:
Figure BDA0003471461210000031
where T denotes temperature, and degrades to a softmax function when T is 1.
Further, the policy network in the sequence selector is optimized with the policy-gradient-based REINFORCE algorithm; the optimization objective for the parameters θ_r is to maximize the expected reward
J(θ_r) = E_{a ~ p_π(A|S; θ_r)}[R]
The policy gradient with respect to the parameters θ_r is defined as follows:
∇_{θ_r} J(θ_r) = (1/D) Σ_{i=1}^{D} Σ_{t=1}^{N} R^{(i)} ∇_{θ_r} log p_π(a_t^{(i)} | s_t^{(i)}; θ_r)
where D denotes the dataset size, N denotes the sentence sequence length, a_t^{(i)} denotes the action of the i-th sample at the t-th time step, and s_t^{(i)} denotes the state of the i-th sample at the t-th time step.
Further, the parameters θ_tgt of the target classifier are optimized with the back-propagation algorithm, seeking the parameters θ_tgt that minimize the attribute emotion classification loss of the target classifier:
L(θ_tgt) = L_CE(x_t, y; θ_tgt) + L_KD(x_s, x_t; θ_src, θ_tgt)
where <x_s, x_t> denote a source-language training sample and its target translation, θ_src denotes the teacher network parameters, and θ_tgt denotes the student network parameters; the teacher network parameters are frozen during training under the knowledge distillation framework.
Further, at the beginning of model training, θ_r does not participate in the training process; once the loss of the parameters θ_tgt on the development set begins to converge, θ_r and θ_tgt are trained together.
Furthermore, the teacher network is trained with the official Google BERT-Base model, and the target classifier of the student network uses a multi-head self-attention layer to model the interaction between the attribute and the denoised sentence sequence and is composed of 3 Transformer encoder sub-modules. The maximum sentence sequence length is set to 60, the maximum attribute sequence length to 5, and the word vector dimension to 768; the sentence sequence and the attribute sequence share the target-language encoder.
Further, the model is optimized with an Adam optimizer; the initial learning rate is set to 1e-5 for training the student network, the knowledge distillation temperature T is set to 3, and the penalty parameter γ in the reward is set to 1e-5. In addition, the training batch size is 32 and the number of training iterations is 10 epochs; to reduce the influence of overfitting, the neuron dropout rate is set to 0.3.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method denoises the target translation sentence with the sequence selector: only emotion keywords are selected for subsequent fine-grained interaction modeling with the attribute, while emotion-irrelevant words and noise words are discarded. This denoising reduces the difficulty of modeling attribute-level interaction and makes the subsequent soft attention weight distribution more concentrated; at the same time, it alleviates the domain deviation between the translated corpus and the real corpus.
2. The target classifier is trained under a knowledge distillation framework; once training is complete, it no longer depends on a machine translation tool, so the attribute emotion information in the source-language corpus can be fully utilized and the target classifier has stronger generalization capability.
3. The model can fully utilize the effective attribute emotion information in both the source language and the target translation, while better modeling fine-grained interaction between sentences and attributes.
4. Experiments show that the emotion classification performance of the method improves over the baseline methods in every respect.
Drawings
FIG. 1 is a diagram of emotion keywords for attribute level emotion classification according to the present invention;
FIG. 2 is a diagram of an implementation architecture of the present invention;
FIG. 3 is a schematic diagram of a model sequence selector of the present invention;
FIG. 4 is a schematic diagram of the model object classifier of the present invention.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
The invention provides a cross-language attribute-level emotion classification method based on enhanced distillation, which comprises the following steps:
Step one, training a teacher network on source-language corpora, and migrating the attribute emotion information in those corpora to the target classifier of the student network through a knowledge distillation framework; in this way, the attribute emotion information in the source-language corpus can be fully utilized, and the target classifier has stronger generalization capability.
Step two, using an attribute-sensitive sequence selector to select attribute-emotion-related information from the target translation sentence sequence according to a specific attribute and, as an intermediate module of the model, to provide a denoised sentence sequence representation for the target classifier; the denoising reduces the difficulty of modeling attribute-level interaction while alleviating the domain-gap problem.
And step three, constructing a target classifier based on cross-language distillation with a self-attention layer, and modeling fine-grained interaction between the attribute sequence and the denoised target translation sentence sequence.
FIG. 1 shows a schematic diagram of the emotion keywords in attribute-level emotion classification. The overall architecture implementing the invention, shown in FIG. 2, includes a teacher network and a student network. The teacher network is trained with the official Google BERT-Base model; the target classifier of the student network uses a multi-head self-attention layer to model the interaction between the attribute and the denoised sentence sequence and is composed of 3 Transformer encoder sub-modules. The maximum sentence sequence length is set to 60, the maximum attribute sequence length to 5, and the word vector dimension to 768; the sentence sequence and the attribute sequence share the target-language encoder. The invention centers on two core modules: (1) an attribute-sensitive sequence selector; (2) a target classifier based on cross-language distillation. A configuration sketch for these settings follows.
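As an illustration only, the architecture hyperparameters above can be gathered into a configuration object; the following minimal Python sketch does so. All names are assumptions (the patent discloses no implementation), and the numbers of sentiment classes and attention heads are likewise assumed.

from dataclasses import dataclass

@dataclass
class ReKDConfig:
    # Illustrative hyperparameters matching the architecture described above.
    max_sentence_len: int = 60    # maximum target-translation sentence length
    max_attribute_len: int = 5    # maximum attribute sequence length
    hidden_dim: int = 768         # word-vector dimension, matching BERT-Base
    num_student_layers: int = 3   # Transformer encoder sub-modules in the student
    num_heads: int = 12           # self-attention heads (assumed, as in BERT-Base)
    num_classes: int = 3          # sentiment polarities (assumed)
    temperature: float = 3.0      # knowledge-distillation temperature T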
The invention provides a sequence selector based on a reinforced attention mechanism: it selects attribute-emotion-related information from the target translation sentence sequence according to the specific attribute and, as an intermediate module of the model, provides a denoised sentence sequence representation for the target classifier. The sequence selector is trained with a reinforcement learning algorithm, namely the policy gradient, so that the selector can learn an optimal selection policy. The architecture of the sequence selector is shown in FIG. 3.
To fully account for the context information and the history of previous selections, the sequence selector models the policy network p_π with an LSTM network and learns an optimal policy π(a_{1:n}) with a policy-gradient algorithm. The policy network p_π learns the optimal policy by defining a reward and decides with probability p_π(a_i | s_i; θ_r) whether to select x_i. Here, the state, action and reward are defined as follows:
State: the state at the i-th time step is defined as s_i. Given the attribute, the state must provide enough information to decide whether to select x_i; thus s_i consists of the following three parts:
s_i = [h_i; v_i; v_A]
where h_i is the hidden-state representation of the LSTM at the i-th time step, v_i is the vector representation of the i-th word x_i, and v_A is the attribute vector representation.
Action: the policy network p_π performs action a_i ∈ {0,1} with probability p_π(a_i | s_i; θ_r), and this probability is calculated with a logistic function:
p_π(a_i | s_i; θ_r) = σ(w_r · s_i + b_r)
a = [a_1, a_2, …, a_n] ~ p_π(A | S; θ_r)
where θ_r denotes the policy network parameters, ~ denotes a sampling operation, S denotes states, A denotes actions, and w_r and b_r are trainable parameters.
Reward: to encourage the policy network p_π to make correct decisions, an attribute-sensitive reward R is defined that integrates the attribute emotion classification loss and the cross-language distillation loss. Specifically, for a training sample <x_s, x_t, y>, the reward is defined as follows:
R = -(L_CE(x_t, y; θ_tgt) + L_KD(x_s, x_t; θ_src, θ_tgt)) - γ·N'/N
where θ_src denotes the teacher network parameters, θ_tgt denotes the student network parameters, and γ·N'/N is a penalty term to prevent overfitting.
Given a target translation sentence sequence with an attribute label, s_tgt = x_1, x_2, …, x_n, the sequence selector generates an equal-length binary sequence a = a_1, a_2, …, a_n, where a_i = 1 means x_i is selected and a_i = 0 means x_i is discarded. In this way, the sequence selector plays the role of a hard attention mechanism: according to the specific attribute words, it selects from the translated sentence the group of emotion keywords that determine the attribute's sentiment polarity for attribute emotion classification, and discards the other emotion-irrelevant words and noise words. This reduces the difficulty of modeling fine-grained attribute-level representations and makes the subsequent soft attention weight distribution more concentrated. Discarding the emotion-irrelevant words and translation noise words also alleviates, to a certain extent, the domain deviation between the translated corpus and the real corpus. A minimal sketch of such a selector appears below.
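The sketch is written in Python with PyTorch as an assumed implementation framework; class and variable names are illustrative, not taken from the patent, and the state construction follows the definition s_i = [h_i; v_i; v_A] given above.

import torch
import torch.nn as nn

class SequenceSelector(nn.Module):
    # Hard-attention selector: walks the sentence with an LSTM and samples
    # a binary action a_i for each word from a logistic policy head.
    def __init__(self, dim: int):
        super().__init__()
        self.lstm = nn.LSTMCell(dim, dim)
        self.scorer = nn.Linear(3 * dim, 1)  # plays the role of w_r and b_r

    def forward(self, H_s: torch.Tensor, v_a: torch.Tensor):
        # H_s: (N, d) word vectors of one translated sentence; v_a: (d,) attribute vector.
        h = H_s.new_zeros(1, H_s.size(1))
        c = H_s.new_zeros(1, H_s.size(1))
        actions, log_probs = [], []
        for v_i in H_s:                                   # one time step per word
            h, c = self.lstm(v_i.unsqueeze(0), (h, c))
            s_i = torch.cat([h.squeeze(0), v_i, v_a])     # state s_i = [h_i; v_i; v_A]
            p_keep = torch.sigmoid(self.scorer(s_i))      # logistic function
            a_i = torch.bernoulli(p_keep)                 # sample a_i in {0, 1}
            log_probs.append(torch.log(a_i * p_keep + (1 - a_i) * (1 - p_keep)))
            actions.append(a_i)
        return torch.cat(actions), torch.cat(log_probs)   # each of shape (N,)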
The target translation sentence sequence is thus filtered word by word through the sequence selector to remove noise words; on this basis, the target classifier models fine-grained interaction between the attribute representation and the denoised sentence representation. The architecture of the target classifier is shown in FIG. 4.
For the target translation sentence representation H^S ∈ ℝ^{N×d} and the attribute representation H^A ∈ ℝ^{M×d}, the denoised sentence representation H^D can be obtained through the sequence selector, namely:
a = [a_1, a_2, …, a_N] = RATS(H^S, v_A)
H^D = H^S ⊙ a
where RATS denotes the sequence selector, which generates the action sequence a, and ⊙ denotes extracting the vectors at all positions where a_i = 1 from H^S and concatenating them into a new sentence sequence representation. Then, fine-grained interaction between the attribute and the denoised sentence representation is modeled by the self-attention layer in the target classifier based on cross-language distillation:
H = SelfAttention(H^A, H^D)
Finally, an average pooling layer and a fully connected layer are used to compute the unnormalized probability of each class, i.e., q = [q_1, q_2, …, q_K], where K denotes the number of classes. The probabilities are normalized by a softened softmax layer:
p_k = exp(q_k / T) / Σ_{j=1}^{K} exp(q_j / T)
where T denotes the temperature; when T = 1 this degrades to the standard softmax function. Since a higher temperature yields a probability distribution with larger entropy and richer class information, a higher temperature value is generally chosen under the knowledge distillation framework, so as to maximally exploit the class knowledge in the teacher network to help train the student network.
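Continuing the sketch under the same assumptions, the classifier's forward pass can be expressed as a hard selection followed by self-attention and a temperature-softened softmax. Here attn is assumed to be a torch.nn.MultiheadAttention instance created with batch_first=True, and fc a linear layer mapping to the K classes; concatenating the attribute and denoised sequences before self-attention is one plausible reading of SelfAttention(H^A, H^D), not a detail fixed by the text.

import torch
import torch.nn.functional as F

def classifier_forward(H_a, H_s, actions, attn, fc, T: float = 3.0):
    # H_a: (M, d) attribute representation; H_s: (N, d) sentence representation;
    # actions: (N,) binary vector produced by the sequence selector.
    H_d = H_s[actions.bool()]                       # keep positions with a_i = 1
    x = torch.cat([H_a, H_d], dim=0).unsqueeze(0)   # (1, M + N', d)
    H, _ = attn(x, x, x)                            # multi-head self-attention
    q = fc(H.mean(dim=1))                           # average pooling + linear head
    return F.softmax(q / T, dim=-1)                 # softened class probabilities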
Finally, an optimization strategy is adopted to obtain a model with better performance. The parameters of the enhanced-attention-based cross-language attribute-level emotion classification model are divided into two parts: the policy network parameters θ_r of the sequence selector, and the target classifier parameters θ_tgt. The policy network in the sequence selector is optimized with the policy-gradient-based REINFORCE algorithm. The optimization objective for the parameters θ_r is to maximize the expected reward
J(θ_r) = E_{a ~ p_π(A|S; θ_r)}[R]
The policy gradient with respect to the parameters θ_r is defined as follows:
∇_{θ_r} J(θ_r) = (1/D) Σ_{i=1}^{D} Σ_{t=1}^{N} R^{(i)} ∇_{θ_r} log p_π(a_t^{(i)} | s_t^{(i)}; θ_r)
where D denotes the dataset size, N denotes the sentence sequence length, a_t^{(i)} denotes the action of the i-th sample at the t-th time step, and s_t^{(i)} denotes the state of the i-th sample at the t-th time step.
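Under the same assumptions, this REINFORCE estimate corresponds to the following loss (gradient ascent on J(θ_r) implemented as gradient descent on its negation); log_probs is the per-word log-probability sequence returned by the selector sketch, and the reward is treated as a constant with respect to θ_r.

import torch

def reinforce_loss(log_probs: torch.Tensor, reward: torch.Tensor) -> torch.Tensor:
    # log_probs: (N,) values of log p_π(a_t | s_t; θ_r) for one sample; reward: scalar R.
    # Only the log-probabilities carry gradient; R is detached.
    return -(reward.detach() * log_probs.sum())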
The target classifier parameters θ_tgt are optimized with the back-propagation algorithm. Specifically, the parameters θ_tgt that minimize the attribute emotion classification loss of the target classifier are sought:
L(θ_tgt) = L_CE(x_t, y; θ_tgt) + L_KD(x_s, x_t; θ_src, θ_tgt)
where <x_s, x_t> denote a source-language training sample and its target translation, θ_src denotes the teacher network parameters, and θ_tgt denotes the student network parameters. The teacher network parameters are frozen during training under the knowledge distillation framework.
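A sketch of this objective follows, assuming the usual form of distillation in which the student's temperature-softened predictions are matched to those of the frozen teacher by a KL term added to the cross-entropy on gold labels; the weighting alpha and the T^2 scaling are assumptions, as the exact combination is not recoverable from the text.

import torch
import torch.nn.functional as F

def student_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    # Cross-entropy on gold labels plus KL distillation against the frozen teacher.
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),  # teacher parameters frozen
        reduction="batchmean",
    ) * (T * T)                                          # usual T^2 gradient scaling
    return (1 - alpha) * ce + alpha * kd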
At the beginning of model training, θ_r does not participate in the training process, which means the sequence selector does not discard any words; once the loss of the parameters θ_tgt on the development set begins to converge, θ_r and θ_tgt are trained together.
To verify the advantages of the invention over other models, a series of comparative experiments was performed. The experiments comprise three stages: first, data preparation; then model training; and finally testing with the trained model to show its effect.
1) Data preparation
The corpus resources used in the experiments come from the attribute-level annotated corpus published by SemEval-2016 and a public review corpus from the Suning.com e-commerce platform. To comprehensively evaluate the performance of the cross-language attribute emotion classification model, the corpus contains review texts from two domains: restaurants and laptops. On the restaurant-domain dataset, cross-language attribute-level emotion classification is studied with English as the source language and Spanish and Russian as target languages; on the laptop-domain dataset, English is the source language and Chinese is the target language. During the experiments, the training and development sets are drawn from the source-language corpus or the target translation corpus, no additional annotated target-language corpus resources are used, and the ratio of the training set to the development set is 8:2. Table 1 shows the corpus statistics before and after translation with the machine translation tool.
TABLE 1
(Table 1 is rendered as an image in the original publication.)
2) Model training
During training, the model is optimized with an Adam optimizer; the initial learning rate is set to 1e-5 for training the student network, the knowledge distillation temperature T is set to 3, and the penalty parameter γ in the reward is set to 1e-5. In addition, the training batch size is 32 and the number of training iterations is 10 epochs; to reduce the influence of overfitting, the neuron dropout rate is set to 0.3.
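Expressed as an illustrative PyTorch snippet (the student module below is a mere placeholder standing in for the selector plus target classifier):

import torch
import torch.nn as nn

student = nn.Linear(768, 3)  # placeholder for the selector + target classifier

optimizer = torch.optim.Adam(student.parameters(), lr=1e-5)
T = 3.0             # knowledge-distillation temperature
gamma = 1e-5        # penalty coefficient γ in the reward
batch_size = 32
num_epochs = 10
dropout_rate = 0.3  # neuron random-deactivation (dropout) rate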
To demonstrate the effectiveness of the proposed enhanced-distillation-based cross-language attribute-level emotion classification method, the following reference models were selected for comparison with the proposed ReKD on the same datasets:
MTDAN: this method translates the source language into the target language. Based on the target translation corpus, a Deep Averaging Network is trained on the translated corpus to generate sentence vector representations for emotion classification. Finally, a target-language emotion classification model is constructed and tested directly on the real target-language corpus.
ATAE-LSTM (S2T): this method translates the source language into the target language. Based on the target translation corpus, attribute representations and sentence representations are modeled with an LSTM network, and the attribute representation is added into the sentence representation through an attention mechanism to generate attribute-level sentence representations for emotion classification. Finally, a target-language emotion classification model is constructed and tested directly on the real target-language corpus.
ATAE-LSTM (T2S): this method is based on the source-language corpus; it models attribute representations and sentence representations with an LSTM network and uses an attention mechanism to generate attribute-level sentence representations for emotion classification. Finally, a source-language emotion classifier is constructed; in the testing stage, the real corpus is translated into the source language with a machine translation tool for testing.
CLDKCNN: this method trains a teacher network on the source-language corpus, uses a convolutional neural network as the language encoder, and migrates emotion information to the student network through cross-language distillation. Finally, a target-language emotion classifier is constructed and tested directly on the real target-language corpus.
mBERT: this method is based on the source-language corpus; attribute-sentence pairs are constructed as model input to model attribute-level sentence representations, and the vector representation obtained through an average pooling layer is used for emotion classification. In the testing stage, the real target-language corpus is tested directly, relying on the cross-language representation capability of mBERT.
mBERTSL: this method is based on a self-learning framework. A source-language attribute emotion classifier is trained with the source-language corpus as model input; then, combining the cross-language representation capability of mBERT with a selection mechanism, the unlabeled target-language corpus is predicted, and the target-language corpus is expanded through continuous iteration to complete target-language attribute-level emotion classification. In the testing stage, the real target-language corpus is tested directly.
DualBERT: this method is based on the source-language corpus and the target translation corpus; in the training stage, the emotion information in the source-language corpus assists the target-language classifier in attribute-level emotion classification. Finally, an emotion classifier taking bilingual text as input is constructed; in the testing stage, the real target-language corpus and its corresponding source-language translation are tested.
TransMatch: this method translates the source language into the target language. Based on the target translation corpus and the unlabeled real target-language corpus, a target-language encoder encodes the attributes, the target translation sentences and the real sentences separately; a domain discriminator is then introduced together with the encoder for adversarial training to generate domain-independent feature representations, and attribute-level emotion classification training is performed with the target translation corpus. In the testing stage, the real target-language corpus is tested directly.
ReKD: this method trains a teacher network on the source-language corpus, denoises target translation sentences with the enhanced distillation method, and models attribute-level sentence representations with a target classifier built from multi-head attention layers; the denoising reduces the difficulty of modeling attribute-level interaction while alleviating the domain-gap problem. Finally, a target-language classifier is constructed and tested directly on the real target-language corpus.
3) Results of the experiment
Applying the prepared data to the above models yields the results shown in Table 2. The results report the accuracy and F1-measure of the trained models on the test set; the larger these evaluation metrics, the better the model.
TABLE 2
(Table 2 is rendered as an image in the original publication.)
Table 2 shows that the proposed enhanced-distillation-based cross-language attribute emotion classification method outperforms the other reference models and achieves the best results. From the experimental results, the MTDAN method translates the source-language corpus into the target language and models sentence-level representations with a deep averaging network; however, it omits the fine-grained interaction between the attributes and the sentence representation, so the resulting sentence representation contains no attribute information, and the method performs worse than the other benchmark models. This also demonstrates the importance of modeling attribute-level interactions in the attribute-level emotion classification task. Comparing ATAE-LSTM (S2T) with ATAE-LSTM (T2S) shows that forward translation performs worse than reverse translation, mainly because the forward method trains on a target translation corpus produced by a machine translation tool: the low quality of the translations degrades the model trained on them, i.e., errors propagate. The reverse method trains a source-language emotion classifier on the source-language corpus and translates the target language into the source language in the testing stage; since the corpus used to train the classifier is of high quality, the classifier performs well and keeps good performance even when lower-quality translations are used at test time.
Second, the CLDKCNN method also achieves strong performance using cross-language distillation, confirming the effectiveness of the cross-language distillation approach. Compared with ReKD, however, it builds the target-language encoder from a convolutional neural network, ignoring both the interaction between the attributes and the sentence representation and the noise problem in translated sentences. The ReKD method uses an attribute-sensitive sequence selector to filter the noise words in target translation sentences and a multi-head self-attention network as the target classifier to model attribute-level sentence representations as the final classification basis, and is thus markedly improved. Furthermore, mBERT achieves good performance without any external corpora or machine translation tools, which illustrates the potential of current cross-language pre-trained models. The mBERTSL method builds on mBERT with a self-learning framework, iteratively enlarging the target-language training set and finally constructing an attribute emotion classification model with target-language classification capability; it improves markedly on mBERT, which also confirms the effectiveness of the self-learning framework in cross-language tasks. The proposed ReKD method outperforms all the other reference models.
The technical means disclosed in the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications are also considered to be within the scope of the invention.

Claims (9)

1. A cross-language attribute-level emotion classification method based on enhanced distillation, characterized by comprising the following steps:
step one, training a teacher network on source-language corpora, and migrating the attribute emotion information in those corpora to the target classifier of the student network through a knowledge distillation framework;
step two, using an attribute-sensitive sequence selector to select attribute-emotion-related information from the target translation sentence sequence according to a specific attribute and, as an intermediate module of the model, to provide a denoised sentence sequence representation for the target classifier;
and step three, constructing a target classifier based on cross-language distillation with a self-attention layer, and modeling fine-grained interaction between the attribute sequence and the denoised target translation sentence sequence.
2. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 1, wherein the sequence selector uses an LSTM network to model the policy network p_π and learns an optimal policy π(a_{1:n}) with a policy-gradient algorithm; the policy network p_π learns the optimal policy by defining a reward and decides with probability p_π(a_i | s_i; θ_r) whether to select x_i.
3. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 2, wherein the state, action and reward of the policy network are defined as follows:
state: the state at the i-th time step is defined as s_i; given the attribute, the state must provide enough information to decide whether to select x_i, so s_i consists of the following three parts:
s_i = [h_i; v_i; v_A]
where h_i is the hidden-state representation of the LSTM at the i-th time step, v_i is the vector representation of the i-th word x_i, and v_A is the attribute vector representation;
action: the policy network p_π performs action a_i ∈ {0,1} with probability p_π(a_i | s_i; θ_r), and this probability is calculated with a logistic function:
p_π(a_i | s_i; θ_r) = σ(w_r · s_i + b_r)
a = [a_1, a_2, …, a_n] ~ p_π(A | S; θ_r)
where θ_r denotes the policy network parameters, ~ denotes a sampling operation, S denotes states, A denotes actions, and w_r and b_r are trainable parameters;
reward: an attribute-sensitive reward R is defined that integrates the attribute emotion classification loss and the cross-language distillation loss; for a training sample <x_s, x_t, y>, the reward is defined as follows:
R = -(L_CE(x_t, y; θ_tgt) + L_KD(x_s, x_t; θ_src, θ_tgt)) - γ·N'/N
where θ_src denotes the teacher network parameters, θ_tgt denotes the student network parameters, and γ·N'/N is a penalty term to prevent overfitting.
4. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 1 or 3, wherein the second step and the third step specifically comprise the following processes:
for the target translation sentence representation H^S ∈ ℝ^{N×d} and the attribute representation H^A ∈ ℝ^{M×d}, the denoised sentence representation H^D is obtained through the sequence selector, namely:
a = [a_1, a_2, …, a_N] = RATS(H^S, v_A)
H^D = H^S ⊙ a
where RATS denotes the sequence selector, which generates the action sequence a, and ⊙ denotes extracting the vectors at all positions where a_i = 1 from H^S and concatenating them into a new sentence sequence representation;
then, fine-grained interaction between the attribute and the denoised sentence representation is modeled by the self-attention layer in the target classifier based on cross-language distillation:
H = SelfAttention(H^A, H^D)
finally, an average pooling layer and a fully connected layer are used to compute the unnormalized probability of each class, i.e., q = [q_1, q_2, …, q_K], where K denotes the number of classes; the probabilities are normalized by a softened softmax layer:
p_k = exp(q_k / T) / Σ_{j=1}^{K} exp(q_j / T)
where T denotes the temperature; when T = 1 this degrades to the standard softmax function.
5. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 2 or 3, wherein the policy network in the sequence selector is optimized with the policy-gradient-based REINFORCE algorithm; the optimization objective for the parameters θ_r is to maximize the expected reward
J(θ_r) = E_{a ~ p_π(A|S; θ_r)}[R]
The policy gradient with respect to the parameters θ_r is defined as follows:
∇_{θ_r} J(θ_r) = (1/D) Σ_{i=1}^{D} Σ_{t=1}^{N} R^{(i)} ∇_{θ_r} log p_π(a_t^{(i)} | s_t^{(i)}; θ_r)
where D denotes the dataset size, N denotes the sentence sequence length, a_t^{(i)} denotes the action of the i-th sample at the t-th time step, and s_t^{(i)} denotes the state of the i-th sample at the t-th time step.
6. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 4, wherein the parameters θ_tgt of the target classifier are optimized with the back-propagation algorithm, seeking the parameters θ_tgt that minimize the attribute emotion classification loss of the target classifier:
L(θ_tgt) = L_CE(x_t, y; θ_tgt) + L_KD(x_s, x_t; θ_src, θ_tgt)
where <x_s, x_t> denote a source-language training sample and its target translation, θ_src denotes the teacher network parameters, and θ_tgt denotes the student network parameters; the teacher network parameters are frozen during training under the knowledge distillation framework.
7. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 4 or claim 6, wherein at the beginning of model training θ_r does not participate in the training process; once the loss of the parameters θ_tgt on the development set begins to converge, θ_r and θ_tgt are trained together.
8. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 1, wherein the teacher network is trained with the official Google BERT-Base model, and the target classifier of the student network uses a multi-head self-attention layer to model the interaction between the attribute and the denoised sentence sequence and is composed of 3 Transformer encoder sub-modules; the maximum sentence sequence length is set to 60, the maximum attribute sequence length to 5, and the word vector dimension to 768; the sentence sequence and the attribute sequence share the target-language encoder.
9. The enhanced-distillation-based cross-language attribute-level emotion classification method according to claim 1, wherein the model is optimized with an Adam optimizer; the initial learning rate is set to 1e-5 for training the student network, the knowledge distillation temperature T is set to 3, and the penalty parameter γ in the reward is set to 1e-5; in addition, the training batch size is 32, the number of training iterations is 10 epochs, and the neuron dropout rate is set to 0.3.
CN202210044125.9A 2022-01-14 2022-01-14 Cross-language attribute level emotion classification method based on enhanced distillation Pending CN114429143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210044125.9A CN114429143A (en) 2022-01-14 2022-01-14 Cross-language attribute level emotion classification method based on enhanced distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210044125.9A CN114429143A (en) 2022-01-14 2022-01-14 Cross-language attribute level emotion classification method based on enhanced distillation

Publications (1)

Publication Number Publication Date
CN114429143A true CN114429143A (en) 2022-05-03

Family

ID=81311086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210044125.9A Pending CN114429143A (en) 2022-01-14 2022-01-14 Cross-language attribute level emotion classification method based on enhanced distillation

Country Status (1)

Country Link
CN (1) CN114429143A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203419A (en) * 2022-07-21 2022-10-18 北京百度网讯科技有限公司 Language model training method and device and electronic equipment
CN116468959A (en) * 2023-06-15 2023-07-21 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium
CN116523031A (en) * 2023-07-05 2023-08-01 深圳须弥云图空间科技有限公司 Training method of language generation model, language generation method and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325112A (en) * 2018-06-27 2019-02-12 北京大学 A kind of across language sentiment analysis method and apparatus based on emoji
CN112884150A (en) * 2021-01-21 2021-06-01 北京航空航天大学 Safety enhancement method for knowledge distillation of pre-training model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325112A (en) * 2018-06-27 2019-02-12 北京大学 A kind of across language sentiment analysis method and apparatus based on emoji
CN112884150A (en) * 2021-01-21 2021-06-01 北京航空航天大学 Safety enhancement method for knowledge distillation of pre-training model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HANQIAN WU et al., "Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification", https://mdpi.longhoe.net/2079-9292/10/3/270, 23 January 2021 (2021-01-23), pages 1-14 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203419A (en) * 2022-07-21 2022-10-18 北京百度网讯科技有限公司 Language model training method and device and electronic equipment
CN116468959A (en) * 2023-06-15 2023-07-21 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium
CN116468959B (en) * 2023-06-15 2023-09-08 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium
CN116523031A (en) * 2023-07-05 2023-08-01 深圳须弥云图空间科技有限公司 Training method of language generation model, language generation method and electronic equipment
CN116523031B (en) * 2023-07-05 2024-05-10 深圳须弥云图空间科技有限公司 Training method of language generation model, language generation method and electronic equipment

Similar Documents

Publication Publication Date Title
Bhutani et al. Learning to answer complex questions over knowledge bases with query composition
CN110287481B (en) Named entity corpus labeling training system
WO2023225858A1 (en) Reading type examination question generation system and method based on commonsense reasoning
CN114429143A (en) Cross-language attribute level emotion classification method based on enhanced distillation
CN114168749A (en) Question generation system based on knowledge graph and question word drive
CN115048447B (en) Database natural language interface system based on intelligent semantic completion
CN113254604B (en) Reference specification-based professional text generation method and device
CN115510814B (en) Chapter-level complex problem generation method based on dual planning
CN114969278A (en) Knowledge enhancement graph neural network-based text question-answering model
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN115510226A (en) Emotion classification method based on graph neural network
Liu et al. Cross-domain slot filling as machine reading comprehension: A new perspective
CN111538838A (en) Question generation method based on article
CN114238636A (en) Translation matching-based cross-language attribute level emotion classification method
Li et al. Approach of intelligence question-answering system based on physical fitness knowledge graph
US20230289528A1 (en) Method for constructing sentiment classification model based on metaphor identification
CN116523402A (en) Multi-mode data-based network learning resource quality assessment method and system
CN114548117A (en) Cause-and-effect relation extraction method based on BERT semantic enhancement
CN115017910A (en) Entity relation joint extraction method, network, equipment and computer readable storage medium based on Chinese electronic medical record
He et al. [Retracted] Application of Grammar Error Detection Method for English Composition Based on Machine Learning
CN114943216A (en) Case microblog attribute-level viewpoint mining method based on graph attention network
CN114692615A (en) Small sample semantic graph recognition method for small languages
CN113360606A (en) Knowledge graph question-answer joint training method based on Filter
Zhang et al. Robust dialog state tracker with contextual-feature augmentation
Guo RETRACTED: An automatic scoring method for Chinese-English spoken translation based on attention LSTM [EAI Endorsed Scal Inf Syst (2022), Online First]

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination