CN116561325B - Multi-language fused media text emotion analysis method - Google Patents

Multi-language fused media text emotion analysis method

Info

Publication number
CN116561325B
CN116561325B
Authority
CN
China
Prior art keywords
language
encoder
source
data
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310826886.4A
Other languages
Chinese (zh)
Other versions
CN116561325A (en)
Inventor
吴林
王永滨
周亭
李海滨
李�瑞
刘嘉暄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN202310826886.4A priority Critical patent/CN116561325B/en
Publication of CN116561325A publication Critical patent/CN116561325A/en
Application granted granted Critical
Publication of CN116561325B publication Critical patent/CN116561325B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a multi-language fused media text emotion analysis method, which belongs to the technical field of data processing and specifically comprises the following steps: the source domain language vector is used as input to obtain the output of a source language encoder; the difference between the output of the target language encoder and the output of the source language encoder is determined through a language discriminator; and a learning module and a bilinear module are adopted to correct the parameters of the target language encoder until the difference meets the requirement, obtaining the trained target language encoder. The source domain language data and the translated target language data are subjected to data enhancement processing and used as input of a comprehensive encoder, which is constructed from the trained target language encoder and the source language encoder, to obtain the emotion classification result of the target language data, so that emotion analysis of multi-language fused media text is better realized.

Description

Multi-language fused media text emotion analysis method
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a multi-language fused media text emotion analysis method.
Background
Public opinion text data from different areas and different languages have important reference value for decision-makers, and information in different languages can serve as mutually complementary research data, enabling decision-makers to analyze the opinions held in different areas on the event at hand and to adjust the corresponding strategy deployment. Based on this, research on cross-language emotion classification methods is particularly important.
In order to accurately evaluate the emotion tendency of cross-language text, the invention patent CN115080734A, a cross-domain emotion classification method based on an attention mechanism and reinforcement learning, applies a stochastic policy for feature selection using the idea of reinforcement learning, performs policy optimization according to calculated delayed rewards, and uses the optimal emotion classification strategy to realize cross-domain emotion classification; however, the following technical problems exist:
in the reinforcement learning stage, the differences among different languages are ignored; the difference between a target domain and a source domain composed of different languages varies from pair to pair, and reinforcement learning that does not take these differences into account cannot accurately realize emotion recognition and classification.
Aiming at the technical problems, the invention provides a multi-language fused media text emotion analysis method.
Disclosure of Invention
The invention aims to provide a multi-language fused media text emotion analysis method.
In order to solve the technical problems, the invention provides a multi-language fused media text emotion analysis method, which is characterized by comprising the following steps:
s11, acquiring source domain language data, converting the source domain language data into source domain language vector vectors, and training the source domain language vector vectors to acquire a source language encoder and a source language classifier;
s12, initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
s13, taking the source domain language vector as input to obtain the output of a source language encoder, determining the difference between the output of the target language encoder and the output of the source language encoder through a language discriminator, and correcting the parameters of the target language encoder by adopting a learning module and a bilinear module until the difference meets the requirement, thereby obtaining the target language encoder after training is completed;
s14, carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
A further technical solution is that the source language encoder is constructed based on the mBERT-S model, and the target language encoder is constructed based on the mBERT-T model.
A further technical solution is that determining, by means of a language discriminator, a difference between the output of the target language encoder and the output of the source language encoder, comprising in particular:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
The further technical scheme is that the construction of the loss function is performed by the probability of the target language encoder and the probability of the source language encoder, and specifically comprises the following steps:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:

$$L_D = -\mathbb{E}_{x_s \sim X_s}\left[\log D\left(F_s(x_s)\right)\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\left(1 - D\left(F_t(x_t)\right)\right)\right]$$

wherein $D$ is the language discriminator, $L_D$ is the discriminator loss function, $X_s$ is the source language text, $X_t$ is the target language text, $F_s$ is the source language feature extractor, and $F_t$ is the target language feature extractor; $F_s(x_s)$ denotes the features extracted from a sample $x_s$ selected at random from $X_s$, and $F_t(x_t)$ denotes the features extracted from a sample $x_t$ selected at random from $X_t$; $D(F_s(x_s))$ is the probability the discriminator assigns to the input data coming from the source language model, and $D(F_t(x_t))$ is the probability the discriminator assigns to the input data coming from the target language model; the discriminator approaches its optimum when $D(F_s(x_s))$ approaches 1 as much as possible and $D(F_t(x_t))$ approaches 0 as much as possible.
The further technical scheme is that the data enhancement processing is performed on the source domain language data and the translated target language data, and specifically includes:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
and constructing a model total loss function, and determining differences between the source domain language data after the data enhancement processing is completed and the target language data after the translation is completed after the data enhancement processing is completed and the source domain language data before the data processing and the target language data after the translation is completed.
The further technical scheme is that the model total loss function is composed of a contrast loss function and a mixed distance loss function, wherein the mixed distance loss function is determined according to a Euclidean distance function and a Manhattan distance function, and the calculation formula of the model total loss function is as follows:

$$L_{total} = \lambda_1 L_{dist} + L_{BT}$$

wherein $L_{total}$ refers to the total loss function of the module, $\lambda_1$ is a manually defined weight parameter, $L_{dist}$ is the mixed distance loss function, and $L_{BT}$ is the Barlow Twins loss function; the specific formulas of $L_{BT}$ and $L_{dist}$ are as follows:

$$L_{BT} = \sum_{i}\left(1 - C_{ii}\right)^2 + \lambda \sum_{i}\sum_{j \neq i} C_{ij}^2$$

wherein $L_{BT}$ is the Barlow Twins loss function, and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of $C$ toward 1, keeping the embeddings of different augmented versions of the same sample invariant, and the redundancy reduction term drives the off-diagonal elements toward 0, decorrelating the different embedding vectors; $C$ is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, where $K$ is the matrix dimension, $C_{ii}$ denotes the diagonal elements of the $K \times K$ matrix, and $C_{ij}$ ($i \neq j$) denotes its off-diagonal elements;

$$L_{dist} = \sqrt{\sum_{i=1}^{n}\left(a_i - b_i\right)^2} + \beta \sum_{i=1}^{n}\left|a_i - b_i\right|$$

wherein $L_{dist}$ is the mixed distance loss function; $a$ is the sample before data enhancement, $a_i$ being the $i$-th dimension of the sample in the feature space before data enhancement; $b$ is the data-enhanced sample, $b_i$ being the $i$-th dimension of the data-enhanced sample; $\beta$ is a manually set weight parameter, and $n$ is the total number of dimensions.
The further technical scheme is that the specific steps of determining the emotion classification result of the target language data are as follows:
s21, the source language text after data enhancement processing and the target language text obtained after translation are subjected to comprehensive encoder to obtain embedded expression of the source language and embedded expression of the target language, and a single-head paired interaction matrix is obtained through bilinear interaction mapping;
s22, through a bilinear pooling layer, determining cross-language joint mapping based on the single-head pair interaction matrix, and correcting the cross-language joint mapping through a sum pool to obtain dense feature mapping;
s23, through the dense feature mapping, using a full connection layer and softmax operation to obtain emotion classification probability.
The further technical scheme is that the loss function of the comprehensive encoder is determined by adopting negative log likelihood loss, and the difference condition of the input of the comprehensive encoder is measured by KL divergence, wherein the calculation formula of the loss function of the comprehensive encoder is as follows:
$$L = L_{bp} + \gamma\, KL(M \| N)$$

wherein $L$ is the overall loss function of the module, $L_{bp}$ is the bilinear pooling loss function, $\gamma$ is the loss weight parameter, and $KL(M \| N)$ is the KL divergence measuring the difference between the two distributions, where M is the probability distribution of the source domain language and N is the probability distribution of the target domain;

$$L_{bp} = -\sum_{i} \log p\left(y_i \mid x_i\right)$$

wherein $L_{bp}$ is the loss function of the bilinear pooling module, $y_i$ indicates the true emotion tag of the $i$-th sentence, and $p(y_i \mid x_i)$ is the probability that the model outputs the correct emotion for the $i$-th sample.
In another aspect, the present invention provides a computer storage medium having a computer program stored thereon, which when executed in a computer causes the computer to perform a multi-lingual fused media text emotion analysis method as described above.
The invention has the beneficial effects that:
the invention carries out data enhancement processing based on traditional generation countermeasure type cross-language knowledge migration, and consists of two parts of comparison learning and a mixed distance formula, the module directly acts on a generator part, and the output of the same sample is compared with the output of a source language encoder and a target language encoder by the two encoders, thus the invention acts on the target language encoder simultaneously with generating a loss function of a countermeasure network, thereby helping the target language encoder to better obtain the knowledge of the source language encoder and pulling up the language characteristic distribution among different languages.
The source language field and the target language field are regarded as two different modalities, and a paired language interaction module with bilinear attention as its core is designed. The module learns the interactive expression of the input source language and target language through dual channels, providing richer joint information than a traditional single attention channel, so that the model learns the similarity of positive and negative semantics between the two languages, improving the model's cross-language emotion classification performance.
Additional features and advantages will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow chart of a multi-lingual fused media text emotion analysis method according to embodiment 1.
Fig. 2 is a flowchart showing specific steps for determining the emotion classification result of target language data in embodiment 1.
Fig. 3 is a frame diagram of a computer storage medium in embodiment 2.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus detailed descriptions thereof will be omitted.
The terms "a," "an," "the," and "said" are used to indicate the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.
Example 1
In order to solve the above technical problems, as shown in fig. 1, the present invention provides a multi-language fused media text emotion analysis method, which is characterized by specifically comprising:
s11, acquiring source domain language data, converting the source domain language data into source domain language vector vectors, and training the source domain language vector vectors to acquire a source language encoder and a source language classifier;
s12, initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
the source domain encoder is constructed based on the mBERT-S model, and the target domain encoder is constructed based on the mBERT-T model.
It will be appreciated that determining, by the speech discriminator, the difference between the output of the target speech coder and the output of the source speech coder, comprises:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
it should be noted that the source domain and target domain mappings can be forced to agree by limiting all network layers, learning such symmetric transformations can save parameters in the model, but such constraints often degrade optimization conditions, resulting in loss of some domain features, rendering the performance less than ideal when one network processes data from two different domains. Another approach is to learn an asymmetry transformation, i.e. to constrain only a portion of the network layer, thereby forcing that portion to align. In order to enable the source domain and the target domain to extract more common characteristics, the method adopts the latter, namely, under the condition of only fixing a small amount of super parameters, other constraints are canceled.
Obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
Theoretically, bringing the arbiter to a perfect state is obviously difficult to achieve. The target language model can only be trained continuously to gradually acquire the language knowledge in the source language model, so that the arbiter cannot judge the source of the data. The essence of training is that the arbiter learns to determine the source of the data, rather than simply being "spoofed".
Specifically, the construction of the loss function through the probability of the target language encoder and the probability of the source language encoder specifically includes:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:

$$L_D = -\mathbb{E}_{x_s \sim X_s}\left[\log D\left(F_s(x_s)\right)\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\left(1 - D\left(F_t(x_t)\right)\right)\right]$$

wherein $D$ is the language discriminator, $L_D$ is the discriminator loss function, $X_s$ is the source language text, $X_t$ is the target language text, $F_s$ is the source language feature extractor, and $F_t$ is the target language feature extractor; $F_s(x_s)$ denotes the features extracted from a sample $x_s$ selected at random from $X_s$, and $F_t(x_t)$ denotes the features extracted from a sample $x_t$ selected at random from $X_t$; $D(F_s(x_s))$ is the probability the discriminator assigns to the input data coming from the source language model, and $D(F_t(x_t))$ is the probability the discriminator assigns to the input data coming from the target language model; the discriminator approaches its optimum when $D(F_s(x_s))$ approaches 1 as much as possible and $D(F_t(x_t))$ approaches 0 as much as possible.
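As an illustrative numerical sketch only (not part of the disclosure; the function name and the assumption that the discriminator already outputs probabilities are ours), the discriminator objective above can be computed as:

```python
import numpy as np

def discriminator_loss(d_source, d_target):
    """L_D = -E[log D(F_s(x_s))] - E[log(1 - D(F_t(x_t)))].

    d_source / d_target: discriminator probabilities for batches of
    source-encoder and target-encoder features, respectively.
    """
    d_source = np.asarray(d_source, dtype=float)
    d_target = np.asarray(d_target, dtype=float)
    # Optimum pushes source probabilities toward 1 and target toward 0.
    return float(-np.log(d_source).mean() - np.log(1.0 - d_target).mean())
```

A near-optimal discriminator (source probabilities near 1, target probabilities near 0) drives the loss toward 0, while an undecided discriminator outputting 0.5 everywhere gives 2 ln 2.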
S13, taking the source domain language vector as input to obtain the output of a source language encoder, determining the difference between the output of the target language encoder and the output of the source language encoder through a language discriminator, and correcting the parameters of the target language encoder by adopting a learning module and a bilinear module until the difference meets the requirement, thereby obtaining the target language encoder after training is completed;
it should be noted that, performing data enhancement processing on the source domain language data and the translated target language data specifically includes:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
note that, the enhancement mode of the sample pair is constructed: code-switching. The core principle is that partial vocabulary in the source language text is replaced by the target language to mix the texts, so as to achieve the purpose of constructing the sample pair. The sample pair constructed by the principle can further obtain the learning ability of implicit characteristics of the cross-language model, but the method needs to maintain a particularly large bilingual emotion dictionary in the process of training the model, which is unacceptable in the absence of large-scale manual labeling. The invention adopts the combined operation of Dropout and Code-switching which are less dependent on external conditions in a data enhancement mode. The method comprises the steps of setting a certain probability of each element in a sample to be zero, constructing a positive sample pair, and comparing the sample before data enhancement and the sample after data enhancement in the training process, and shortening the distance between the positive sample pair in a feature space so as to learn similar features in the sample, thereby improving the model effect.
And constructing a model total loss function, and determining differences between the source domain language data after the data enhancement processing is completed and the target language data after the translation is completed after the data enhancement processing is completed and the source domain language data before the data processing and the target language data after the translation is completed.
In particular, wherein the model total loss function is composed of a contrast loss function and a hybrid distance loss function, wherein the hybrid distance loss function is determined from a Euclidean distance function and a Manhattan distance function:

$$L_{total} = \lambda_1 L_{dist} + L_{BT}$$

wherein $L_{total}$ refers to the total loss function of the module, $\lambda_1$ is a manually defined weight parameter, $L_{dist}$ is the mixed distance loss function, and $L_{BT}$ is the Barlow Twins loss function; the specific formulas of $L_{BT}$ and $L_{dist}$ are as follows:

$$L_{BT} = \sum_{i}\left(1 - C_{ii}\right)^2 + \lambda \sum_{i}\sum_{j \neq i} C_{ij}^2$$

wherein $L_{BT}$ is the Barlow Twins loss function, and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of $C$ toward 1, keeping the embeddings of different augmented versions of the same sample invariant, and the redundancy reduction term drives the off-diagonal elements toward 0, decorrelating the different embedding vectors; $C$ is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, where $K$ is the matrix dimension, $C_{ii}$ denotes the diagonal elements of the $K \times K$ matrix, and $C_{ij}$ ($i \neq j$) denotes its off-diagonal elements;

$$L_{dist} = \sqrt{\sum_{i=1}^{n}\left(a_i - b_i\right)^2} + \beta \sum_{i=1}^{n}\left|a_i - b_i\right|$$

wherein $L_{dist}$ is the mixed distance loss function; $a$ is the sample before data enhancement, $a_i$ being the $i$-th dimension of the sample in the feature space before data enhancement; $b$ is the data-enhanced sample, $b_i$ being the $i$-th dimension of the data-enhanced sample; $\beta$ is a manually set weight parameter, and $n$ is the total number of dimensions.
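A minimal numpy sketch of the two loss terms described above (illustrative only; `lam` and `beta` stand in for the weight constants, and the batch-normalization epsilon is an implementation assumption):

```python
import numpy as np

def barlow_twins_loss(za, zb, lam=5e-3):
    """Barlow Twins loss: normalize both embedding batches, form the
    cross-correlation matrix C along the batch dimension, pull its
    diagonal toward 1 (invariance) and its off-diagonal toward 0
    (redundancy reduction)."""
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + 1e-9)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + 1e-9)
    n = za.shape[0]
    c = za.T @ zb / n                                  # K x K cross-correlation
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()          # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return float(on_diag + lam * off_diag)

def mixed_distance_loss(a, b, beta=0.5):
    """Euclidean distance plus beta-weighted Manhattan distance between a
    sample before (a) and after (b) data enhancement."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((d ** 2).sum()) + beta * np.abs(d).sum())
```

Two identical embedding batches drive the Barlow Twins loss to (near) zero, while the mixed distance grows with any gap between the pre- and post-augmentation views.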
In contrastive learning, SimCLR and Barlow Twins are commonly used. SimCLR requires batch sizes that are orders of magnitude larger and relies heavily on negative sample pairs. Given that the cross-language emotion classification task differs from tasks in the computer vision field, its main purpose being to shorten the distance between the language features of the source language and the target language, and the samples used being only source language and target language samples, it is difficult to construct negative sample pairs, and forcing negative sample pairs would be counterproductive for the task. In summary, the invention selects Barlow Twins as the more suitable contrastive learning method.
S14, carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
Specifically, as shown in fig. 2, the specific steps for determining the emotion classification result of the target language data are as follows:
s21, the source language text after data enhancement processing and the target language text obtained after translation are subjected to comprehensive encoder to obtain embedded expression of the source language and embedded expression of the target language, and a single-head paired interaction matrix is obtained through bilinear interaction mapping;
It should be further noted that the source language text after the data enhancement processing and the target language text obtained after translation are encoded by the source language encoder F_s and the target language encoder F_t to obtain the embedded expression of the source language H_s = {h_s^1, …, h_s^M} and the embedded expression of the target language H_t = {h_t^1, …, h_t^N}, where M and N are the lengths of the source language sentence and the target language sentence, respectively. The present invention then uses these hidden representations to construct a bilinear interaction mapping to obtain a single-head pairwise interaction matrix I ∈ R^(M×N). The interaction matrix I is constructed according to formula 4:

I = ((1 · qᵀ) ∘ σ(H_s U)) · σ(H_t V)ᵀ        (formula 4: construction of the single-head pairwise interaction matrix)

Where U is a learnable weight matrix for the source language domain representation, V is a learnable weight matrix for the target language domain representation, q is a learnable weight vector, 1 is a fixed all-ones vector, σ is the non-linear activation, and ∘ represents the Hadamard product. Each element I_{i,j} of the interaction matrix can be calculated by formula 5:

I_{i,j} = qᵀ (σ(Uᵀ h_s^i) ∘ σ(Vᵀ h_t^j))        (formula 5: element-wise calculation of the interaction matrix)
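As a concrete illustration, formulas 4 and 5 can be sketched in NumPy as below. The shapes, the sigmoid activation, and the head size K are assumptions filled in where the patent's formula images are not explicit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interaction_matrix(H_s, H_t, U, V, q):
    """Single-head pairwise interaction matrix:
    I = ((1 q^T) o sigma(H_s U)) . sigma(H_t V)^T, equivalently
    I[i, j] = q^T (sigma(U^T h_s_i) o sigma(V^T h_t_j)).
    H_s: (M, d), H_t: (N, d), U, V: (d, K), q: (K,)."""
    left = sigmoid(H_s @ U) * q          # (M, K): each row weighted by q
    right = sigmoid(H_t @ V)             # (N, K)
    return left @ right.T                # (M, N) pairwise interactions
```

The matrix form and the element-wise form agree term by term, which makes the matrix version the practical choice for batched computation.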
S22, through a bilinear pooling layer, determining cross-language joint mapping based on the single-head pair interaction matrix, and correcting the cross-language joint mapping through a sum pool to obtain dense feature mapping;
It should be further noted that, in order to obtain the cross-language joint mapping f′, the invention introduces a bilinear pooling layer on top of the interaction matrix I. Specifically, the k-th element of f′ can be obtained from formula 6:

f′_k = σ(H_s U_{·k})ᵀ · I · σ(H_t V_{·k})        (formula 6: calculation of the cross-language joint mapping)

Where U_{·k} and V_{·k} represent the k-th columns of the weight matrices U and V, respectively. It should be noted that this layer has no new learnable parameters: the weight matrices U and V are shared with the previous interaction mapping layer to reduce the number of parameters and mitigate overfitting. Furthermore, on top of the cross-language joint mapping f′, the present invention adds a sum pool (Sum-pooling) to obtain a dense feature map f.
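A minimal sketch of formula 6 plus the sum-pooling step, continuing the NumPy notation above; the pooling stride is an assumed hyper-parameter, not a value given in the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_map(H_s, H_t, U, V, I):
    """Cross-language joint mapping f' (formula 6):
    f'_k = sigma(H_s U_k)^T . I . sigma(H_t V_k),
    reusing the same U, V as the interaction layer,
    so this layer introduces no new parameters."""
    A = sigmoid(H_s @ U)                          # (M, K)
    B = sigmoid(H_t @ V)                          # (N, K)
    return np.einsum('mk,mn,nk->k', A, I, B)      # (K,)

def sum_pool(f_prime, stride):
    """Sum-pooling over consecutive groups of `stride` entries
    to obtain the dense feature map f."""
    return f_prime.reshape(-1, stride).sum(axis=1)
```

Sharing U and V between the interaction and pooling layers is what keeps this block parameter-free, matching the overfitting argument in the text.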
S23, through the dense feature mapping, using a full connection layer and softmax operation to obtain emotion classification probability.
It should be noted that the loss function of the integrated encoder is determined using the negative log-likelihood loss, and the difference between the inputs of the integrated encoder is measured by the KL divergence. The loss function of the integrated encoder is calculated as follows:

L_total = L_bilinear + λ · KL(M ‖ N)

where L_total is the overall loss function of the module, L_bilinear is the bilinear pooling loss function, λ is the loss weight parameter, KL(M ‖ N) is the KL divergence measuring the difference between the two distributions, M is the probability distribution of the source domain language, and N is the probability distribution of the target domain;

L_bilinear = -(1/n) · Σ_{i=1}^{n} log p_i(y_i)

where L_bilinear is the loss function of the bilinear pooling module, y_i denotes the true emotion label of the i-th sentence, and p_i(y_i) is the probability of the model outputting the correct emotion for the i-th sample.
Finally, the invention passes the feature map f through a fully connected layer and a softmax operation to obtain the emotion classification probability, as shown in formula 7:

p = softmax(W f + b)        (formula 7: emotion classification probability)

Where W represents a learnable weight matrix and b represents the bias. In this section, the present invention is trained using the Negative Log-Likelihood Loss (NLLL), as shown in formula 8:

L_NLL = -Σ_i log p_i(y_i)        (formula 8: negative log-likelihood loss)

Where y_i denotes the true emotion label of the i-th sentence and p_i(y_i) is the output probability of the model for that label.
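Formulas 7 and 8 together are a standard softmax classification head; a minimal NumPy sketch (function names and the numerical-stability shift are illustrative choices, not from the patent):

```python
import numpy as np

def classify(f, W, b):
    """Emotion classification probability p = softmax(W f + b) (formula 7)."""
    z = W @ f + b
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nll_loss(probs, labels):
    """Negative log-likelihood (formula 8): mean of -log p_i(y_i)
    over a batch, where y_i is the gold emotion label of sentence i."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())
```

A confident correct prediction drives the loss toward zero, while assigning low probability to the gold label inflates it, which is what trains the classifier head.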
Furthermore, given the probability distribution p of the source domain language and the probability distribution q of the target domain, the present invention uses the KL divergence to measure the difference between the two distributions. The final objective optimization function of the present module is therefore shown in formula 9:

L_total = L_NLL + λ · KL(p ‖ q)        (formula 9: total loss function of the module)
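The module's final objective — the classification loss plus a KL-divergence penalty between the source- and target-domain prediction distributions — can be sketched as follows; the weight `lam` is an assumed value, not one given in the patent:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between source-domain and target-domain
    prediction distributions; eps guards against log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def module_loss(nll, p_src, p_tgt, lam=0.1):
    """Formula 9: total objective = NLL loss + lam * KL(p || q).
    `nll` is the classification loss from formula 8."""
    return nll + lam * kl_divergence(p_src, p_tgt)
```

When the two domains' predictive distributions coincide the penalty vanishes and the objective reduces to the classification loss alone, so the KL term only activates when the domains disagree.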
Example 2
As shown in fig. 2, the present invention provides a computer storage medium having a computer program stored thereon, which when executed in a computer, causes the computer to perform a multi-language fused media text emotion analysis method as described above.
The invention has the beneficial effects that:
The invention augments traditional generative adversarial cross-language knowledge transfer with a data enhancement module consisting of two parts: contrastive learning and a mixed distance formula. The module acts directly on the generator part, comparing the outputs that the source language encoder and the target language encoder produce for the same sample, and thus operates on the target language encoder jointly with the loss function of the generative adversarial network. This helps the target language encoder better acquire the knowledge of the source language encoder and draws the language feature distributions of different languages closer together.
The source language domain and the target language domain are regarded as two different modalities, and a paired language interaction module with bilinear attention at its core is designed. The module learns interactive representations of the input source language and target language through dual channels, providing richer joint information than a traditional single attention channel, so that the model learns the similarity of positive and negative semantics between the two languages, improving its cross-language emotion classification performance.
In embodiments of the present invention, the term "plurality" refers to two or more, unless explicitly defined otherwise. The terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly attached, detachably attached, or integrally attached. The specific meaning of the above terms in the embodiments of the present invention will be understood by those of ordinary skill in the art according to specific circumstances.
In the description of the embodiments of the present invention, it should be understood that the directions or positional relationships indicated by the terms "upper", "lower", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience in describing the embodiments of the present invention and to simplify the description, and do not indicate or imply that the devices or units referred to must have a specific direction, be configured and operated in a specific direction, and thus should not be construed as limiting the embodiments of the present invention.
In the description of the present specification, the terms "one embodiment," "a preferred embodiment," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention and is not intended to limit the embodiment of the present invention, and various modifications and variations can be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present invention should be included in the protection scope of the embodiments of the present invention.

Claims (7)

1. A multi-language fused media text emotion analysis method is characterized by comprising the following steps:
acquiring source domain language data, converting the source domain language data into source domain language vectors, and training on the source domain language vectors to obtain a source language encoder and a source language classifier;
initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
the source domain language vector is used as input to obtain the output of a source language encoder; the difference between the output of the target language encoder and the output of the source language encoder is determined through a language discriminator, and a learning module and a bilinear module are adopted to correct the parameters of the target language encoder until the difference meets the requirement, obtaining the trained target language encoder;
determining, by a language discriminator, a difference between the output of the target language encoder and the output of the source language encoder, comprising:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function;
performing data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain emotion classification results of the target language data;
the data enhancement processing is carried out on the source domain language data and the translated target language data, and specifically comprises the following steps:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
and constructing a model total loss function, and using it to determine the differences between the source domain language data and the translated target language data after the data enhancement processing is completed and the corresponding source domain language data and translated target language data before the data enhancement processing.
2. The method of claim 1, wherein the source language encoder is constructed based on a mBERT-S model and the target language encoder is constructed based on a mBERT-T model.
3. The method for emotion analysis of multilingual fused media text according to claim 1, wherein constructing a loss function by probability of the target language encoder and probability of the source language encoder comprises:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:

min_D L_D = -E_{x_s∼X_s}[log D(F_s(x_s))] - E_{x_t∼X_t}[log(1 - D(F_t(x_t)))]

wherein D is the language discriminator, L_D is the loss function of the language discriminator, X_s is the source language text, X_t is the target language text, F_s is the source language feature extractor, and F_t is the target language feature extractor; F_s(x_s) extracts features from one sample x_s selected randomly from the X_s source language texts, and F_t(x_t) extracts features from one sample x_t selected randomly from the X_t texts; D(F_s(x_s)) is the probability that the language discriminator judges the input data to come from the source language model, and D(F_t(x_t)) is the probability that the language discriminator judges the input data to come from the target language model; min_D means driving the state of the language discriminator toward the optimum.
4. The method of claim 1, wherein the model total loss function is composed of a contrast loss function and a mixed distance loss function, wherein the mixed distance loss function is determined from a Euclidean distance function and a Manhattan distance function, and wherein the model total loss function is calculated as:

L_total = α · L_mix + L_BT

wherein L_total refers to the total loss function of the module, α is a manually defined weight parameter, L_mix is the mixed distance loss function, and L_BT is the Barlow Twins loss function; the specific formulas of L_BT and L_mix are as follows:

L_BT = Σ_i (1 - C_ii)² + λ · Σ_i Σ_{j≠i} C_ij²

wherein L_BT is the Barlow Twins loss function and λ is a positive constant balancing the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of C toward 1, leaving the embeddings of different augmented versions of the same sample unchanged, while the redundancy reduction term drives the off-diagonal elements toward 0, reducing redundancy and decorrelating the different embedding vectors; C is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, d is the matrix dimension, C_ii denotes a diagonal element, and C_ij (i ≠ j) denotes an off-diagonal element;

L_mix = β · Σ_{i=1}^{n} |x_i - x′_i| + (1 - β) · √(Σ_{i=1}^{n} (x_i - x′_i)²)

wherein L_mix is the mixed distance loss function, x_i is the i-th dimension vector of a sample in feature space before data enhancement, x′_i is the i-th dimension vector of the sample after data enhancement, β is a manually set weight parameter, and n is the total number of dimensions.
5. The emotion analysis method of multi-language fused media text according to claim 1, wherein the specific steps of determining emotion classification result of the target language data are:
the method comprises the steps that a source language text after data enhancement processing and a target language text obtained after translation are subjected to comprehensive encoder to obtain embedded expression of the source language and embedded expression of the target language, and a single-head paired interaction matrix is obtained through bilinear interaction mapping;
determining cross-language joint mapping based on the single-head pair interaction matrix through a bilinear pooling layer, and correcting the cross-language joint mapping through a sum pool to obtain dense feature mapping;
and obtaining emotion classification probability by adopting the dense feature mapping and adopting a full connection layer and softmax operation.
6. The method of claim 1, wherein the loss function of the integrated encoder is determined using negative log-likelihood loss and the difference between the inputs of the integrated encoder is measured by the KL divergence, wherein the loss function of the integrated encoder is calculated as:

L_total = L_bilinear + λ · KL(M ‖ N)

wherein L_total is the overall loss function of the module, L_bilinear is the bilinear pooling loss function, λ is the loss weight parameter, KL(M ‖ N) is the KL divergence measuring the difference between the two distributions, M is the probability distribution of the source domain language, and N is the probability distribution of the target domain; and

L_bilinear = -(1/n) · Σ_{i=1}^{n} log p_i(y_i)

wherein L_bilinear is the loss function of the bilinear pooling module, y_i denotes the true emotion label of the i-th sentence, and p_i(y_i) is the probability of the model outputting the correct emotion for the i-th sample.
7. A computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a multi-lingual fused media text emotion analysis method as claimed in any one of claims 1 to 6.
CN202310826886.4A 2023-07-07 2023-07-07 Multi-language fused media text emotion analysis method Active CN116561325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310826886.4A CN116561325B (en) 2023-07-07 2023-07-07 Multi-language fused media text emotion analysis method


Publications (2)

Publication Number Publication Date
CN116561325A (en) 2023-08-08
CN116561325B (en) 2023-10-13

Family

ID=87500420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310826886.4A Active CN116561325B (en) 2023-07-07 2023-07-07 Multi-language fused media text emotion analysis method

Country Status (1)

Country Link
CN (1) CN116561325B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648410B (en) * 2024-01-30 2024-05-14 中国标准化研究院 Multi-language text data analysis system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326214A (en) * 2016-08-29 2017-01-11 中译语通科技(北京)有限公司 Method and device for cross-language emotion analysis based on transfer learning
CN109325112A (en) * 2018-06-27 2019-02-12 北京大学 A kind of across language sentiment analysis method and apparatus based on emoji
CN113901208A (en) * 2021-09-15 2022-01-07 昆明理工大学 Method for analyzing emotion tendentiousness of intermediate-crossing language comments blended with theme characteristics
CN114238636A (en) * 2021-12-14 2022-03-25 东南大学 Translation matching-based cross-language attribute level emotion classification method
CN115080734A (en) * 2022-04-29 2022-09-20 石燕青 Cross-domain emotion classification method based on attention mechanism and reinforcement learning
CN115952787A (en) * 2023-03-13 2023-04-11 北京澜舟科技有限公司 Emotion analysis method, system and storage medium for specified target entity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11694042B2 (en) * 2020-06-16 2023-07-04 Baidu Usa Llc Cross-lingual unsupervised classification with multi-view transfer learning


Also Published As

Publication number Publication date
CN116561325A (en) 2023-08-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant