CN116189064A

CN116189064A - Barrage emotion analysis method and system based on joint model

Info

Publication number: CN116189064A
Application number: CN202310458854.3A
Authority: CN
Inventors: 宋彦; 陈伟东; 罗常凡
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2023-04-26
Filing date: 2023-04-26
Publication date: 2023-05-30
Anticipated expiration: 2043-04-26
Also published as: CN116189064B

Abstract

The invention discloses a barrage emotion analysis method and system based on a joint model, wherein barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments, the joint model comprises a coding module and a decoding module, the coding module comprises a video coding module, a text coding module, a gating fusion module and a multi-mode fusion module, the decoding module comprises a barrage reconstruction module and an emotion analysis module, and the decoding module takes the output of the coding module as input to output emotion analysis tendencies corresponding to the barrage comments; the barrage emotion analysis method and system utilize a gating fusion screening mechanism to take surrounding barrage comments as context information of target barrage comments, and utilize a multi-mode fusion mode to take video information into consideration, and fully utilize useful information to strengthen characteristic representation of video barrages so as to accurately identify emotion tendencies of the target barrage comments.

Description

Barrage emotion analysis method and system based on joint model

Technical Field

The invention relates to the technical field of barrage emotion analysis, in particular to a barrage emotion analysis method and system based on a joint model.

Background

The emotion analysis of the video barrage refers to the emotion polarity of real-time comments of the video.

The existing video barrage emotion analysis method is prone to extracting sentence-level features for emotion analysis and classification, and is based on grammar and semantics of rules, but the barrage is characterized in that: the method is short, has serious aphasia, uses special characters to represent specific meanings, has extremely irregular grammar and the like, so the traditional emotion analysis method cannot accurately segment the bullet screen properly, analyze grammar and the like, and further cannot accurately analyze emotion.

In addition, the existing barrage comment is short, insufficient context information is not available, grammar is very irregular, the existing barrage comment is related to a video theme at the time, interactivity is strong, instantaneity is strong, and the like, so that the existing method cannot effectively and accurately analyze emotion of the existing barrage comment in a short time.

Disclosure of Invention

Based on the technical problems in the background technology, the invention provides a barrage emotion analysis method and system based on a joint model, which can accurately identify emotion tendencies of target barrage comments.

According to the barrage emotion analysis method based on the joint model, barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments;

the training process of the joint model is as follows:

s1: constructing a training sample set, the training sample set comprising moments

Bullet comment>

Time->

To the point of

Inward bullet comment->

Surrounding video->

And comment on bullet screen->

Surrounding barrage comment +.>

；

S2: for the video

Coding and concatenating to obtain coded video feature ∈ ->

Comment on the barrage->

And the surrounding barrage comment->

Coding to obtain the coded target barrage characteristic +.>

And surrounding barrage features

；

S3: based on the target barrage feature

For the surrounding barrage feature->

After screening and filtering, connecting in series to obtain all surrounding barrage comments +.>

；

S4: video characterization through self-attention layers and cross-attention layers

Target barrage feature->

Comment on surrounding barrage->

Enhancement processing to obtain enhanced video features>

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

；

S5, performing S5; enhancement of video features based on multi-layered multi-headed pairs of attention layers

Enhanced target barrage feature->

Reinforcing surrounding barrage->

Reconstructing to obtain reconstructed barrage comments, and constructing a barrage reconstructed loss function by using the reconstructed barrage comments and the real barrage comments by using cross entropy>

；

S6: for enhanced video features

Enhanced target barrage feature->

Reinforcing surrounding barrage->

Sequentially carrying out regularization and normalization operations, and outputting the barrage comment +.>

Corresponding to predicted barrage emotion +.>

；

S7: predicted barrage emotion using cross entropy

And true barrage emotion->

Construction of a loss function for emotion prediction>

Loss function based on barrage reconstruction>

And loss function of emotion prediction->

Calculating the overall loss function->

Updating parameters of the joint model based on the total loss function and the back propagation algorithm until the performance of the joint model reaches a set expected value;

the surrounding barrage comments

The calculation formula is as follows:

wherein ,

for post-selection->

Comment on the surrounding bullet screen->

Is->

Comment on the surrounding bullet screen->

Is (are) peripheral features of->

，/>

Is a learnable gate matrix +.>

Is a learnable gate offset vector, +.>

For ReLU function>

Representing series connection,/->

Representing the product.

Further, the video features

Is calculated by the formula of (2)The following are provided:

the target barrage feature

The calculation formula of (2) is as follows:

the surrounding barrage feature

The calculation formula of (2) is as follows:

wherein ,

，/>

，/>

representing series connection,/->

Representing a video encoder>

Representing a long and short term memory network.

Further, in step S4: video characterization through self-attention layers and cross-attention layers

Target barrage feature->

Comment on surrounding barrage->

Enhancement processing to obtain enhanced video features>

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

Specifically, the method comprises the following steps:

characterizing video

Target barrage feature->

Comment on surrounding barrage->

Inputting as a first layer of the self-attention layer and the cross-attention layer and performing an L-layer iteration, wherein the L-layer is the total layer number of the self-attention layer and the cross-attention layer;

in the first place

Layer input video feature->

Obtaining the input video feature of the next layer +.>

The following are provided:

/>

in the first place

Layer input target barrage feature->

Obtaining the input target barrage feature of the next layer>

：

In the first place

Layer input surrounding barrage comment->

Obtaining the comment +.>

：

Where SA represents the self-attention layer and CA represents the cross-attention layer.

Further, in step S5, the barrage reconstructed loss function

The construction formula is as follows:

wherein ,

indicating batch processing, +.>

Representing cross entropy loss, < >>

Representing a reconstruction module->

Comment of bullet generated by the reconstruction module is represented, < ->

Indicating time->

Is a true bullet comment;

specifically, the bullet comments generated by the reconstruction module are specifically expressed in the following form:

wherein

Representing a multi-layer perceptron, LN representing regularization operation, MHA representing cross-multi-headed attention.

Further, in step S6, predicted barrage emotion

The calculation formula is as follows:

wherein ,

is a Softmax function, LN represents a layer regularization operation, +.>

Representing a multi-layer sensor->

For a learnable emotion prediction matrix, +.>

Is a learnable video emotion matrix, +.>

As a learnable ambient barrage emotion matrix,

representing a learnable target barrage emotion matrix, < ->

Representing a tandem operation, representing a product.

Further, in step S7, the loss function of emotion prediction

The construction formula is as follows:

the overall loss function

The calculation process of (2) is as follows:

wherein ,

for predicted barrage emotion, +.>

Representing cross entropy loss, < >>

Is true barrage emotion +.>

Representing loss balance parameters, +.>

Indicating batch processing.

A barrage emotion analysis system based on a joint model inputs barrage comments into the trained joint model to output emotion tendencies corresponding to the barrage comments;

the analysis system comprises a construction module, a video coding module, a text coding module, a door control fusion module, a multi-mode fusion module, a barrage reconstruction module, a barrage emotion prediction module and a loss calculation module;

the construction module is used for constructing a training sample set, and the training sample set comprises moments

Bullet comment>

Time->

To->

Inward bullet comment->

Surrounding video->

And comment on bullet screen->

Surrounding barrage comment +.>

；

The video coding module is used for coding the video

Coding and concatenating to obtain coded video features

；

The text coding module is used for commenting on the barrage

And the surrounding barrage comment->

Coding to obtain the coded target barrage characteristic +.>

And surrounding barrage feature->

；

The gating fusion module is used for processing the video

Coding and concatenating to obtain coded video features

Comment on the barrage->

And the surrounding barrage comment->

Coding to obtain the characteristics of the coded target barrage

And surrounding barrage feature->

Based on the target barrage feature +.>

For the surrounding barrage feature

；

The multi-mode fusion module is used for video features through the self-attention layer and the cross-attention layer

Target barrage feature->

Comment on surrounding barrage->

Processing to obtain enhanced video features->

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

；

The bullet screen reconstruction module is used for enhancing video features based on multi-layer multi-head attention layer pairs

Enhanced target barrage feature->

Reinforcing surrounding barrage->

；

The barrage emotion prediction module is used for enhancing video features

Enhanced target barrage feature->

Reinforcing surrounding barrage->

Corresponding to predicted barrage emotion +.>

；

The loss calculation module is used for predicting bullet screen emotion by using cross entropy

And true barrage emotion->

Construction of a loss function for emotion prediction>

Loss function based on barrage reconstruction>

And loss function for emotion prediction

Calculating the overall loss function->

Updating parameters of the joint model based on the total loss function and the back propagation algorithm until the performance of the joint model reaches a set expectation;

the surrounding barrage comments

The calculation formula is as follows:

/>

wherein ,

for post-selection->

Comment on the surrounding bullet screen->

Is->

Comment on the surrounding bullet screen->

Is (are) peripheral features of->

，/>

Is a learnable gate matrix +.>

Is a learnable gate offset vector, +.>

For ReLU function>

Representing series connection,/->

Representing the product.

The barrage emotion analysis method and system based on the joint model provided by the invention have the advantages that: according to the barrage emotion analysis method and system based on the joint model, the video information is included through the multi-mode fusion module, the relation between the video theme and the barrage is fully considered, the enhanced characteristic representation is obtained, and the emotion analysis performance of the joint model on the target barrage comment is improved; the video information is included through the multi-mode fusion module, the relation between the video theme and the barrage is fully considered, the enhanced characteristic representation is obtained, and the performance of the combined model for carrying out emotion analysis on the target barrage comment is improved; and the bullet screen reconstruction module is utilized to promote the overall learning effect of each module and improve the performance of the emotion analysis module.

Drawings

FIG. 1 is a schematic diagram of the structure of the present invention;

fig. 2 is a schematic view of a module frame according to the present invention.

Detailed Description

In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit or scope of the invention, which is therefore not limited to the specific embodiments disclosed below.

As shown in fig. 1 and 2, according to the barrage emotion analysis method based on the combined model, barrage comments are input into the trained combined model so as to output emotion tendencies corresponding to the barrage comments; the combined model uses a coding-decoding architecture, and comprises a coding module and a decoding module, wherein the coding module comprises a video coding module, a text coding module, a door control fusion module and a multi-mode fusion module, the decoding module comprises a barrage reconstruction module and an emotion analysis module, the emotion analysis module comprises a barrage emotion prediction module and a loss calculation module, and the decoding module takes the output of the coding module as input so as to output emotion analysis trends corresponding to barrage evaluation.

The method mainly comprises the steps of taking surrounding comments as context information of a target barrage by using a gating screening mechanism in a joint model, taking video information into consideration by using a multi-mode fusion mode, fully utilizing useful information to strengthen characteristic representation of the video barrage, constructing the joint model based on a residual convolution neural network, a long-short-period memory network, a gating fusion self-attention layer, a cross-attention layer and the like, training and learning parameters in the joint model, and optimizing the learning parameters to realize the effect of accurately identifying emotion tendencies of the target barrage comments by the joint model, and is specifically as follows.

The training process of the joint model is as follows:

Bullet comment>

Time->

To the point of

Inward bullet comment->

Surrounding video->

And comment on bullet screen->

Surrounding barrage comment +.>

；

Video frequency

There is->

Frame video->

Surrounding barrage comment->

There is->

Comment on bullet screen

Surrounding barrage comment->

Is at the comment->

Surrounding comments.

For example, the bullet comment y "for itself, in the example shown in FIG. 2, insists on-! "as input, ambient barrage comment

"beautiful and" good stature "as the context of y, and video corresponding to the comment y of the bullet screen is given +.>

Together as an input.

S2: for the video

Coding and concatenating to obtain coded video feature ∈ ->

Comment on the barrage->

And said surroundingsBullet comment->

Coding to obtain the coded target barrage characteristic +.>

And surrounding barrage features

；

Encoding within a video encoding module using a residual convolutional neural network

Frame video->

And concatenating the obtained encoded vectors to obtain the encoded frame-level video feature +.>

：

wherein ,

representing a video encoder>

Representing a tandem operation;

in the text coding module, a long-term and short-term memory network is used

) Comment on the barrage respectively->

And its surroundings

Comment of bullet screen->

Coding to obtain the coded target barrage characteristic +.>

And surrounding barrage feature->

：

I.e.

wherein ,

；/>

=/>

；

it should be understood that the first

Comment on the surrounding bullet screen->

Is characterized by->

。

S3: based on the target barrage feature

For the surrounding barrage feature->

；

Based on the characteristics of the video barrage, some surrounding useful surrounding barrage comments with the same emotion can be used as the context information of the target barrage comments to provide assistance, so that the video barrage comments can be utilized by the gating fusion module

To pair(s)

Screening and filtering operation is carried out to obtain the +.>

Comment on the surrounding bullet screen->

：

wherein ,

for post-selection->

Comment on the surrounding bullet screen->

Is->

Comment on the surrounding bullet screen->

Is (are) peripheral features of->

，/>

Is a learnable gate matrix +.>

For a learnable gate offset vector, the function +.>

For ReLU function>

Representing series connection,/->

Representing the product>

and />

Are all learnable parameters, and parameter optimization is carried out in the combined model training process so as to achieve the expected effect by using the input model;

1 st to 1 st

Comment on the surrounding bullet screen->

All surrounding barrage comments are obtained by connecting in series>

：/>

wherein ,

representing a series operation.

Target barrage feature->

Comment on surrounding barrage->

Enhancement processing to obtain enhanced video features>

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

；

The multi-mode fusion module consists of an L-layer self-attention layer and a cross-attention layer, and is used for characterizing video

Target barrage feature->

Comment on surrounding barrage->

As input of the first layer of the multi-mode fusion module, after multi-layer iteration (i.e. after processing of the L layers of self-attention layers and cross-attention layers), corresponding enhanced video features fused with other modes are obtained in the last layer>

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

。

In the first place

Layer input video feature->

Obtaining the input video feature of the next layer +.>

The following are provided:

in the first place

Layer input target barrage feature->

Obtaining the input target barrage feature of the next layer>

：

In the first place

Layer input surrounding barrage comment->

Obtaining the comment +.>

：

S5, performing S5; based on multiple layersMulti-headed attention layer pair enhanced video features

Enhanced target barrage feature->

Reinforcing surrounding barrage->

；

The decoding module consists of a barrage reconstruction module and an emotion analysis module, and the decoding module encodes the enhanced video features obtained in the module

Enhanced target barrage feature->

Reinforcing surrounding barrage->

As input;

and in the barrage reconstruction module, the reconstruction loss is analyzed and calculated by the barrage reconstruction module and added into closed-loop training to promote the learning effect of the multi-mode fusion module and promote the effect of the emotion analysis module.

The barrage reconstruction module consists of a plurality of multi-head attention layers, and a loss function of barrage reconstruction

The method comprises the following steps:

wherein ,

representing batch processing, CE representing cross entropy loss, < ->

Representing a reconstruction module->

Comment of bullet generated by the reconstruction module is represented, < ->

Indicating time->

Is a true bullet comment; />

wherein

S6: for enhanced video features

Enhanced target barrage feature->

Reinforcing surrounding barrage->

Corresponding to predictedBullet screen emotion->

；

The emotion analysis module comprises a barrage emotion prediction module and a loss calculation module; wherein the bullet screen emotion predicted in the bullet screen emotion prediction module

The calculation formula is as follows:

wherein ,

is a Softmax function, LN represents a layer regularization operation, +.>

Representing a multi-layer sensor->

For a learnable emotion prediction matrix, +.>

Is a learnable video emotion matrix, +.>

As a learnable ambient barrage emotion matrix,

representing a learnable target barrage emotion matrix, < ->

Represents a series operation, & represents a product, & lt + & gt>

、/>

、/>

All are learnable parameters, and parameter optimization is performed in the combined model training process so as to achieve the expected effect by using the input model.

S7: predicted barrage emotion using cross entropy

And true barrage emotion->

Construction of a loss function for emotion prediction>

Loss function based on barrage reconstruction>

And loss function of emotion prediction->

Calculating the overall loss function->

loss function for emotion prediction in loss calculation module

The construction formula is as follows:

wherein ,

indicating batch processing, +.>

For the predicted bullet screen emotion, the predicted bullet screen emotion is bullet screen comment +.>

Predicted barrage emotion output through the joint model, < ->

The true barrage emotion is barrage comment +.>

Corresponding actual emotion;

overall loss function

The calculation process of (2) is as follows:

wherein ,

and representing the loss balance parameters, and updating the learnable parameters of the joint model based on the loss and the back propagation algorithm until the model performance achieves the expected effect.

First: step S3 provides a gating fusion mechanism, and utilizes the target barrage comments to carry out screening and filtering operation on surrounding barrage comments, so that some surrounding useful barrage comments with the same emotion can be used as context information of the target barrage comments to provide assistance, the problems that the barrage comments are short, insufficient context information exists and the like are solved, and the quality of the target barrage is improved.

Second,: step S4 provides a multi-mode fusion enhancement mechanism, video information is included through the multi-mode fusion module, the relation between a video theme and a barrage is fully considered, enhanced feature representation is obtained, and the performance of the joint model for carrying out emotion analysis on target barrage comments is improved.

Third, steps S5 to S7 provide a barrage reconstruction and emotion analysis mechanism, and the barrage reconstruction module is utilized to promote the overall learning effect of each module and the performance of the emotion analysis module.

The embodiment is mainly applied to emotion analysis of video real-time comments, for example, a comment is sent by a user at a certain moment, and emotion tendency of the comment is judged.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should make equivalent substitutions or modifications according to the technical scheme of the present invention and the inventive concept thereof, and should be covered by the scope of the present invention.

Claims

1. The barrage emotion analysis method based on the joint model is characterized in that barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments;

the training process of the joint model is as follows:

Bullet comment>

Time->

To->

Inward bullet comment->

Surrounding video->

And bullet screenComment->

Surrounding barrage comment +.>

；

S2: for the video

Coding and concatenating to obtain coded video feature ∈ ->

Comment on the barrage->

And the surrounding barrage comment->

Coding to obtain the coded target barrage characteristic +.>

And surrounding barrage feature->

；

S3: based on the target barrage feature

For the surrounding barrage feature->

；

Target barrage feature->

Comment on surrounding barrage->

Enhancement processing to obtain enhanced video features>

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

；

Enhanced target barrage feature->

Reinforcing surrounding barrage->

；

S6: for enhanced video features

Enhanced target barrage feature->

Reinforcing surrounding barrage->

Corresponding to predicted barrage emotion +.>

；

S7: predicted barrage emotion using cross entropy

And true barrage emotion->

Constructing a loss function for emotion prediction

Loss function based on barrage reconstruction>

And loss function of emotion prediction->

Calculated overall loss function

Based on the overall loss function->

And updating parameters of the joint model by a back propagation algorithm until the performance of the joint model reaches a set expected value;

the surrounding barrage comments

The calculation formula is as follows:

wherein ,

for post-selection->

Comment on the surrounding bullet screen->

Is->

Comment on the surrounding bullet screen->

Is (are) peripheral features of->

，/>

Is a learnable gate matrix +.>

Is a learnable gate offset vector, +.>

For ReLU function>

Representing series connection,/->

Representing the product.

2. The method of collaborative model-based barrage emotion analysis of claim 1, wherein the video features

The calculation formula of (2) is as follows:

/>

the target barrage feature

The calculation formula of (2) is as follows:

the surrounding barrage feature

The calculation formula of (2) is as follows:

wherein ,

，/>

，/>

representing series connection,/->

Representing a video encoder>

Representing a long and short term memory network.

3. The barrage emotion analysis method based on joint model as set forth in claim 1, characterized in that in step S4: video characterization through self-attention layers and cross-attention layers

Target barrage feature->

Comment on surrounding bullet screen

Enhancement processing to obtain enhanced video features>

Enhanced target barrage feature->

And reinforcing the surrounding barrage

Specifically, the method comprises the following steps:

characterizing video

Target barrage feature->

Comment on surrounding barrage->

in the first place

Layer input video feature->

Obtaining the input video feature of the next layer +.>

The following are provided:

in the first place

Layer input target barrage feature->

Obtaining the input target barrage feature of the next layer>

：

In the first place

Layer input surrounding barrage comment->

Obtaining the comment +.>

：

4. A combined model-based barrage emotion analysis method as claimed in claim 3, characterized in that in step S5, the barrage reconstructed loss function

The construction formula is as follows:

wherein ,

indicating batch processing, +.>

Representing cross entropy loss, < >>

Representing a reconstruction module->

Comment of bullet generated by the reconstruction module is represented, < ->

Indicating time->

Is a true bullet comment;

wherein ,

representing a multi-layer perceptron, LN representing regularization operation, MHA representing cross-multi-headed attention. />

5. The method of claim 4, wherein in step S6, predicted barrage emotion is predicted

The calculation formula is as follows:

wherein ,

is a Softmax function, LN represents a layer regularization operation, +.>

Representing a multi-layer sensor->

For a learnable emotion prediction matrix, +.>

Is a learnable video emotion matrix, +.>

Is a surrounding barrage emotion matrix which can be learned, < + >>

Representing a learnable target barrage emotion matrix, < ->

Representing a tandem operation, representing a product.

6. The method of collaborative model-based barrage emotion analysis according to claim 5, wherein in step S7, the emotion predicted penalty function

The construction formula is as follows:

the overall loss function

The calculation process of (2) is as follows:

wherein ,

for predicted barrage emotion, +.>

Is true barrage emotion +.>

Representing cross entropy loss, < >>

Representing loss balance parameters, +.>

Indicating batch processing.

7. The barrage emotion analysis system based on the joint model is characterized in that barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments;

Bullet comment>

Time of day

To->

Inward bullet comment->

Surrounding video->

And comment on bullet screen->

Video in the same frameSurrounding barrage comments in->

；

The video coding module is used for coding the video

Coding and concatenating to obtain coded video feature ∈ ->

；

The text coding module is used for commenting on the barrage

And the surrounding barrage comment->

Coding to obtain the coded target barrage characteristic +.>

And surrounding barrage feature->

；

The gating fusion module is used for processing the video

Coding and concatenating to obtain coded video feature ∈ ->

Comment on the barrage->

And the surrounding barrage comment->

Coding to obtain coded target bulletCurtain characteristics->

And surrounding barrage feature->

Based on the target barrage feature +.>

For the surrounding barrage feature->

；

Target barrage feature->

Comment on surrounding barrage->

Processing to obtain enhanced video features->

Enhanced target barrage feature->

And reinforcing the surrounding barrage->

；

Enhanced target barrage feature->

Reinforcing surrounding barrage->

；

The barrage emotion prediction module is used for enhancing video features

Enhanced target barrage feature->

Reinforcing surrounding barrage->

Corresponding to predicted barrage emotion +.>

；

And true barrage emotion->

Construction of emotionPredicted loss function->

Loss function based on barrage reconstruction>

And loss function of emotion prediction->

Calculating the overall loss function->

Updating parameters of the joint model based on the overall loss function and the back propagation algorithm until the performance of the joint model reaches a set expectation;

the surrounding barrage comments