CN116363733A - Facial expression prediction method based on dynamic distribution fusion - Google Patents
- Publication number
- CN116363733A (application CN202310357220.9A)
- Authority
- CN
- China
- Prior art keywords
- distribution
- sample
- category
- branch
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/84—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a facial expression prediction method based on dynamic distribution fusion. The method comprises: acquiring a facial expression data set and preprocessing the face pictures in it to obtain a preprocessed data set; constructing an auxiliary branch and designing a dual-branch neural network model based on the auxiliary branch; extracting the sample distribution of the preprocessed data set with the constructed auxiliary branch; constructing category distributions to mine the emotion information implicit in the sample distributions; performing dynamic distribution fusion on the constructed category distributions and the extracted sample distributions; constructing a multi-task learning framework and optimizing the dual-branch neural network model; and using the optimized dual-branch neural network model to predict facial expressions. The invention introduces label distribution learning, which shows superiority over single-label learning; it proposes dynamic distribution fusion, which fully exploits the effectiveness of label distribution learning; and it achieves good prediction performance, high efficiency, and low error.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a facial expression prediction method based on dynamic distribution fusion.
Background
Facial expression recognition is an important research direction in the field of computer vision. As a sub-field of emotion recognition, facial expression recognition judges the expression state of a face by analyzing a facial image, providing important support for fields such as human-computer interaction, affective computing, and intelligent monitoring.
The facial expression recognition process mainly comprises facial expression image acquisition and preprocessing, facial expression feature extraction, and facial expression classification. Preprocessing obtains the accurate position of the face from the acquired image through face detection and face alignment and eliminates interference from the picture background; its success rate is mainly affected by factors such as image quality, light intensity, and occlusion. Common facial expression features include geometric, appearance, hybrid, and deep features. The first three, as traditional handcrafted features, were widely applied in the early stage of facial expression recognition research, but such methods often suffer from low accuracy and poor robustness. In recent years, with the rapid development of deep learning, deep features extracted by deep convolutional neural networks have achieved good performance on facial expression recognition tasks. Facial expression classification is the last step of facial expression recognition. Classification of traditional handcrafted features often uses K-nearest neighbors, support vector machines, random forests, the AdaBoost algorithm, Bayesian networks, single-layer perceptrons, and the like; under the deep learning framework, expression recognition can instead be performed end to end, i.e., the deep neural network directly classifies and optimizes the features it has learned.
Face models are mainly divided into 2D, 2.5D, and 3D. A 2D face is an RGB image shot by an ordinary camera or an infrared image shot by an infrared camera; it records color or texture at a fixed viewing angle and contains no depth information. A 2.5D face is a face depth image shot by a depth camera at a certain viewing angle; its surface information is discontinuous, and the depth of occluded parts is missing. A 3D face is a point-cloud or mesh face image synthesized from face depth images taken at multiple angles; it has complete surface information and contains depth information. 2D facial expression recognition has a long research history and mature software and hardware, and has been widely used, but a 2D face reflects only two-dimensional planar information without depth, so it cannot fully express a real facial expression. Compared with a 2D face, a 3D face is less affected by factors such as illumination, occlusion, or pose, reflects face information more faithfully, and is applied to tasks such as face synthesis and face transfer. 3D face data is generally acquired through professional equipment, mainly binocular cameras, RGB-D cameras based on the structured-light principle, and TOF cameras based on the time-of-flight principle. Owing to the easy availability of 2D faces, 2D facial expression recognition still dominates.
At present, most facial expression prediction methods adopt single-label learning. Although these methods have achieved good prediction performance, the insufficient emotion information contained in a single label makes it difficult to describe fuzzy or mislabeled samples and easily causes overfitting of the neural network, which makes it hard to further improve prediction accuracy.
A few methods adopt label distribution learning for facial expression prediction. Unlike single-label learning, these methods train with label distributions instead of single labels. Compared with a single label, a label distribution contains richer emotion information and can effectively avoid overfitting during training, so it has significant advantages. However, label distribution annotations are often difficult to obtain, so facial expression data sets providing only single-label annotations still dominate. In recent years, label distribution learning methods have focused on constructing label distributions from single labels, but the constructed distributions are generally of low quality and cannot fully exploit the advantages of label distribution learning.
Disclosure of Invention
The invention aims to provide a facial expression prediction method based on dynamic distribution fusion, which has good prediction performance, high efficiency and less error.
The facial expression prediction method based on dynamic distribution fusion provided by the invention comprises the following steps:
s1, acquiring a facial expression data set, preprocessing a facial picture in the acquired data set, and acquiring a preprocessed data set;
s2, constructing auxiliary branches, and designing a double-branch neural network model based on the auxiliary branches;
s3, carrying out sample distribution extraction on the preprocessed data set obtained in the step S1 by adopting the auxiliary branches constructed in the step S2;
s4, constructing category distribution, and mining emotion information processing aiming at the sample distribution obtained in the step S3;
s5, carrying out dynamic distribution fusion processing on the category distribution constructed in the step S4 and the sample distribution obtained in the step S3;
s6, constructing a multi-task learning framework, and optimizing the dual-branch neural network model designed in the step S2;
s7, adopting the double-branch neural network model obtained through optimization in the step S6 to realize facial expression prediction.
The step S1 of acquiring a facial expression data set, preprocessing a facial picture in the acquired data set, and acquiring a preprocessed data set specifically includes:
setting the facial expression data set asAnd data centralization culvertCovering the C-type label and N samples, performing face alignment processing by using an MTCNN algorithm, and outputting face pictures with fixed sizes; scaling the output face picture to a given size, and performing data augmentation by using a RandAugment technology; and carrying out normalization processing on the RGB channels of the face picture by using the mean value and standard deviation of the ImageNet dataset.
The constructing auxiliary branches in the step S2, and designing a dual-branch neural network model based on the auxiliary branches specifically includes:
and constructing a dual-branch neural network model by adopting a ResNet18 network model. The ResNet18 network model is divided into two parts: layer 1 in the ResNet18 network model is frozen as a feature extractor and the last layer 3 in the ResNet18 network model is used as a feature discriminator, which is defined as the target limb. The auxiliary branch is constructed based on the target branch, and the parameters and the structure of the auxiliary branch are consistent with those of the target branch. And designing and obtaining a dual-branch neural network model based on the feature extractor, the target branch and the constructed auxiliary branch.
The extracting sample distribution processing for the preprocessing data set obtained in the step S1 by using the auxiliary branches constructed in the step S2 in the step S3 specifically includes:
taking the probability distribution of the auxiliary branch output constructed in the step S2 as a sample distribution, and expressing the sample distribution by adopting the following formula:
wherein,,for sample x i Is the sample distribution of (y) j For the j-th class tag->Is a labely j For sample x i Description degree of->To assist the branch to sample x i Belonging to label y j Is used for predicting the probability of (1);
the auxiliary branches are trained through cross entropy loss to improve and maintain the distribution capacity of the auxiliary branch extraction samples, and the cross entropy loss function is expressed by adopting the following formula:
wherein L is ce Is a cross-entropy loss function,for sample x i Logic tag y of (2) i Is a function of the value of c,is the auxiliary branch to sample x i The prediction probability belonging to category c.
The step S4 of constructing category distribution, which is to mine emotion information processing for the sample distribution obtained in the step S3, specifically comprises the following steps:
using class distribution mining to find out implicit emotion information in sample distribution, eliminating influence of sample distribution errors on model performance, and expressing class distribution by adopting the following formula:
wherein,,for category distribution of category c->For samples belonging to category cx i Category distribution of N c For the number of samples belonging to category c;
setting a threshold t to judge whether the output category distribution meets the set robustness requirement, if the label y j For the description degree of the category c not reaching the threshold t, using a threshold distribution temporary substitution category distribution training model, describing by adopting the following formula:
wherein,,is the category distribution of category c, +.>Is the threshold distribution of category c, +.>For label y j The degree of description for category c.
The step S5 of performing dynamic distribution fusion processing on the category distribution constructed in the step S4 and the sample distribution obtained in the step S3 specifically includes:
the dynamic distribution fusion is based on category distribution, and the category distribution and the sample distribution are adaptively fused according to the attention weight of each sample. The dynamic distribution fusion is divided into two steps: attention weight extraction and adaptive distribution fusion;
1) Attention weight extraction:
for attention weight extraction, two attention modules are respectively embedded into the last layer of two branches to acquire the attention weight of a sample. The attention module is composed of a full connection layer and a Sigmoid function, the characteristics output by each branch are input to the corresponding attention module to extract attention weight of each sample, the attention weight value is used for judging whether a sample is clear or fuzzy, and the weight value is used for self-adaptive distribution fusion; the characteristics output by each branch are multiplied by the corresponding attention weight and then input into the corresponding classifier;
the flow of attention weight extraction is as follows:
a. for a batch of samples, the face features output by the feature extractor are input to the auxiliary branches and the target branches at the same time;
b. the attention weights output by the two attention modules are averaged, so as to benefit from the sample-ambiguity discrimination capability of both branches at the same time; the averaged attention weight is expressed by the following formula:

$$w_i = \frac{1}{2}\left(w_i^{A} + w_i^{T}\right)$$

wherein $w_i^{A}$ and $w_i^{T}$ are the attention weights of sample $x_i$ output by the attention modules of the two branches, respectively;
c. the attention weights are rank regularized to avoid degradation of the discrimination capability of the attention module:
L RR =max(0,δ-(w H -w L ))
wherein $w_H$ and $w_L$ are the mean attention weights of the $M$ high-weight samples and of the $N-M$ low-weight samples, respectively; $\delta$ is a fixed margin; $\delta$ and $M$ directly use the values from the method SCN, which employs the same attention module; $L_{RR}$ is the rank regularization loss function;
d. the attention weights are normalized, the processing procedure being expressed by the following formula:

$$\tilde{w}_i = \max\!\left(w_{\min},\ \frac{w_i - \min_k w_k}{\max_k w_k - \min_k w_k}\right)$$

wherein $w_{\min}$ is the lower limit of the attention weight, $w_i$ is the attention weight of sample $x_i$ after rank regularization, and $\tilde{w}_i$ is the attention weight of sample $x_i$ after normalization;
2) Adaptive distribution fusion:
the following general representation is used to represent the blended distribution after fusion:
wherein,,is sample x i Mixed distribution after fusion,/->Is sample x i Category distribution of->Is sample x i Tag distribution of->Is a samplex i And (5) carrying out attention weight after normalization processing.
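The adaptive fusion step reduces to a per-sample convex combination of the two distributions. A minimal NumPy sketch follows; which of the two distributions the attention weight multiplies is our reading of the text, not something the text states explicitly:

```python
import numpy as np

def fuse_distributions(sample_dist: np.ndarray,
                       class_dist: np.ndarray,
                       w: float) -> np.ndarray:
    """Mix a sample distribution with its class's category distribution.

    w is the sample's normalized attention weight; the weighting direction
    (w on the sample distribution) is an assumption consistent with the text.
    """
    return w * sample_dist + (1.0 - w) * class_dist

# A clear sample (w near 1) keeps mostly its own distribution; a fuzzy
# sample (w near the lower limit) leans on the robust category distribution.
sample = np.array([0.7, 0.2, 0.1])
category = np.array([0.5, 0.3, 0.2])
mixed = fuse_distributions(sample, category, w=0.8)
```

Since both inputs sum to 1, their convex combination also sums to 1, so the fused result remains a valid distribution.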
The step S6 is to construct a multi-task learning framework, optimize the dual-branch neural network model designed in the step S2, and specifically comprise the following steps:
(1) optimizing the target branches:
training the target branch by using KL divergence loss, and expressing the training process by using the following formula:
wherein L is kld For the loss of the KL divergence,for class c for sample x i Description degree of->For the target branch to sample x i Belonging to label y j Is used for predicting the probability of (1);
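The KL divergence loss between the fused distribution and the target branch's prediction can be sketched in NumPy; the clipping constant `eps` is an implementation detail added here for numerical safety, not part of the original formulation:

```python
import numpy as np

def kl_divergence_loss(mixed_dist: np.ndarray,
                       target_probs: np.ndarray,
                       eps: float = 1e-12) -> float:
    """Mean KL(mixed distribution || target-branch prediction) over a batch.

    mixed_dist, target_probs: arrays of shape (N, C) with rows summing to 1.
    """
    d = np.clip(mixed_dist, eps, 1.0)
    q = np.clip(target_probs, eps, 1.0)
    return float(np.mean(np.sum(d * np.log(d / q), axis=1)))

# Identical distributions give zero loss; the loss grows as the target
# branch's prediction drifts away from the mixed distribution.
```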
(2) multitasking learning framework:
constructing a multi-task learning framework, minimizing a joint loss L through joint learning of distribution prediction and expression recognition, so as to optimize the prediction performance of a model, and expressing a joint loss function by adopting the following formula:
L=α 1 ·L kld +α 2 ·L ce +L RR
wherein $\alpha_1$ and $\alpha_2$ are weighted ramp functions related to the training round $e$, and $\beta$ is the training-round threshold; $\alpha_1$ and $\alpha_2$ are introduced to optimize the training process.
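The joint objective can be sketched as follows. The exact shape of the ramp ("slope") functions is not given in the text, so the linear warm-up to 1 at round beta below is purely an illustrative assumption:

```python
def ramp(epoch: int, beta: int) -> float:
    """Illustrative ramp function: grows linearly until round beta, then 1."""
    return min(epoch / beta, 1.0)

def joint_loss(l_kld: float, l_ce: float, l_rr: float,
               epoch: int, beta: int = 10) -> float:
    """L = a1 * L_kld + a2 * L_ce + L_RR with round-dependent weights.

    Using the same ramp for a1 and a2 is an assumption; the text only says
    both depend on the training round e and a threshold beta.
    """
    a1 = ramp(epoch, beta)
    a2 = ramp(epoch, beta)
    return a1 * l_kld + a2 * l_ce + l_rr
```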
The implementation of facial expression prediction by adopting the double-branch neural network model obtained by optimization in the step S6 in the step S7 specifically comprises the following steps:
and (3) outputting probability distribution of each sample through the target branches by adopting the double-branch neural network model obtained by optimization in the step (S6) to predict the facial expression, and selecting the expression corresponding to the highest prediction probability from the output probability distribution as the predicted expression of the sample.
According to the facial expression prediction method based on dynamic distribution fusion, label distribution learning is introduced; based on the rich emotion information contained in the label distribution, overfitting is effectively avoided during training, showing superiority over single-label learning. Meanwhile, dynamic distribution fusion is proposed: a high-quality mixed distribution close to the real distribution is generated from the extracted sample distributions and the mined category distributions, fully exploiting the effectiveness of label distribution learning. The method has the advantages of good prediction performance, high efficiency, and low error.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
Detailed Description
A schematic process flow diagram of the method of the present invention is shown in fig. 1: the facial expression prediction method based on dynamic distribution fusion provided by the invention comprises the following steps:
s1, acquiring a facial expression data set, preprocessing a facial picture in the acquired data set, and acquiring a preprocessed data set; the method specifically comprises the following steps:
assume that the facial expression data set is S = {(x_i, y_i) | i = 1, 2, …, N}, the data set covering C class labels and N samples; because the sizes of face pictures differ between data sets, the MTCNN algorithm is used for face alignment, and a face picture of fixed size is output; the invention outputs 100×100 face pictures; the output face picture is scaled to a given size of 224×224, and data augmentation is performed with the RandAugment technology; the RGB channels of the face picture are normalized using the mean and standard deviation of the ImageNet data set;
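The normalization step at the end of S1 can be sketched in NumPy. The MTCNN alignment and RandAugment stages are omitted here; only the per-channel ImageNet normalization is shown:

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_face(img_uint8: np.ndarray) -> np.ndarray:
    """Scale an HxWx3 uint8 RGB face crop to [0, 1], then normalize per channel."""
    img = img_uint8.astype(np.float32) / 255.0
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# Stand-in for an aligned face crop already resized to 224x224.
face = np.full((224, 224, 3), 128, dtype=np.uint8)
normalized = normalize_face(face)
```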
s2, constructing auxiliary branches, and designing a double-branch neural network model based on the auxiliary branches, wherein the method specifically comprises the following steps of:
and constructing a dual-branch neural network model by adopting a ResNet18 network model. The ResNet18 network model is divided into two parts: layer 1 in the ResNet18 network model is frozen as a feature extractor and the last layer 3 in the ResNet18 network model is used as a feature discriminator, which is defined as the target limb. The auxiliary branch is constructed based on the target branch, and the parameters and the structure of the auxiliary branch are consistent with those of the target branch. Designing and obtaining a double-branch neural network model based on the feature extractor, the target branch and the constructed auxiliary branch;
s3, carrying out extraction sample distribution processing on the preprocessing data set acquired in the step S1 by adopting the auxiliary branches constructed in the step S2, wherein the method specifically comprises the following steps:
training directly on the probability distribution output by the ResNet18 network model would degrade model performance; therefore the probability distribution output by the auxiliary branch constructed in the step S2 is taken as the sample distribution, expressed by the following formula:

$$d_{x_i}^{y_j} = p_j(x_i)$$

wherein $d_{x_i}$ is the sample distribution of sample $x_i$, $y_j$ is the $j$-th class label, $d_{x_i}^{y_j}$ is the description degree of label $y_j$ for sample $x_i$, and $p_j(x_i)$ is the auxiliary branch's predicted probability that sample $x_i$ belongs to label $y_j$;

the auxiliary branch is trained with a cross-entropy loss to improve and maintain its capability of extracting sample distributions, the cross-entropy loss function being expressed by the following formula:

$$L_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} l_{x_i}^{c} \log p_c(x_i)$$

wherein $L_{ce}$ is the cross-entropy loss function, $l_{x_i}^{c}$ is the value at class $c$ of the logical (one-hot) label $y_i$ of sample $x_i$, and $p_c(x_i)$ is the auxiliary branch's predicted probability that sample $x_i$ belongs to class $c$;
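The auxiliary branch's cross-entropy objective over one-hot ("logical") labels can be sketched as:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability of each sample's true class.

    With one-hot labels, the double sum over samples and classes collapses
    to the log-probability at each sample's own label.
    """
    p = softmax(logits)
    n = logits.shape[0]
    return float(-np.mean(np.log(p[np.arange(n), labels])))
```

With uniform logits over C classes the loss equals log C, a handy sanity check for C = 7 expression classes.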
s4, constructing category distribution, and mining emotion information processing aiming at the sample distribution obtained in the step S3, wherein the method specifically comprises the following steps:
based on the sensitivity of the deep neural network to fuzzy or error labeling samples, using class distribution mining to find out implicit emotion information in the sample distribution, eliminating the influence of sample distribution errors on model performance, and expressing class distribution by adopting the following formula:
wherein,,for the distribution of category c->For sample x belonging to category c i Category distribution of N c For the number of samples belonging to category c;
class distribution mining is performed by pairingAdding and averaging sample distribution of all samples belonging to a certain category to obtain category distribution of a corresponding category; because the parameters of the auxiliary branches are unstable in the initial training stage, the class distribution meeting the set stability requirement cannot be output, each expression cannot be accurately described by the class distribution at the moment, in order to avoid the prediction performance of the wrong class distribution degradation model, a threshold t is set to judge whether the output class distribution meets the set stability requirement, if yes, the label y j For the description degree of the category c not reaching the threshold t, using the threshold distribution to temporarily replace the category distribution training model, setting the threshold between 0 and 1, and determining a specific value through an ablation experiment. The threshold is set based on the following phenomena: the stronger the model's ability to extract features, the higher the value of the corresponding sample tag location in the tag distribution. Whether the feature extraction of the model is in place or not can be judged by setting a threshold value; the following formula is used for description:
wherein,,is the category distribution of category c, +.>Is the threshold distribution of category c, +.>For label y j The degree of description for category c;
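The mining-plus-fallback logic above can be sketched as follows. The shape of the fallback threshold distribution used here (t on the class's own label, the remainder spread uniformly) is an illustrative assumption, since the text does not fix it:

```python
import numpy as np

def mine_class_distributions(sample_dists: np.ndarray,
                             labels: np.ndarray,
                             num_classes: int,
                             t: float = 0.5) -> np.ndarray:
    """Average sample distributions per class; fall back to a threshold
    distribution when the class's own description degree is below t.

    t=0.5 and the fallback shape are illustrative, not from the text.
    """
    C = num_classes
    class_dists = np.zeros((C, C))
    for c in range(C):
        members = sample_dists[labels == c]
        if len(members):
            class_dists[c] = members.mean(axis=0)
        if class_dists[c, c] < t:  # not yet robust enough
            fallback = np.full(C, (1.0 - t) / (C - 1))
            fallback[c] = t
            class_dists[c] = fallback
    return class_dists
```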
s5, carrying out dynamic distribution fusion processing on the category distribution constructed in the step S4 and the sample distribution obtained in the step S3, wherein the dynamic distribution fusion processing specifically comprises the following steps:
the dynamic distribution fusion is based on category distribution, and the category distribution and the sample distribution are adaptively fused according to the attention weight of each sample. The dynamic distribution fusion is divided into two steps: attention weight extraction and adaptive distribution fusion;
1) Attention weight extraction:
for attention weight extraction, two attention modules are respectively embedded into the last layer of two branches to acquire the attention weight of a sample. The attention module is composed of a full connection layer and a Sigmoid function, the characteristics output by each branch are input to the corresponding attention module to extract attention weight of each sample, the attention weight value can judge whether a sample is clear or fuzzy, and the weight value is used for self-adaptive distribution fusion; the characteristics output by each branch are multiplied by the corresponding attention weight and then input into the corresponding classifier;
the flow of attention weight extraction is as follows:
a. for a batch of samples, the face features output by the feature extractor are input to the auxiliary branches and the target branches at the same time;
b. the attention weights output by the two attention modules are averaged so as to benefit from the sample ambiguity discrimination capability of both branches at the same time; the averaged attention weight is expressed by the following formula:

$$w_i=\frac{1}{2}\left(w_i^{tar}+w_i^{aux}\right)$$

wherein $w_i^{tar}$ and $w_i^{aux}$ are the attention weights of sample $x_i$ output by the attention modules of the two branches, respectively;
c. the attention weights are rank regularized to avoid degradation of the discrimination capability of the attention module:
L RR =max(0,δ-(w H -w L ))
wherein $w_H$ and $w_L$ are the average attention weights of the $M$ high-weight samples and the $N-M$ low-weight samples, respectively, and $\delta$ is a fixed margin; in order to avoid repeating experiments, $\delta$ and $M$ directly use the values from the method SCN, which employs the same attention module, and are set to 0.07 and 0.7N respectively in the present invention; $L_{RR}$ is the rank regularization loss function;
d. the attention weight is normalized, and the following formula is adopted to represent the processing procedure:
$$\tilde{w}_i=w_{min}+(1-w_{min})\cdot\frac{\hat{w}_i-\min_k\hat{w}_k}{\max_k\hat{w}_k-\min_k\hat{w}_k}$$

wherein $w_{min}$ is the lower limit of the attention weight, $\hat{w}_i$ is the attention weight of sample $x_i$ after rank regularization, and $\tilde{w}_i$ is the attention weight of sample $x_i$ after normalization; the hyperparameter $w_{min}$ is set in order to prevent the ambiguity of low-attention-weight samples from deteriorating the model performance during fusion, since the lower the attention weight, the higher the sample ambiguity.
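Steps b-d above can be sketched as follows. The margin $\delta=0.07$ and the $M=0.7N$ split follow the SCN values quoted above; the min-max rescaling into $[w_{min},1]$ is an assumed form of the normalization, which the text does not spell out.

```python
import numpy as np

def average_weights(w_tar, w_aux):
    # Step b: average the two branches' attention weights.
    return 0.5 * (w_tar + w_aux)

def rank_regularization_loss(w, ratio=0.7, delta=0.07):
    # Step c: L_RR = max(0, delta - (w_H - w_L)), with M = ratio * N
    # high-weight samples (delta = 0.07, ratio = 0.7 as taken from SCN).
    m = int(ratio * len(w))
    w_sorted = np.sort(w)[::-1]
    return max(0.0, delta - (w_sorted[:m].mean() - w_sorted[m:].mean()))

def normalize_weights(w, w_min=0.2):
    # Step d: rescale into [w_min, 1]; this min-max form and the value of
    # w_min are assumptions, not values from the text.
    scaled = (w - w.min()) / (w.max() - w.min() + 1e-8)
    return w_min + (1.0 - w_min) * scaled
```

The floor `w_min` keeps ambiguous samples from being discarded entirely during fusion, matching the stated purpose of the hyperparameter.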
2) Adaptive distribution fusion:
for adaptive distribution fusion, the category distribution and the sample distribution are adaptively fused based on the acquired attention weights, so that both the robustness of the category distribution and the diversity of the sample distribution are taken into account; the mixed distribution after fusion is expressed by the following formula:
$$\tilde{d}_i=\tilde{w}_i\,d_i+(1-\tilde{w}_i)\,\bar{d}^{c_i}$$

wherein $\tilde{d}_i$ is the mixed distribution of sample $x_i$ after fusion, $\bar{d}^{c_i}$ is the category distribution of sample $x_i$, $d_i$ is the label distribution of sample $x_i$, and $\tilde{w}_i$ is the attention weight of sample $x_i$ after normalization;
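A sketch of the fusion step. The mixing direction — the normalized attention weight multiplies the per-sample distribution and its complement multiplies the class distribution — is inferred from the surrounding discussion (clear samples trust their own distribution, ambiguous samples lean on the robust class distribution) and is therefore an assumption.

```python
import numpy as np

def fuse_distributions(sample_dist, class_dist, w_norm):
    """Per-sample convex combination of sample and class distributions,
    weighted by each sample's normalized attention weight."""
    w = w_norm[:, None]                      # broadcast over categories
    return w * sample_dist + (1.0 - w) * class_dist
```

Because both inputs are probability distributions and the mix is convex, each fused row still sums to 1.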
s6, constructing a multi-task learning frame, and optimizing a double-branch neural network model designed in the step S2, wherein the method specifically comprises the following steps:
(1) optimizing the target branches:
training the target branch by using KL divergence loss, and expressing the training process by using the following formula:
$$L_{kld}=\sum_{i=1}^{N}\sum_{j=1}^{C}\tilde{d}_i^{\,y_j}\log\frac{\tilde{d}_i^{\,y_j}}{p_t(y_j\mid x_i)}$$

wherein $L_{kld}$ is the KL divergence loss, $\tilde{d}_i^{\,y_j}$ is the description degree of category $y_j$ for sample $x_i$ in the mixed distribution, and $p_t(y_j\mid x_i)$ is the prediction probability of the target branch that sample $x_i$ belongs to label $y_j$;
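The KL divergence loss can be illustrated as below; taking the batch mean of KL(mixed ‖ predicted) is an assumption about the reduction, since the text does not state it.

```python
import numpy as np

def kl_divergence_loss(mixed, pred, eps=1e-8):
    """Mean over the batch of sum_j d_j * log(d_j / p_j), with clipping
    to keep the logarithms finite."""
    d = np.clip(mixed, eps, 1.0)
    p = np.clip(pred, eps, 1.0)
    return float((d * np.log(d / p)).sum(axis=1).mean())
```

The loss is zero when the target branch reproduces the mixed distribution exactly and positive otherwise.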
(2) multitasking learning framework:
constructing a multi-task learning framework, and minimizing a joint loss L through joint learning of distributed prediction and expression recognition, so as to optimize the prediction performance of the model; the joint loss function is expressed using the following formula:
$$L=\alpha_1\cdot L_{kld}+\alpha_2\cdot L_{ce}+L_{RR}$$
wherein $\alpha_1$ and $\alpha_2$ are weighted slope functions related to the training round $e$, and $\beta$ is a threshold on the training round; $\alpha_1$ and $\alpha_2$ are introduced to optimize the training process: in the initial stage of training, the auxiliary branch is trained with emphasis so that it can output sample distributions and category distributions meeting the set robustness requirement; in the later stage of training, the target branch is trained while overfitting of the auxiliary branch is avoided; in the inference stage, the auxiliary branch is removed and only the target branch is used to predict the expression of a sample;
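The staged schedule can be sketched with hypothetical slope functions for $\alpha_1$ and $\alpha_2$ — the text does not give their exact form, so the linear ramps below are purely illustrative; only the overall shape (auxiliary-heavy before round $\beta$, target-heavy after) follows the description.

```python
def ramp_weights(epoch, beta=10.0):
    """Hypothetical slope functions: before round beta the cross-entropy
    (auxiliary) term dominates; after it, the KL (target) term does."""
    if epoch < beta:
        return epoch / beta, 1.0                       # alpha1 ramps up
    return 1.0, max(0.0, 1.0 - (epoch - beta) / beta)  # alpha2 ramps down

def joint_loss(l_kld, l_ce, l_rr, epoch, beta=10.0):
    """L = alpha1 * L_kld + alpha2 * L_ce + L_RR."""
    a1, a2 = ramp_weights(epoch, beta)
    return a1 * l_kld + a2 * l_ce + l_rr
```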
s7, realizing facial expression prediction by adopting the double-branch neural network model obtained by optimizing in the step S6, wherein the method specifically comprises the following steps:
The dual-branch neural network model obtained by optimization in step S6 is adopted to output, through the target branch, the probability distribution of each sample for facial expression prediction; the expression corresponding to the highest prediction probability in the output probability distribution is selected as the predicted expression of the sample.
Claims (8)
1. A facial expression prediction method based on dynamic distribution fusion comprises the following steps:
s1, acquiring a facial expression data set, preprocessing a facial picture in the acquired data set, and acquiring a preprocessed data set;
s2, constructing auxiliary branches, and designing a double-branch neural network model based on the auxiliary branches;
s3, carrying out extraction sample distribution processing on the pretreatment data set obtained in the step S1 by adopting the auxiliary branches constructed in the step S2;
s4, constructing category distribution, and mining emotion information processing aiming at the sample distribution obtained in the step S3;
s5, carrying out dynamic distribution fusion processing on the category distribution constructed in the step S4 and the sample distribution obtained in the step S3;
s6, constructing a multi-task learning frame, and optimizing the double-branch neural network model designed in the step S2;
s7, adopting the double-branch neural network model obtained through optimization in the step S6 to realize facial expression prediction.
2. The facial expression prediction method based on dynamic distribution fusion according to claim 1, wherein the step S1 of obtaining a facial expression dataset, preprocessing a facial picture in the obtained dataset, and obtaining a preprocessed dataset specifically includes:
setting the facial expression data set asThe data set covers C-type labels and N samples, the MTCNN algorithm is used for face alignment processing, and face pictures with fixed sizes are output; scaling the output face picture to a given size, and performing data augmentation by using a RandAugment technology; and carrying out normalization processing on the RGB channels of the face picture by using the mean value and standard deviation of the ImageNet dataset.
3. The facial expression prediction method based on dynamic distribution fusion according to claim 2, wherein the constructing auxiliary branches in step S2, and designing a dual-branch neural network model based on the auxiliary branches, specifically comprises:
A dual-branch neural network model is constructed using the ResNet18 network model. The ResNet18 network model is divided into two parts: the front layers of the ResNet18 network model are frozen as a feature extractor, and the last three layers of the ResNet18 network model are used as a feature discriminator, which is defined as the target branch. The auxiliary branch is constructed based on the target branch, and its parameters and structure are consistent with those of the target branch. The dual-branch neural network model is designed and obtained based on the feature extractor, the target branch and the constructed auxiliary branch.
4. The facial expression prediction method based on dynamic distribution fusion according to claim 3, wherein the extracting sample distribution processing is performed on the preprocessing data set acquired in step S1 by the auxiliary branches constructed in step S2 in step S3, and specifically includes:
Taking the probability distribution output by the auxiliary branch constructed in step S2 as the sample distribution, the sample distribution is expressed by the following formula:

$$d_i=\left[d_i^{y_1},d_i^{y_2},\ldots,d_i^{y_C}\right],\qquad d_i^{y_j}=p_a(y_j\mid x_i)$$

wherein $d_i$ is the sample distribution of sample $x_i$, $y_j$ is the $j$-th class label, $d_i^{y_j}$ is the description degree of label $y_j$ for sample $x_i$, and $p_a(y_j\mid x_i)$ is the prediction probability of the auxiliary branch that sample $x_i$ belongs to label $y_j$;
The auxiliary branch is trained with a cross entropy loss to develop and maintain its ability to extract sample distributions; the cross entropy loss function is expressed by the following formula:

$$L_{ce}=-\frac{1}{N}\sum_{i=1}^{N}\log p_a(y_i\mid x_i)$$
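A sketch of extracting the sample distribution (softmax of the auxiliary branch's logits) and of the cross entropy loss on it; the batch-mean form of the loss is the standard one and assumed here.

```python
import numpy as np

def softmax(logits):
    """Stable softmax over the class axis: each row becomes a sample
    distribution over the C expression labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_loss(sample_dists, labels, eps=1e-8):
    """-mean_i log d_i^{y_i}: negative log of the description degree the
    auxiliary branch assigns to each sample's ground-truth label."""
    n = len(labels)
    return float(-np.log(sample_dists[np.arange(n), labels] + eps).mean())
```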
5. The facial expression prediction method based on dynamic distribution fusion according to claim 4, wherein the constructing of the category distribution in step S4, performing mining emotion information processing on the sample distribution obtained in step S3, specifically includes:
Class distribution mining is used to discover the implicit emotion information in the sample distributions and to eliminate the influence of sample distribution errors on model performance; the category distribution is expressed by the following formula:

$$\hat{d}^{c}=\frac{1}{N_c}\sum_{x_i\in c}d_i$$

wherein $\hat{d}^{c}$ is the category distribution of category $c$, $d_i$ is the sample distribution of sample $x_i$ belonging to category $c$, and $N_c$ is the number of samples belonging to category $c$;
A threshold $t$ is set to judge whether the output category distribution meets the set robustness requirement; if the description degree of the label $y_j$ for category $c$ does not reach the threshold $t$, a threshold distribution temporarily replaces the category distribution when training the model, described by the following formula:

$$\bar{d}^{c}=\begin{cases}\hat{d}^{c}, & \hat{d}^{c}_{c}\ge t\\ d^{c}_{t}, & \hat{d}^{c}_{c}<t\end{cases}$$
6. The facial expression prediction method based on dynamic distribution fusion according to claim 5, wherein the step S5 is characterized in that the dynamic distribution fusion processing is performed on the category distribution constructed in the step S4 and the sample distribution obtained in the step S3, and specifically includes:
the dynamic distribution fusion is based on category distribution, and the category distribution and the sample distribution are adaptively fused according to the attention weight of each sample. The dynamic distribution fusion is divided into two steps: attention weight extraction and adaptive distribution fusion;
1) Attention weight extraction:
for attention weight extraction, two attention modules are embedded into the last layers of the two branches respectively to acquire the attention weights of samples. Each attention module consists of a fully connected layer and a Sigmoid function; the features output by each branch are input to the corresponding attention module to extract the attention weight of each sample; the attention weight value is used to judge whether a sample is clear or ambiguous, and the weight value is used for adaptive distribution fusion; the features output by each branch are multiplied by the corresponding attention weight and then input into the corresponding classifier;
the flow of attention weight extraction is as follows:
a. for a batch of samples, the face features output by the feature extractor are input to the auxiliary branches and the target branches at the same time;
b. the attention weights output by the two attention modules are averaged so as to benefit from the sample ambiguity discrimination capability of both branches at the same time; the averaged attention weight is expressed by the following formula:

$$w_i=\frac{1}{2}\left(w_i^{tar}+w_i^{aux}\right)$$

wherein $w_i^{tar}$ and $w_i^{aux}$ are the attention weights of sample $x_i$ output by the attention modules of the two branches, respectively;
c. the attention weights are rank regularized to avoid degradation of the discrimination capability of the attention module:
L RR =max(0,δ-(w H -w L ))
wherein $w_H$ and $w_L$ are the average attention weights of the $M$ high-weight samples and the $N-M$ low-weight samples, respectively, and $\delta$ is a fixed margin; $\delta$ and $M$ directly use the values from the method SCN, which employs the same attention module; $L_{RR}$ is the rank regularization loss function;
d. the attention weight is normalized, and the following formula is adopted to represent the processing procedure:
$$\tilde{w}_i=w_{min}+(1-w_{min})\cdot\frac{\hat{w}_i-\min_k\hat{w}_k}{\max_k\hat{w}_k-\min_k\hat{w}_k}$$

wherein $w_{min}$ is the lower limit of the attention weight, $\hat{w}_i$ is the attention weight of sample $x_i$ after rank regularization, and $\tilde{w}_i$ is the attention weight of sample $x_i$ after normalization;
2) Adaptive distribution fusion:
The category distribution and the sample distribution are adaptively fused based on the acquired attention weights; the mixed distribution after fusion is expressed by the following formula:

$$\tilde{d}_i=\tilde{w}_i\,d_i+(1-\tilde{w}_i)\,\bar{d}^{c_i}$$
7. The facial expression prediction method based on dynamic distribution fusion according to claim 6, wherein the constructing a multi-task learning framework in step S6 optimizes the dual-branch neural network model designed in step S2, and specifically includes:
(1) optimizing the target branches:
training the target branch by using KL divergence loss, and expressing the training process by using the following formula:
$$L_{kld}=\sum_{i=1}^{N}\sum_{j=1}^{C}\tilde{d}_i^{\,y_j}\log\frac{\tilde{d}_i^{\,y_j}}{p_t(y_j\mid x_i)}$$

wherein $L_{kld}$ is the KL divergence loss, $\tilde{d}_i^{\,y_j}$ is the description degree of category $y_j$ for sample $x_i$ in the mixed distribution, and $p_t(y_j\mid x_i)$ is the prediction probability of the target branch that sample $x_i$ belongs to label $y_j$;
(2) multitasking learning framework:
constructing a multi-task learning framework, minimizing a joint loss L through joint learning of distribution prediction and expression recognition, so as to optimize the prediction performance of a model, and expressing a joint loss function by adopting the following formula:
$$L=\alpha_1\cdot L_{kld}+\alpha_2\cdot L_{ce}+L_{RR}$$
wherein $\alpha_1$ and $\alpha_2$ are weighted slope functions related to the training round, $\beta$ is a threshold on the training round, and $\alpha_1$ and $\alpha_2$ are introduced to optimize the training process.
8. The facial expression prediction method based on dynamic distribution fusion according to claim 7, wherein the facial expression prediction is realized by adopting the double-branch neural network model obtained by optimization in step S6 in step S7, and specifically comprises the following steps:
The dual-branch neural network model obtained by optimization in step S6 is adopted to output, through the target branch, the probability distribution of each sample for facial expression prediction; the expression corresponding to the highest prediction probability in the output probability distribution is selected as the predicted expression of the sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310357220.9A CN116363733A (en) | 2023-04-06 | 2023-04-06 | Facial expression prediction method based on dynamic distribution fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116363733A true CN116363733A (en) | 2023-06-30 |
Family
ID=86920731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310357220.9A Pending CN116363733A (en) | 2023-04-06 | 2023-04-06 | Facial expression prediction method based on dynamic distribution fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116363733A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116738120A (en) * | 2023-08-11 | 2023-09-12 | 齐鲁工业大学(山东省科学院) | Copper grade SCN modeling algorithm for X fluorescence grade analyzer |
CN116738120B (en) * | 2023-08-11 | 2023-11-03 | 齐鲁工业大学(山东省科学院) | Copper grade SCN modeling algorithm for X fluorescence grade analyzer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||