CN116912921A - Expression recognition method and device, electronic equipment and readable storage medium - Google Patents

Expression recognition method and device, electronic equipment and readable storage medium

Info

Publication number
CN116912921A
CN116912921A (application CN202311168658.9A)
Authority
CN
China
Prior art keywords
expression
uncertainty
feature vector
loss value
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311168658.9A
Other languages
Chinese (zh)
Other versions
CN116912921B (en
Inventor
蒋召 (Jiang Zhao)
张星宇 (Zhang Xingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311168658.9A priority Critical patent/CN116912921B/en
Publication of CN116912921A publication Critical patent/CN116912921A/en
Application granted granted Critical
Publication of CN116912921B publication Critical patent/CN116912921B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to the field of artificial intelligence, and provides an expression recognition method and apparatus, an electronic device, and a readable storage medium. The method comprises: acquiring an image to be identified; and recognizing the expression of the image to be identified through a trained expression recognition model to obtain an expression recognition result. The expression recognition model is trained on a training set comprising a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on the uncertainty of the first image sample and the uncertainty of the second image sample in each image sample pair. The embodiments of the application address the prior-art problem of inaccurate expression recognition in complex scenes.

Description

Expression recognition method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an expression recognition method, apparatus, electronic device, and readable storage medium.
Background
With the progress of science and technology, expression recognition has been widely applied in the field of artificial intelligence and is seen in many domains, but some problems remain in practical application. For example, uncertainty in expression recognition includes data uncertainty and model uncertainty. Data uncertainty refers to the fact that, owing to the subjectivity of annotators, the labels of some samples in an expression recognition dataset cannot be judged reliably, which hampers model learning; model uncertainty can be addressed by adding more training data. For data uncertainty, the prior art re-annotates the data algorithmically or manually, but re-annotation removes the difficult samples in the dataset, i.e. the samples from complex scenes, and reduces the effectiveness of model learning.
Therefore, the prior art suffers from inaccurate expression recognition in complex scenes.
Disclosure of Invention
In view of the above, the embodiments of the present application provide an expression recognition method, apparatus, electronic device, and readable storage medium, so as to solve the problem in the prior art that expression recognition is inaccurate in a complex scene.
In a first aspect of an embodiment of the present application, an expression recognition method is provided, including:
acquiring an image to be identified;
recognizing the expression of the image to be identified through the trained expression recognition model to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
In a second aspect of an embodiment of the present application, there is provided an expression recognition apparatus, including:
the acquisition module is used for acquiring the image to be identified;
the recognition module is used for recognizing the expression of the image to be recognized through the expression recognition model obtained through training to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
obtaining a loss value of the expression recognition model through the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair, training the expression recognition model according to the loss value, and recognizing the expression of the image to be recognized by using the expression recognition model obtained through training to obtain an expression recognition result. Thus, the loss value is associated with the uncertainty of the first image sample and the uncertainty of the second image sample, so that the learning of the expression recognition model on the image uncertainty is realized, the recognition precision and generalization of the expression recognition model are improved, and the problem of inaccurate expression recognition under complex scenes in the prior art is solved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an expression recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of another expression recognition method according to an embodiment of the present application;
FIG. 3 is a schematic workflow diagram of an uncertainty learning module provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of an expression recognition device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the application may be practiced otherwise than as specifically illustrated and described herein, and that the objects identified by "first," "second," etc. are generally of the same type and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
Furthermore, it should be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
An expression recognition method, apparatus, electronic device, and readable storage medium according to embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an expression recognition method according to an embodiment of the present application. As shown in fig. 1, the expression recognition method includes:
step 101, obtaining an image to be identified;
the image to be identified is an image which needs to be identified by using the expression identification model.
The background scene of the image to be identified may be a complex background scene, for example, the background scene may include expression recognition under an environment with insufficient light or complex background, expression recognition under a multi-person scene, expression recognition under different race, gender, age, etc., expression recognition under a non-frontal face angle, expression recognition with rapid change, such as smiling, blinking, etc.
Step 102, recognizing the expression of the image to be recognized through the expression recognition model obtained through training, and obtaining an expression recognition result.
The expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on the uncertainty of a first image sample and the uncertainty of a second image sample in the image sample pairs.
An expression recognition model is an artificial intelligence model that recognizes and understands the emotion conveyed by human facial expressions, inferring a person's emotional state by analyzing facial features and their dynamic changes.
The training set comprises a plurality of image sample pairs, the image sample pairs are randomly selected to train the expression recognition model, and it is to be noted that the expression labels of the first image sample and the expression labels of the second image sample in the image sample pairs can be the same or different.
In order to improve the accuracy and generalization of the expression recognition model, the number and variety of image sample pairs should be as large as possible.
The loss value of the expression recognition model is used for measuring the difference between the model prediction result and the real expression label.
Uncertainty of an image sample refers to the situation where there is uncertainty in understanding and deducing the content of an image due to factors such as blurring, noise, uncertainty of model predictions, etc. present in the image. Uncertainty is prevalent in image processing because there are a variety of factors interfering and affecting the image, such as illumination, noise, occlusion, blurring, etc. These factors can affect the quality and content of the image, resulting in some uncertainty in the understanding and inference of the image by the model. The higher the uncertainty of the image sample, the greater the likelihood of uncertainty and errors in understanding and inferring the image content.
The method for acquiring the uncertainty of the image sample may include an Entropy method (Entropy), a weighted average variance method (Weighted average variance), and the like, and is not particularly limited herein.
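As a concrete illustration of the entropy method mentioned above, consider the minimal sketch below. It is not taken from the patent: the function name is illustrative, and it assumes the raw classifier outputs (logits) of the model are available.

```python
import torch
import torch.nn.functional as F

def entropy_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the predicted class distribution.

    Higher entropy means the model is less certain about a sample.
    logits: (batch, num_classes) raw classifier scores.
    Returns: (batch,) uncertainty scores.
    """
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)  # numerically stable log-probabilities
    return -(probs * log_probs).sum(dim=-1)
```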
The loss value of the expression recognition model is obtained from the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair, so the uncertainty of the image samples is taken into account during training of the expression recognition model; this prevents data uncertainty from distorting the expression recognition result and improves the accuracy and generalization of the model.
The expression of the image to be identified is recognized through the trained expression recognition model to obtain an expression recognition result; the high precision and strong generalization of the trained model allow the image to be recognized more accurately, yielding a more accurate recognition result.
In this way, this embodiment obtains the loss value of the expression recognition model from the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair, trains the model according to the loss value, and uses the trained model to recognize the expression of the image to be identified. This makes effective use of the trained model's capability, enables it to fully recognize images in complex scenes, produces more accurate recognition results, and solves the prior-art problem of inaccurate expression recognition in complex scenes.
In some embodiments, the expression recognition model includes a backbone network module and an uncertainty learning module;
the method comprises the steps of identifying the expression of an image to be identified through an expression identification model obtained through training, and before obtaining an expression identification result, further comprising:
inputting the first image sample and the second image sample into a backbone network module, obtaining a first expression feature vector and first uncertainty data of the first image sample output by the backbone network module, and obtaining a second expression feature vector and second uncertainty data of the second image sample output by the backbone network module;
inputting the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data into an uncertainty learning module to obtain a loss value output by the uncertainty learning module;
and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
Specifically, a backbone network (backboneNet) refers to a backbone part used for extracting image features in a deep neural network, and the backbone network can be used for tasks such as image classification, object detection, semantic segmentation and the like, and is used for extracting high-level feature representations of images so as to realize understanding and deducing of image contents. The backbone network is generally composed of a plurality of convolution layers and pooling layers, which can effectively reduce the dimension and complexity of the image and extract the characteristic representation with semantic information.
The backbone network may use AlexNet, a residual network (ResNet), a densely connected network (DenseNet), or the like, and is not specifically limited here.
In addition, conventional feature extraction networks are typically composed of multiple convolutional layers and pooling layers, with the last layer being a fully connected layer for flattening the feature map (or tensor) into a one-dimensional vector and connecting into a classifier. This layer is commonly referred to as the fully connected layer, classification layer, or top layer. Using a backbone network as a feature extractor, the last layer is typically removed and the output before the last layer is passed as a feature vector to a classifier for classification tasks. The purpose of this is to separate the feature extraction capability from the classification capability of the network, so that the migration learning and application can be more conveniently performed.
It should be noted that, the uncertainty data and the expression feature vectors extracted by the backbone network have the same channel number, so that the memory occupation of the data can be reduced, the network structure is more symmetrical, the robustness and stability of the network are improved, and the image sample pairs share the same network structure and parameters, so that the calculation amount and the parameter number of the model can be effectively reduced, and the generalization capability and efficiency of the model are improved. Meanwhile, since the image samples share the same network structure and parameters, the characteristic representation among the image samples is the same, and the similarity and the correlation among the images can be better utilized.
The expression feature vector refers to a group of numerical vectors of the facial expression, is usually obtained by processing and analyzing a facial image, generally contains semantic information and structural information of the facial expression, and can be used for tasks such as facial expression recognition, emotion analysis and the like.
The uncertainty learning module can obtain a loss value of an image sample based on uncertainty data and expression feature vectors extracted by the backbone network and is used for training an expression recognition model.
When the loss value is smaller than or equal to the preset value, training of the expression recognition model is complete: the loss on the training dataset has reached the preset target, and the model has learned sufficient feature representations to perform expression recognition on new, unseen face images. The trained backbone network module, combined with a classifier, can then serve as the trained expression recognition model. The magnitude of the preset value may be set according to actual conditions and is not specifically limited here.
According to the embodiment, the first image sample and the second image sample are input into the backbone network module, the backbone network outputs the first expression feature vector and the first uncertainty data of the first image sample, and outputs the second expression feature vector and the second uncertainty data of the second image sample, and the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data are input into the uncertainty learning module to obtain the loss value output by the uncertainty learning module, so that the uncertainty of the image is considered in the training process of the expression recognition model, and the accuracy of the expression recognition model in recognizing the complex image sample is improved.
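For illustration, such a backbone might be sketched as follows in PyTorch. This is an assumption-laden sketch, not the patented architecture: it uses ResNet-18 (one of the backbone options named above), strips the final fully connected layer as described, and adds two heads with the same channel count for the expression feature vector and the uncertainty data; all names, and the softplus used to keep uncertainty non-negative, are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class ExpressionBackbone(nn.Module):
    """Backbone outputting an expression feature vector and uncertainty
    data with the same channel count (a sketch; names are illustrative)."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        base = resnet18(weights=None)
        # Drop the final fully connected (classification) layer, keeping
        # the backbone as a pure feature extractor, as described above.
        self.encoder = nn.Sequential(*list(base.children())[:-1])
        self.feature_head = nn.Linear(512, feat_dim)
        self.uncertainty_head = nn.Linear(512, feat_dim)  # same channel count

    def forward(self, x: torch.Tensor):
        h = self.encoder(x).flatten(1)        # (batch, 512) pooled features
        feat = self.feature_head(h)           # expression feature vector
        unc = F.softplus(self.uncertainty_head(h))  # non-negative uncertainty data
        return feat, unc
```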
Further, in some embodiments, inputting the first expression feature vector, the first uncertainty data, the second expression feature vector, and the second uncertainty data into the uncertainty learning module, resulting in a loss value output by the uncertainty learning module, comprising:
obtaining a third expression feature vector based on the first expression feature vector and the first uncertainty data through an uncertainty learning module, obtaining a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, and carrying out mixing processing on the third expression feature vector and the fourth expression feature vector to obtain a mixed vector;
the loss value is obtained by an uncertainty learning module based on the mixed vector.
Specifically, the third expression feature vector is obtained from the first expression feature vector and the first uncertainty data of the first image sample, and the fourth expression feature vector is obtained from the second expression feature vector and the second uncertainty data of the second image sample.
The purpose of the mixing is to comprehensively utilize expression information in two pictures, and further improve accuracy of expression recognition, and the mixing method can include a linear mixing method, a nonlinear mixing method, a style migration method and the like, and is not limited herein.
In addition, when the expression features of the two pictures are mixed, the respective expression features can be mixed according to a certain weight in a weighted average mode. The weight distribution may be determined according to different situations, for example, expression metric values of two pictures, definition and brightness of the pictures, and these factors may affect the accuracy of the blending result. The expression information of the two pictures can be more effectively integrated by reasonably weighting and distributing according to the factors, and the recognition accuracy is improved.
The loss value is then derived from the mixed vector and used to update the expression recognition model.
In this embodiment, the third and fourth expression feature vectors obtained through the uncertainty learning module are mixed to produce a mixed vector carrying richer expression features; the loss value derived from this vector effectively integrates the expression information of the two images for training the expression recognition model, improving its recognition precision.
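A linear (weighted-average) mixing of the two weighted feature vectors, for instance, could look like the sketch below. The fixed weight is an assumption: the text leaves the weighting scheme open and notes it may depend on factors such as image clarity and brightness.

```python
import torch

def mix_features(v3: torch.Tensor, v4: torch.Tensor, w: float = 0.5) -> torch.Tensor:
    """Weighted-average mixing of the third and fourth expression
    feature vectors. w = 0.5 is an illustrative choice."""
    return w * v3 + (1.0 - w) * v4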
In addition, in some embodiments, obtaining, by the uncertainty learning module, a third expression feature vector based on the first expression feature vector and the first uncertainty data, and obtaining, based on the second expression feature vector and the second uncertainty data, a fourth expression feature vector includes:
averaging the first uncertainty data and the second uncertainty data according to an output channel of the backbone network module to obtain first overall uncertainty data of the first image sample and second overall uncertainty data of the second image sample;
normalizing the first overall uncertainty data and the second overall uncertainty data to obtain first normalized data and second normalized data;
and multiplying the first expression feature vector with the first normalization data to obtain a third expression feature vector, and multiplying the second expression feature vector with the second normalization data to obtain a fourth expression feature vector.
Specifically, by averaging the uncertainties according to the channels, an average uncertainty value of each channel can be obtained, so that the contribution degree of each channel to the overall uncertainty of the image, namely, which features or parts are more important to the prediction result and which parts have higher prediction uncertainty, is known.
For example, parameters of the feature extractor may be adjusted according to the average uncertainty values of different channels to improve the accuracy of expression feature extraction.
According to the embodiment, the first uncertainty data and the second uncertainty data are averaged according to the channel to obtain the first overall uncertainty data and the second overall uncertainty data, and the first overall uncertainty data and the second overall uncertainty data are used for helping to improve the robustness and the accuracy of the expression recognition model, so that the method has good recognition capability on images of different types.
Normalization refers to scaling data according to a certain proportion so that the data falls into a specific interval, and the comparability between different data are achieved, so that the data analysis effect is improved.
The normalization method may include a min-max normalization, a normalization (z-score) normalization, a mean variance normalization, etc., and is not specifically limited herein.
For example, the method uses Z-Score to normalize the uncertainty data, and the method converts the uncertainty data into a standard normal distribution with a mean value of 0 and a standard deviation of 1, so that all the uncertainty data is uniformly changed into a distribution with 0 as a center, and the data can be conveniently compared and analyzed.
For another example, assume that the uncertainty of the first image sample is u1 and that of the second image sample is u2; the first normalized data and the second normalized data are then obtained by normalizing u1 and u2, respectively [formulas given in the original figures].
in the embodiment, the first overall uncertainty data and the second overall uncertainty data are normalized to obtain the first normalized data and the second normalized data, so that dimension and size differences among different data are eliminated, and comparison and analysis can be performed under the same scale.
In this embodiment, different expression feature vectors can be multiplied element by element to obtain a new feature vector as the final recognition feature. The multiplication may use simple element-wise multiplication, linearly weighted multiplication, adaptive weighting, or the like, and is not specifically limited here.
Multiplying the normalized data by the expression feature vectors adjusts the relative importance of different features, thereby improving the recognition accuracy of the expression recognition model.
For example, in an expression recognition task, expression features of certain regions may contribute more to recognition results, while expression features of certain regions may contribute less to recognition results. The expression features of different areas can be weighted by multiplying the normalized uncertainty and the facial expression features, so that the recognition accuracy is improved.
It should be noted that, the feature weighting method needs to be adjusted and optimized according to the specific application scenario and the data set. Under different data sets and application situations, important characteristics may be different, and adjustment is required according to actual situations. Meanwhile, the feature weighting method is also required to be used in combination with other feature selection and extraction methods so as to achieve the optimal expression recognition effect.
In this embodiment, the first expression feature vector and the first normalized data are multiplied to obtain a third expression feature vector, and the second expression feature vector and the second normalized data are multiplied to obtain a fourth expression feature vector. The important expression feature vectors are given greater weight by weighting the importance degrees of the expression feature vectors, so that the expression recognition model is trained, a more comprehensive expression recognition model is obtained, and the recognition capability of the model is improved.
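Putting the three steps of this embodiment together (channel-wise averaging, normalization, element-wise multiplication), a sketch might look like the following. It assumes z-score normalization over the batch, one of the options the text names; the patent's exact formulas appear only in its figures.

```python
import torch

def weight_by_uncertainty(feat: torch.Tensor, unc: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """Scale an expression feature vector by its normalized overall
    uncertainty (a sketch under stated assumptions).

    feat, unc: (batch, channels) tensors with the same channel count.
    """
    overall = unc.mean(dim=1, keepdim=True)                    # channel-wise average
    norm = (overall - overall.mean()) / (overall.std() + eps)  # z-score over the batch
    return feat * norm                                         # element-wise weighting

# Third and fourth expression feature vectors:
# v3 = weight_by_uncertainty(feat1, unc1)
# v4 = weight_by_uncertainty(feat2, unc2)
```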
In some embodiments, deriving the penalty value based on the post-mix vector by the uncertainty learning module includes:
performing loss calculation on the mixed vector and the label corresponding to the first image sample to obtain a first loss value, and performing loss calculation on the mixed vector and the label corresponding to the second image sample to obtain a second loss value;
a loss value is derived based on the first loss value and the second loss value.
Specifically, the loss calculation refers to calculating a prediction error of a model by comparing a difference between a model predicted value and an actual label, and updating parameters and weights of the model according to the error, and the loss calculation is one of core steps of training the model.
The loss calculation may use methods including cross entropy loss, mean square error loss, contrast loss, etc., and is not particularly limited herein.
Furthermore, when training a model, the loss calculation is an iterative process, where a loss function is calculated once per iteration, and parameters and weights of the model are updated according to the value of the loss function. Through continuous iteration and optimization, the prediction accuracy and robustness of the model can be gradually improved, and therefore a better model training effect is achieved.
It should be noted that, the loss calculation is only one link of machine learning and deep learning, and needs to be used in combination with other steps, such as feature extraction, model selection, super-parameter adjustment, etc., to achieve the best training effect.
According to the method, the first loss value and the second loss value are obtained through loss calculation of the mixed vector and the label corresponding to the image sample, and the loss value is obtained based on the first loss value and the second loss value and is used for updating the expression recognition model, so that the accuracy of the expression recognition model in an expression recognition task is improved.
Further, in some embodiments, deriving the loss value based on the first loss value and the second loss value includes:
and adding the first loss value and the second loss value to obtain a loss value.
According to the method and the device for obtaining the expression recognition model, the first loss value and the second loss value obtained through loss calculation are added to obtain the loss value, so that the expression recognition model can learn expression characteristics and uncertainty of more sample images, robustness and generalization of the model are improved, and accuracy and usability of the model in practical application are improved.
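As a sketch of this loss, assuming a cross-entropy objective (one of the options named above) and a separate classifier head, which is itself an assumption:

```python
import torch
import torch.nn.functional as F

def pair_loss(classifier: torch.nn.Module, mixed: torch.Tensor,
              label1: torch.Tensor, label2: torch.Tensor) -> torch.Tensor:
    """Loss for one image sample pair: the mixed vector is scored
    against each sample's label, and the two loss values are added."""
    logits = classifier(mixed)
    loss1 = F.cross_entropy(logits, label1)  # first loss value
    loss2 = F.cross_entropy(logits, label2)  # second loss value
    return loss1 + loss2
```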
Additionally, in some embodiments, before acquiring the image to be identified, the method further comprises: pre-training the expression recognition model through a face recognition training set, wherein the face recognition training set comprises an expression image and a label corresponding to the expression image.
In particular, pre-training refers to an unsupervised learning over a large data set, providing the model with initialization parameters or feature extractors that can help the model learn better about the features of the data set.
The face recognition training set contains a large number of face images. To improve the generalization capability and effect of the model, it can be pre-trained on a large-scale face recognition dataset before the facial expression recognition training; such a dataset usually contains a large number of face images with rich visual information and diversity, which helps the model learn more general feature representations.
According to the embodiment, the model can better capture the characteristics of the face image and learn the characteristic representation which is more discriminant and generalizable by pre-training on a large-scale data set, so that the model can better adapt to the expression recognition task and has better generalization capability when the facial expression recognition training is carried out, and the new and unseen face image can be better processed.
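In practice, such pre-training might be wired up as below. The checkpoint path is hypothetical, and `strict=False` simply skips head weights that differ between the face recognition and expression recognition tasks.

```python
import torch

# Hypothetical checkpoint from large-scale face recognition pre-training.
state = torch.load("face_recognition_pretrain.pth", map_location="cpu")
backbone = ExpressionBackbone()                # sketched earlier
backbone.load_state_dict(state, strict=False)  # ignore task-specific heads
```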
Fig. 2 is a flowchart of another expression recognition method according to an embodiment of the present application, as shown in fig. 2, where the method includes:
first, a first expression feature vector and first uncertainty data of a first image sample are obtained through a backbone network, and a second expression feature vector and second uncertainty data of a second image sample are obtained.
The expression feature vector is a set of numerical vectors describing a facial expression; it contains semantic and structural information of the expression and can be used for tasks such as facial expression recognition and emotion analysis. Uncertainty refers to the ambiguity in understanding and inferring image content caused by factors such as blur, noise, and uncertainty in model predictions. The backbone network extracts high-level feature representations of the image, enabling the understanding and inference of image content. Because the uncertainty data extracted by the backbone network and the expression feature vectors have the same number of channels, the memory footprint of the data is reduced, the network structure is more symmetrical, and the robustness and stability of the network are improved. Moreover, since the two image samples in a pair share the same network structure and parameters, the computation and parameter count of the model are effectively reduced and its generalization capability and efficiency are improved; and because the samples share the same structure and parameters, their feature representations are directly comparable, so the similarity and correlation between images can be better exploited.
Then, the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data are input into an uncertainty learning module, after the uncertainty learning module receives the related data, a third expression feature vector is obtained based on the first expression feature vector and the first uncertainty data, a fourth expression feature vector is obtained based on the second expression feature vector and the second uncertainty data, then the third expression feature vector and the fourth expression feature vector are subjected to mixed processing, the expression information in the two pictures is comprehensively utilized to train an expression recognition model, and the accuracy of expression recognition is further improved.
Finally, obtaining a loss value output by the uncertainty learning module, measuring the difference between the model prediction result and the real expression label, and using the loss value to update the expression recognition model to further improve the accuracy of expression recognition.
Fig. 3 is a schematic workflow diagram of an uncertainty learning module according to an embodiment of the present application, as shown in fig. 3, where the method includes:
firstly, the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair are averaged per channel. The method for acquiring the uncertainty of an image sample may include the entropy method, the weighted average variance method, and the like, and is not specifically limited here. Channel-wise averaging yields the average uncertainty value of each channel and reveals each channel's contribution to the overall uncertainty of the image, which improves the robustness and accuracy of the expression recognition model and gives it good recognition capability for different types of images.
And then the first overall uncertainty data and the second overall uncertainty data are normalized to obtain first normalized data and second normalized data. The normalization method may include min-max normalization, z-score normalization, mean-variance normalization, and the like, and is not specifically limited here; normalization eliminates differences of dimension and scale among different data so that they can be compared and analyzed at the same scale.
And secondly, multiplying the first expression feature vector and the first normalization data to obtain a third expression feature vector, multiplying the second expression feature vector and the second normalization data to obtain a fourth expression feature vector, and multiplying the different expression feature vectors element by element to obtain a new feature vector serving as a final recognition feature, wherein the importance among the different features can be adjusted, so that the recognition accuracy of the expression recognition model is improved.
And then, carrying out mixed processing on the third expression feature vector and the fourth expression feature vector, and comprehensively utilizing the expression information in the two pictures to train the expression recognition model so as to further improve the accuracy of expression recognition.
And finally, loss calculation is performed on the mixed vector against the label corresponding to the first image sample to obtain a first loss value, and against the label corresponding to the second image sample to obtain a second loss value; the first loss value and the second loss value are added to obtain the loss value. By comparing the difference between the model's predictions and the actual labels, the prediction error of the model is computed and its parameters and weights are updated accordingly, so that the expression recognition model learns the expression features of more sample images; this improves the robustness and generalization of the model and thus its accuracy and usability in practical applications.
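Assembling the pieces sketched above, one possible training loop over image sample pairs is shown below. All hyperparameters, the 7-class assumption, and the `loader` yielding `(img1, label1, img2, label2)` batches are assumptions, not details from the patent.

```python
import torch

backbone = ExpressionBackbone()                      # sketched earlier
classifier = torch.nn.Linear(512, 7)                 # 7 basic expressions (assumption)
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(classifier.parameters()), lr=1e-4)
preset = 0.05                                        # preset loss threshold (assumption)

# loader: DataLoader yielding (img1, label1, img2, label2) batches (assumed).
for img1, label1, img2, label2 in loader:            # image sample pairs
    feat1, unc1 = backbone(img1)                     # first feature vector / uncertainty
    feat2, unc2 = backbone(img2)                     # second feature vector / uncertainty
    v3 = weight_by_uncertainty(feat1, unc1)          # third expression feature vector
    v4 = weight_by_uncertainty(feat2, unc2)          # fourth expression feature vector
    mixed = mix_features(v3, v4)                     # mixed vector
    loss = pair_loss(classifier, mixed, label1, label2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() <= preset:                        # stop condition from the text
        break
```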
Fig. 4 is a schematic diagram of an expression recognition device according to an embodiment of the present application. As shown in fig. 4, the expression recognition apparatus includes:
an acquisition module 401, configured to acquire an image to be identified;
the recognition module 402 is configured to recognize an expression of an image to be recognized through an expression recognition model obtained through training, and obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on the uncertainty of a first image sample and the uncertainty of a second image sample in the image sample pairs.
In some embodiments, the expression recognition model includes a backbone network module and an uncertainty learning module; the recognition module 402 is further configured to input the first image sample and the second image sample into the backbone network module, obtain a first expression feature vector and first uncertainty data of the first image sample output by the backbone network module, and obtain a second expression feature vector and second uncertainty data of the second image sample output by the backbone network module; inputting the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data into an uncertainty learning module to obtain a loss value output by the uncertainty learning module; and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
In some embodiments, the recognition module 402 is specifically configured to obtain, by using the uncertainty learning module, a third expression feature vector based on the first expression feature vector and the first uncertainty data, obtain a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, and perform a mixing process on the third expression feature vector and the fourth expression feature vector to obtain a mixed vector; the loss value is obtained by an uncertainty learning module based on the mixed vector.
In some embodiments, the identification module 402 is specifically configured to average the first uncertainty data and the second uncertainty data according to an output channel of the backbone network module to obtain first overall uncertainty data of the first image sample and second overall uncertainty data of the second image sample; normalizing the first overall uncertainty data and the second overall uncertainty data to obtain first normalized data and second normalized data; and multiplying the first expression feature vector and the first normalized data to obtain a third expression feature vector, and multiplying the second expression feature vector and the second normalized data to obtain a fourth expression feature vector.
In some embodiments, the identifying module 402 is specifically configured to perform a loss calculation on the label corresponding to the mixed vector and the first image sample to obtain a first loss value, and perform a loss calculation on the label corresponding to the mixed vector and the second image sample to obtain a second loss value; a loss value is derived based on the first loss value and the second loss value.
In some embodiments, the identification module 402 is specifically configured to add the first loss value and the second loss value to obtain the loss value.
In some embodiments, the obtaining module 401 is further configured to pretrain the emotion recognition model through a facial recognition training set, where the facial recognition training set includes an expression image and a label corresponding to the expression image.
The apparatus provided by the embodiments of the present application can implement all the steps of the above method embodiments and achieve the same technical effects, which are not repeated here.
Fig. 5 is a schematic diagram of an electronic device 5 according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: a processor 501, a memory 502 and a computer program 503 stored in the memory 502 and executable on the processor 501. The steps of the various method embodiments described above are implemented by processor 501 when executing computer program 503. Alternatively, the processor 501, when executing the computer program 503, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 5 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 5 may include, but is not limited to, a processor 501 and a memory 502. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and is not limiting of the electronic device 5 and may include more or fewer components than shown, or different components.
The processor 501 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 502 may be an internal storage unit of the electronic device 5, for example, a hard disk or a memory of the electronic device 5. The memory 502 may also be an external storage device of the electronic device 5, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 5. Memory 502 may also include both internal storage units and external storage devices of electronic device 5. The memory 502 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units may be stored in a readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a readable storage medium, where the computer program may implement the steps of the method embodiments described above when executed by a processor. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. The content contained in the readable storage medium may be appropriately increased or decreased according to the requirements of the legislation and the patent practice.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. An expression recognition method, comprising:
acquiring an image to be identified;
recognizing the expression of the image to be identified through the trained expression recognition model to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
2. The expression recognition method of claim 1, wherein the expression recognition model comprises a backbone network module and an uncertainty learning module;
the training-obtained expression recognition model recognizes the expression of the image to be recognized, and before obtaining the expression recognition result, the training-obtained expression recognition model further comprises:
inputting the first image sample and the second image sample into the backbone network module to obtain a first expression feature vector and first uncertainty data of the first image sample output by the backbone network module, and obtaining a second expression feature vector and second uncertainty data of the second image sample output by the backbone network module;
inputting the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data into the uncertainty learning module to obtain the loss value output by the uncertainty learning module;
and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
3. The expression recognition method according to claim 2, wherein the inputting the first expression feature vector, first uncertainty data, second expression feature vector, and second uncertainty data into the uncertainty learning module, to obtain the loss value output by the uncertainty learning module, includes:
obtaining a third expression feature vector based on the first expression feature vector and the first uncertainty data through the uncertainty learning module, obtaining a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, and carrying out mixing processing on the third expression feature vector and the fourth expression feature vector to obtain a mixed vector;
and obtaining the loss value based on the mixed vector through the uncertainty learning module.
4. The expression recognition method of claim 3, wherein the obtaining, by the uncertainty learning module, a third expression feature vector based on the first expression feature vector and the first uncertainty data, and a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, comprises:
averaging the first uncertainty data and the second uncertainty data according to an output channel of a backbone network module to obtain first overall uncertainty data of the first image sample and second overall uncertainty data of the second image sample;
normalizing the first overall uncertainty data and the second overall uncertainty data to obtain first normalized data and second normalized data;
and multiplying the first expression feature vector with the first normalization data to obtain the third expression feature vector, and multiplying the second expression feature vector with the second normalization data to obtain the fourth expression feature vector.
5. The expression recognition method of claim 3, wherein the deriving, by the uncertainty learning module, the loss value based on the post-mixing vector, comprises:
performing loss calculation on the mixed vector and the label corresponding to the first image sample to obtain a first loss value, and performing loss calculation on the mixed vector and the label corresponding to the second image sample to obtain a second loss value;
the loss value is derived based on the first loss value and the second loss value.
6. The expression recognition method of claim 5, wherein the deriving the loss value based on the first loss value and the second loss value comprises:
and adding the first loss value and the second loss value to obtain the loss value.
7. The expression recognition method according to claim 1, wherein before the image to be recognized is acquired, further comprising:
and pre-training the expression recognition model through a facial recognition training set, wherein the facial recognition training set comprises an expression image and a label corresponding to the expression image.
8. An expression recognition apparatus, characterized by comprising:
the acquisition module is used for acquiring the image to be identified;
the recognition module is used for recognizing the expression of the image to be recognized through the expression recognition model obtained through training to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202311168658.9A 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium Active CN116912921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311168658.9A CN116912921B (en) 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311168658.9A CN116912921B (en) 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN116912921A true CN116912921A (en) 2023-10-20
CN116912921B CN116912921B (en) 2024-02-20

Family

ID=88367145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311168658.9A Active CN116912921B (en) 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116912921B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401193A (en) * 2020-03-10 2020-07-10 海尔优家智能科技(北京)有限公司 Method and device for obtaining expression recognition model and expression recognition method and device
CN111539452A (en) * 2020-03-26 2020-08-14 深圳云天励飞技术有限公司 Image recognition method and device for multitask attributes, electronic equipment and storage medium
CN113222872A (en) * 2021-05-28 2021-08-06 平安科技(深圳)有限公司 Image processing method, image processing apparatus, electronic device, and medium
CN113239814A (en) * 2021-05-17 2021-08-10 平安科技(深圳)有限公司 Facial expression recognition method, device, equipment and medium based on optical flow reconstruction
CN114170654A (en) * 2021-11-26 2022-03-11 深圳数联天下智能科技有限公司 Training method of age identification model, face age identification method and related device
CN116206345A (en) * 2022-12-09 2023-06-02 支付宝(杭州)信息技术有限公司 Expression recognition model training method, expression recognition method, related device and medium

Also Published As

Publication number Publication date
CN116912921B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN111860573B (en) Model training method, image category detection method and device and electronic equipment
CN107944020B (en) Face image searching method and device, computer device and storage medium
CN110267119B (en) Video precision and chroma evaluation method and related equipment
CN106570464B (en) Face recognition method and device for rapidly processing face shielding
CN111582150B (en) Face quality assessment method, device and computer storage medium
JP2020522077A (en) Acquisition of image features
CN111783532B (en) Cross-age face recognition method based on online learning
CN109919252B (en) Method for generating classifier by using few labeled images
CN112016315B (en) Model training method, text recognition method, model training device, text recognition device, electronic equipment and storage medium
CN110532950B (en) Video feature extraction method and micro-expression identification method based on micro-expression video
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111401105B (en) Video expression recognition method, device and equipment
CN110705600A (en) Cross-correlation entropy based multi-depth learning model fusion method, terminal device and readable storage medium
CN111680757A (en) Zero sample image recognition algorithm and system based on self-encoder
CN111401343B (en) Method for identifying attributes of people in image and training method and device for identification model
CN114241505A (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN115984930A (en) Micro expression recognition method and device and micro expression recognition model training method
CN114118259A (en) Target detection method and device
Dong et al. A supervised dictionary learning and discriminative weighting model for action recognition
CN112183946A (en) Multimedia content evaluation method, device and training method thereof
CN116912921B (en) Expression recognition method and device, electronic equipment and readable storage medium
CN111242114A (en) Character recognition method and device
Song et al. Text Siamese network for video textual keyframe detection
CN115546554A (en) Sensitive image identification method, device, equipment and computer readable storage medium
Kanjanawattana et al. Deep Learning-Based Emotion Recognition through Facial Expressions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant