CN116912921A - Expression recognition method and device, electronic equipment and readable storage medium - Google Patents

Expression recognition method and device, electronic equipment and readable storage medium

Info

Publication number
CN116912921A
CN116912921A (application CN202311168658.9A)
Authority
CN
China
Prior art keywords
expression
uncertainty
feature vector
loss value
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311168658.9A
Other languages
Chinese (zh)
Other versions
CN116912921B (en
Inventor
蒋召 (Jiang Zhao)
张星宇 (Zhang Xingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311168658.9A priority Critical patent/CN116912921B/en
Publication of CN116912921A publication Critical patent/CN116912921A/en
Application granted granted Critical
Publication of CN116912921B publication Critical patent/CN116912921B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to the field of artificial intelligence, and provides an expression recognition method and apparatus, an electronic device, and a readable storage medium. The method comprises: acquiring an image to be identified; and recognizing the expression of the image to be identified through a trained expression recognition model to obtain an expression recognition result. The expression recognition model is trained on a training set comprising a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on the uncertainty of the first image sample and the uncertainty of the second image sample in each image sample pair. The embodiments of the application address the prior-art problem of inaccurate expression recognition in complex scenes.

Description

Expression recognition method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an expression recognition method, apparatus, electronic device, and readable storage medium.
Background
With the progress of science and technology, expression recognition has been widely applied in the field of artificial intelligence and is seen in many domains, but some problems remain in practical application. For example, uncertainty in expression recognition includes data uncertainty and model uncertainty. Data uncertainty refers to the fact that, owing to the subjectivity of annotators, the labels of some samples in an expression recognition dataset cannot be judged reliably, which hampers model learning; model uncertainty can be addressed by adding more training data. For data uncertainty, the prior art re-annotates the data algorithmically or manually, but re-annotation removes the difficult samples in the dataset, i.e. the samples from complex scenes, and reduces the effectiveness of model learning.
Therefore, the prior art suffers from inaccurate expression recognition in complex scenes.
Disclosure of Invention
In view of the above, the embodiments of the present application provide an expression recognition method, apparatus, electronic device, and readable storage medium, so as to solve the problem in the prior art that expression recognition is inaccurate in a complex scene.
In a first aspect of an embodiment of the present application, an expression recognition method is provided, including:
acquiring an image to be identified;
recognizing the expression of the image to be identified through the trained expression recognition model to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
In a second aspect of an embodiment of the present application, there is provided an expression recognition apparatus, including:
the acquisition module is used for acquiring the image to be identified;
the recognition module is used for recognizing the expression of the image to be recognized through the expression recognition model obtained through training to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
obtaining a loss value of the expression recognition model through the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair, training the expression recognition model according to the loss value, and recognizing the expression of the image to be recognized by using the expression recognition model obtained through training to obtain an expression recognition result. Thus, the loss value is associated with the uncertainty of the first image sample and the uncertainty of the second image sample, so that the learning of the expression recognition model on the image uncertainty is realized, the recognition precision and generalization of the expression recognition model are improved, and the problem of inaccurate expression recognition under complex scenes in the prior art is solved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an expression recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of another expression recognition method according to an embodiment of the present application;
FIG. 3 is a schematic workflow diagram of an uncertainty learning module provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of an expression recognition device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the application may be practiced otherwise than as specifically illustrated and described herein, and that the objects identified by "first," "second," etc. are generally of the same type and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
Furthermore, it should be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
An expression recognition method, apparatus, electronic device, and readable storage medium according to embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an expression recognition method according to an embodiment of the present application. As shown in fig. 1, the expression recognition method includes:
step 101, obtaining an image to be identified;
the image to be identified is an image which needs to be identified by using the expression identification model.
The background scene of the image to be identified may be a complex background scene, for example, the background scene may include expression recognition under an environment with insufficient light or complex background, expression recognition under a multi-person scene, expression recognition under different race, gender, age, etc., expression recognition under a non-frontal face angle, expression recognition with rapid change, such as smiling, blinking, etc.
Step 102, recognizing the expression of the image to be recognized through the expression recognition model obtained through training, and obtaining an expression recognition result.
The expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on the uncertainty of a first image sample and the uncertainty of a second image sample in the image sample pairs.
An expression recognition model is an artificial intelligence model that recognizes and understands the emotion conveyed by human facial expressions, inferring a person's emotional state by analyzing facial features and their dynamic changes.
The training set comprises a plurality of image sample pairs, the image sample pairs are randomly selected to train the expression recognition model, and it is to be noted that the expression labels of the first image sample and the expression labels of the second image sample in the image sample pairs can be the same or different.
In order to improve the accuracy and generalization of the expression recognition model, the number and variety of image sample pairs should be as large as possible.
The loss value of the expression recognition model is used for measuring the difference between the model prediction result and the real expression label.
Uncertainty of an image sample refers to the situation where there is uncertainty in understanding and deducing the content of an image due to factors such as blurring, noise, uncertainty of model predictions, etc. present in the image. Uncertainty is prevalent in image processing because there are a variety of factors interfering and affecting the image, such as illumination, noise, occlusion, blurring, etc. These factors can affect the quality and content of the image, resulting in some uncertainty in the understanding and inference of the image by the model. The higher the uncertainty of the image sample, the greater the likelihood of uncertainty and errors in understanding and inferring the image content.
The method for acquiring the uncertainty of the image sample may include an Entropy method (Entropy), a weighted average variance method (Weighted average variance), and the like, and is not particularly limited herein.
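As a concrete illustration of the entropy method mentioned above, consider the minimal sketch below. It is not taken from the patent: the function name is illustrative, and it assumes the raw classifier outputs (logits) of the model are available.

```python
import torch
import torch.nn.functional as F

def entropy_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the predicted class distribution.

    Higher entropy means the model is less certain about a sample.
    logits: (batch, num_classes) raw classifier scores.
    Returns: (batch,) uncertainty scores.
    """
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)  # numerically stable log-probabilities
    return -(probs * log_probs).sum(dim=-1)
```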
The loss value of the expression recognition model is obtained from the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair, so the uncertainty of the image samples is taken into account during training of the expression recognition model; this prevents data uncertainty from distorting the expression recognition result and improves the accuracy and generalization of the model.
The expression of the image to be identified is recognized through the trained expression recognition model to obtain an expression recognition result; the high precision and strong generalization of the trained model allow the image to be recognized more accurately, yielding a more accurate recognition result.
In this way, this embodiment obtains the loss value of the expression recognition model from the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair, trains the model according to the loss value, and uses the trained model to recognize the expression of the image to be identified. This makes effective use of the trained model's capability, enables it to fully recognize images in complex scenes, produces more accurate recognition results, and solves the prior-art problem of inaccurate expression recognition in complex scenes.
In some embodiments, the expression recognition model includes a backbone network module and an uncertainty learning module;
the method comprises the steps of identifying the expression of an image to be identified through an expression identification model obtained through training, and before obtaining an expression identification result, further comprising:
inputting the first image sample and the second image sample into a backbone network module, obtaining a first expression feature vector and first uncertainty data of the first image sample output by the backbone network module, and obtaining a second expression feature vector and second uncertainty data of the second image sample output by the backbone network module;
inputting the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data into an uncertainty learning module to obtain a loss value output by the uncertainty learning module;
and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
Specifically, a backbone network (backboneNet) refers to a backbone part used for extracting image features in a deep neural network, and the backbone network can be used for tasks such as image classification, object detection, semantic segmentation and the like, and is used for extracting high-level feature representations of images so as to realize understanding and deducing of image contents. The backbone network is generally composed of a plurality of convolution layers and pooling layers, which can effectively reduce the dimension and complexity of the image and extract the characteristic representation with semantic information.
The backbone network may use AlexNet, a residual network (ResNet), a densely connected network (DenseNet), or the like, and is not specifically limited here.
In addition, conventional feature extraction networks are typically composed of multiple convolutional layers and pooling layers, with the last layer being a fully connected layer for flattening the feature map (or tensor) into a one-dimensional vector and connecting into a classifier. This layer is commonly referred to as the fully connected layer, classification layer, or top layer. Using a backbone network as a feature extractor, the last layer is typically removed and the output before the last layer is passed as a feature vector to a classifier for classification tasks. The purpose of this is to separate the feature extraction capability from the classification capability of the network, so that the migration learning and application can be more conveniently performed.
It should be noted that, the uncertainty data and the expression feature vectors extracted by the backbone network have the same channel number, so that the memory occupation of the data can be reduced, the network structure is more symmetrical, the robustness and stability of the network are improved, and the image sample pairs share the same network structure and parameters, so that the calculation amount and the parameter number of the model can be effectively reduced, and the generalization capability and efficiency of the model are improved. Meanwhile, since the image samples share the same network structure and parameters, the characteristic representation among the image samples is the same, and the similarity and the correlation among the images can be better utilized.
The expression feature vector refers to a group of numerical vectors of the facial expression, is usually obtained by processing and analyzing a facial image, generally contains semantic information and structural information of the facial expression, and can be used for tasks such as facial expression recognition, emotion analysis and the like.
The uncertainty learning module can obtain a loss value of an image sample based on uncertainty data and expression feature vectors extracted by the backbone network and is used for training an expression recognition model.
When the loss value is smaller than or equal to the preset value, training of the expression recognition model is complete: the loss on the training dataset has reached the preset target, and the model has learned sufficient feature representations to perform expression recognition on new, unseen face images. The trained backbone network module, combined with a classifier, can then serve as the trained expression recognition model. The magnitude of the preset value may be set according to actual conditions and is not specifically limited here.
According to the embodiment, the first image sample and the second image sample are input into the backbone network module, the backbone network outputs the first expression feature vector and the first uncertainty data of the first image sample, and outputs the second expression feature vector and the second uncertainty data of the second image sample, and the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data are input into the uncertainty learning module to obtain the loss value output by the uncertainty learning module, so that the uncertainty of the image is considered in the training process of the expression recognition model, and the accuracy of the expression recognition model in recognizing the complex image sample is improved.
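For illustration, such a backbone might be sketched as follows in PyTorch. This is an assumption-laden sketch, not the patented architecture: it uses ResNet-18 (one of the backbone options named above), strips the final fully connected layer as described, and adds two heads with the same channel count for the expression feature vector and the uncertainty data; all names, and the softplus used to keep uncertainty non-negative, are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class ExpressionBackbone(nn.Module):
    """Backbone outputting an expression feature vector and uncertainty
    data with the same channel count (a sketch; names are illustrative)."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        base = resnet18(weights=None)
        # Drop the final fully connected (classification) layer, keeping
        # the backbone as a pure feature extractor, as described above.
        self.encoder = nn.Sequential(*list(base.children())[:-1])
        self.feature_head = nn.Linear(512, feat_dim)
        self.uncertainty_head = nn.Linear(512, feat_dim)  # same channel count

    def forward(self, x: torch.Tensor):
        h = self.encoder(x).flatten(1)        # (batch, 512) pooled features
        feat = self.feature_head(h)           # expression feature vector
        unc = F.softplus(self.uncertainty_head(h))  # non-negative uncertainty data
        return feat, unc
```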
Further, in some embodiments, inputting the first expression feature vector, the first uncertainty data, the second expression feature vector, and the second uncertainty data into the uncertainty learning module, resulting in a loss value output by the uncertainty learning module, comprising:
obtaining a third expression feature vector based on the first expression feature vector and the first uncertainty data through an uncertainty learning module, obtaining a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, and carrying out mixing processing on the third expression feature vector and the fourth expression feature vector to obtain a mixed vector;
the loss value is obtained by an uncertainty learning module based on the mixed vector.
Specifically, the third expression feature vector is obtained from the first expression feature vector and the first uncertainty data of the first image sample, and the fourth expression feature vector is obtained from the second expression feature vector and the second uncertainty data of the second image sample.
The purpose of the mixing is to comprehensively utilize expression information in two pictures, and further improve accuracy of expression recognition, and the mixing method can include a linear mixing method, a nonlinear mixing method, a style migration method and the like, and is not limited herein.
In addition, when the expression features of the two pictures are mixed, the respective expression features can be mixed according to a certain weight in a weighted average mode. The weight distribution may be determined according to different situations, for example, expression metric values of two pictures, definition and brightness of the pictures, and these factors may affect the accuracy of the blending result. The expression information of the two pictures can be more effectively integrated by reasonably weighting and distributing according to the factors, and the recognition accuracy is improved.
The loss value is then derived from the mixed vector and used to update the expression recognition model.
In this embodiment, the third and fourth expression feature vectors obtained through the uncertainty learning module are mixed to produce a mixed vector carrying richer expression features; the loss value derived from this vector effectively integrates the expression information of the two images for training the expression recognition model, improving its recognition precision.
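A linear (weighted-average) mixing of the two weighted feature vectors, for instance, could look like the sketch below. The fixed weight is an assumption: the text leaves the weighting scheme open and notes it may depend on factors such as image clarity and brightness.

```python
import torch

def mix_features(v3: torch.Tensor, v4: torch.Tensor, w: float = 0.5) -> torch.Tensor:
    """Weighted-average mixing of the third and fourth expression
    feature vectors. w = 0.5 is an illustrative choice."""
    return w * v3 + (1.0 - w) * v4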
In addition, in some embodiments, obtaining, by the uncertainty learning module, a third expression feature vector based on the first expression feature vector and the first uncertainty data, and obtaining, based on the second expression feature vector and the second uncertainty data, a fourth expression feature vector includes:
averaging the first uncertainty data and the second uncertainty data according to an output channel of the backbone network module to obtain first overall uncertainty data of the first image sample and second overall uncertainty data of the second image sample;
normalizing the first overall uncertainty data and the second overall uncertainty data to obtain first normalized data and second normalized data;
and multiplying the first expression feature vector with the first normalization data to obtain a third expression feature vector, and multiplying the second expression feature vector with the second normalization data to obtain a fourth expression feature vector.
Specifically, by averaging the uncertainties according to the channels, an average uncertainty value of each channel can be obtained, so that the contribution degree of each channel to the overall uncertainty of the image, namely, which features or parts are more important to the prediction result and which parts have higher prediction uncertainty, is known.
For example, parameters of the feature extractor may be adjusted according to the average uncertainty values of different channels to improve the accuracy of expression feature extraction.
According to the embodiment, the first uncertainty data and the second uncertainty data are averaged according to the channel to obtain the first overall uncertainty data and the second overall uncertainty data, and the first overall uncertainty data and the second overall uncertainty data are used for helping to improve the robustness and the accuracy of the expression recognition model, so that the method has good recognition capability on images of different types.
Normalization refers to scaling data according to a certain proportion so that the data falls into a specific interval, and the comparability between different data are achieved, so that the data analysis effect is improved.
The normalization method may include a min-max normalization, a normalization (z-score) normalization, a mean variance normalization, etc., and is not specifically limited herein.
For example, the method uses Z-Score to normalize the uncertainty data, and the method converts the uncertainty data into a standard normal distribution with a mean value of 0 and a standard deviation of 1, so that all the uncertainty data is uniformly changed into a distribution with 0 as a center, and the data can be conveniently compared and analyzed.
For another example, assume that the uncertainty of the first image sample is u1 and that of the second image sample is u2; the first normalized data and the second normalized data are then obtained by normalizing u1 and u2, respectively [formulas given in the original figures].
in the embodiment, the first overall uncertainty data and the second overall uncertainty data are normalized to obtain the first normalized data and the second normalized data, so that dimension and size differences among different data are eliminated, and comparison and analysis can be performed under the same scale.
In this embodiment, different expression feature vectors can be multiplied element by element to obtain a new feature vector as the final recognition feature. The multiplication may use simple element-wise multiplication, linearly weighted multiplication, adaptive weighting, or the like, and is not specifically limited here.
Multiplying the normalized data by the expression feature vectors adjusts the relative importance of different features, thereby improving the recognition accuracy of the expression recognition model.
For example, in an expression recognition task, expression features of certain regions may contribute more to recognition results, while expression features of certain regions may contribute less to recognition results. The expression features of different areas can be weighted by multiplying the normalized uncertainty and the facial expression features, so that the recognition accuracy is improved.
It should be noted that, the feature weighting method needs to be adjusted and optimized according to the specific application scenario and the data set. Under different data sets and application situations, important characteristics may be different, and adjustment is required according to actual situations. Meanwhile, the feature weighting method is also required to be used in combination with other feature selection and extraction methods so as to achieve the optimal expression recognition effect.
In this embodiment, the first expression feature vector and the first normalized data are multiplied to obtain a third expression feature vector, and the second expression feature vector and the second normalized data are multiplied to obtain a fourth expression feature vector. The important expression feature vectors are given greater weight by weighting the importance degrees of the expression feature vectors, so that the expression recognition model is trained, a more comprehensive expression recognition model is obtained, and the recognition capability of the model is improved.
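Putting the three steps of this embodiment together (channel-wise averaging, normalization, element-wise multiplication), a sketch might look like the following. It assumes z-score normalization over the batch, one of the options the text names; the patent's exact formulas appear only in its figures.

```python
import torch

def weight_by_uncertainty(feat: torch.Tensor, unc: torch.Tensor,
                          eps: float = 1e-6) -> torch.Tensor:
    """Scale an expression feature vector by its normalized overall
    uncertainty (a sketch under stated assumptions).

    feat, unc: (batch, channels) tensors with the same channel count.
    """
    overall = unc.mean(dim=1, keepdim=True)                    # channel-wise average
    norm = (overall - overall.mean()) / (overall.std() + eps)  # z-score over the batch
    return feat * norm                                         # element-wise weighting

# Third and fourth expression feature vectors:
# v3 = weight_by_uncertainty(feat1, unc1)
# v4 = weight_by_uncertainty(feat2, unc2)
```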
In some embodiments, deriving the penalty value based on the post-mix vector by the uncertainty learning module includes:
performing loss calculation on the mixed vector and the label corresponding to the first image sample to obtain a first loss value, and performing loss calculation on the mixed vector and the label corresponding to the second image sample to obtain a second loss value;
a loss value is derived based on the first loss value and the second loss value.
Specifically, the loss calculation refers to calculating a prediction error of a model by comparing a difference between a model predicted value and an actual label, and updating parameters and weights of the model according to the error, and the loss calculation is one of core steps of training the model.
The loss calculation may use methods including cross entropy loss, mean square error loss, contrast loss, etc., and is not particularly limited herein.
Furthermore, when training a model, the loss calculation is an iterative process, where a loss function is calculated once per iteration, and parameters and weights of the model are updated according to the value of the loss function. Through continuous iteration and optimization, the prediction accuracy and robustness of the model can be gradually improved, and therefore a better model training effect is achieved.
It should be noted that, the loss calculation is only one link of machine learning and deep learning, and needs to be used in combination with other steps, such as feature extraction, model selection, super-parameter adjustment, etc., to achieve the best training effect.
According to the method, the first loss value and the second loss value are obtained through loss calculation of the mixed vector and the label corresponding to the image sample, and the loss value is obtained based on the first loss value and the second loss value and is used for updating the expression recognition model, so that the accuracy of the expression recognition model in an expression recognition task is improved.
Further, in some embodiments, deriving the loss value based on the first loss value and the second loss value includes:
and adding the first loss value and the second loss value to obtain a loss value.
According to the method and the device for obtaining the expression recognition model, the first loss value and the second loss value obtained through loss calculation are added to obtain the loss value, so that the expression recognition model can learn expression characteristics and uncertainty of more sample images, robustness and generalization of the model are improved, and accuracy and usability of the model in practical application are improved.
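As a sketch of this loss, assuming a cross-entropy objective (one of the options named above) and a separate classifier head, which is itself an assumption:

```python
import torch
import torch.nn.functional as F

def pair_loss(classifier: torch.nn.Module, mixed: torch.Tensor,
              label1: torch.Tensor, label2: torch.Tensor) -> torch.Tensor:
    """Loss for one image sample pair: the mixed vector is scored
    against each sample's label, and the two loss values are added."""
    logits = classifier(mixed)
    loss1 = F.cross_entropy(logits, label1)  # first loss value
    loss2 = F.cross_entropy(logits, label2)  # second loss value
    return loss1 + loss2
```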
Additionally, in some embodiments, before acquiring the image to be identified, the method further comprises: pre-training the expression recognition model through a face recognition training set, wherein the face recognition training set comprises an expression image and a label corresponding to the expression image.
In particular, pre-training refers to an unsupervised learning over a large data set, providing the model with initialization parameters or feature extractors that can help the model learn better about the features of the data set.
The face recognition training set contains a large number of face images. To improve the generalization capability and effect of the model, it can be pre-trained on a large-scale face recognition dataset before the facial expression recognition training; such a dataset usually contains a large number of face images with rich visual information and diversity, which helps the model learn more general feature representations.
According to the embodiment, the model can better capture the characteristics of the face image and learn the characteristic representation which is more discriminant and generalizable by pre-training on a large-scale data set, so that the model can better adapt to the expression recognition task and has better generalization capability when the facial expression recognition training is carried out, and the new and unseen face image can be better processed.
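In practice, such pre-training might be wired up as below. The checkpoint path is hypothetical, and `strict=False` simply skips head weights that differ between the face recognition and expression recognition tasks.

```python
import torch

# Hypothetical checkpoint from large-scale face recognition pre-training.
state = torch.load("face_recognition_pretrain.pth", map_location="cpu")
backbone = ExpressionBackbone()                # sketched earlier
backbone.load_state_dict(state, strict=False)  # ignore task-specific heads
```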
Fig. 2 is a flowchart of another expression recognition method according to an embodiment of the present application, as shown in fig. 2, where the method includes:
first, a first expression feature vector and first uncertainty data of a first image sample are obtained through a backbone network, and a second expression feature vector and second uncertainty data of a second image sample are obtained.
The expression feature vector is a set of numerical vectors describing a facial expression; it contains semantic and structural information of the expression and can be used for tasks such as facial expression recognition and emotion analysis. Uncertainty refers to the ambiguity in understanding and inferring image content caused by factors such as blur, noise, and uncertainty in model predictions. The backbone network extracts high-level feature representations of the image, enabling the understanding and inference of image content. Because the uncertainty data extracted by the backbone network and the expression feature vectors have the same number of channels, the memory footprint of the data is reduced, the network structure is more symmetrical, and the robustness and stability of the network are improved. Moreover, since the two image samples in a pair share the same network structure and parameters, the computation and parameter count of the model are effectively reduced and its generalization capability and efficiency are improved; and because the samples share the same structure and parameters, their feature representations are directly comparable, so the similarity and correlation between images can be better exploited.
Then, the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data are input into an uncertainty learning module, after the uncertainty learning module receives the related data, a third expression feature vector is obtained based on the first expression feature vector and the first uncertainty data, a fourth expression feature vector is obtained based on the second expression feature vector and the second uncertainty data, then the third expression feature vector and the fourth expression feature vector are subjected to mixed processing, the expression information in the two pictures is comprehensively utilized to train an expression recognition model, and the accuracy of expression recognition is further improved.
Finally, obtaining a loss value output by the uncertainty learning module, measuring the difference between the model prediction result and the real expression label, and using the loss value to update the expression recognition model to further improve the accuracy of expression recognition.
Fig. 3 is a schematic workflow diagram of an uncertainty learning module according to an embodiment of the present application, as shown in fig. 3, where the method includes:
firstly, the uncertainty of the first image sample and the uncertainty of the second image sample in the image sample pair are averaged per channel. The method for acquiring the uncertainty of an image sample may include the entropy method, the weighted average variance method, and the like, and is not specifically limited here. Channel-wise averaging yields the average uncertainty value of each channel and reveals each channel's contribution to the overall uncertainty of the image, which improves the robustness and accuracy of the expression recognition model and gives it good recognition capability for different types of images.
And then the first overall uncertainty data and the second overall uncertainty data are normalized to obtain first normalized data and second normalized data. The normalization method may include min-max normalization, z-score normalization, mean-variance normalization, and the like, and is not specifically limited here; normalization eliminates differences of dimension and scale among different data so that they can be compared and analyzed at the same scale.
And secondly, multiplying the first expression feature vector and the first normalization data to obtain a third expression feature vector, multiplying the second expression feature vector and the second normalization data to obtain a fourth expression feature vector, and multiplying the different expression feature vectors element by element to obtain a new feature vector serving as a final recognition feature, wherein the importance among the different features can be adjusted, so that the recognition accuracy of the expression recognition model is improved.
And then, carrying out mixed processing on the third expression feature vector and the fourth expression feature vector, and comprehensively utilizing the expression information in the two pictures to train the expression recognition model so as to further improve the accuracy of expression recognition.
And finally, loss calculation is performed on the mixed vector against the label corresponding to the first image sample to obtain a first loss value, and against the label corresponding to the second image sample to obtain a second loss value; the first loss value and the second loss value are added to obtain the loss value. By comparing the difference between the model's predictions and the actual labels, the prediction error of the model is computed and its parameters and weights are updated accordingly, so that the expression recognition model learns the expression features of more sample images; this improves the robustness and generalization of the model and thus its accuracy and usability in practical applications.
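Assembling the pieces sketched above, one possible training loop over image sample pairs is shown below. All hyperparameters, the 7-class assumption, and the `loader` yielding `(img1, label1, img2, label2)` batches are assumptions, not details from the patent.

```python
import torch

backbone = ExpressionBackbone()                      # sketched earlier
classifier = torch.nn.Linear(512, 7)                 # 7 basic expressions (assumption)
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(classifier.parameters()), lr=1e-4)
preset = 0.05                                        # preset loss threshold (assumption)

# loader: DataLoader yielding (img1, label1, img2, label2) batches (assumed).
for img1, label1, img2, label2 in loader:            # image sample pairs
    feat1, unc1 = backbone(img1)                     # first feature vector / uncertainty
    feat2, unc2 = backbone(img2)                     # second feature vector / uncertainty
    v3 = weight_by_uncertainty(feat1, unc1)          # third expression feature vector
    v4 = weight_by_uncertainty(feat2, unc2)          # fourth expression feature vector
    mixed = mix_features(v3, v4)                     # mixed vector
    loss = pair_loss(classifier, mixed, label1, label2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() <= preset:                        # stop condition from the text
        break
```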
Fig. 4 is a schematic diagram of an expression recognition device according to an embodiment of the present application. As shown in fig. 4, the expression recognition apparatus includes:
an acquisition module 401, configured to acquire an image to be identified;
the recognition module 402 is configured to recognize an expression of an image to be recognized through an expression recognition model obtained through training, and obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on the uncertainty of a first image sample and the uncertainty of a second image sample in the image sample pairs.
In some embodiments, the expression recognition model includes a backbone network module and an uncertainty learning module; the recognition module 402 is further configured to input the first image sample and the second image sample into the backbone network module, obtain a first expression feature vector and first uncertainty data of the first image sample output by the backbone network module, and obtain a second expression feature vector and second uncertainty data of the second image sample output by the backbone network module; inputting the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data into an uncertainty learning module to obtain a loss value output by the uncertainty learning module; and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
In some embodiments, the recognition module 402 is specifically configured to obtain, by using the uncertainty learning module, a third expression feature vector based on the first expression feature vector and the first uncertainty data, obtain a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, and perform a mixing process on the third expression feature vector and the fourth expression feature vector to obtain a mixed vector; the loss value is obtained by an uncertainty learning module based on the mixed vector.
In some embodiments, the identification module 402 is specifically configured to average the first uncertainty data and the second uncertainty data according to an output channel of the backbone network module to obtain first overall uncertainty data of the first image sample and second overall uncertainty data of the second image sample; normalizing the first overall uncertainty data and the second overall uncertainty data to obtain first normalized data and second normalized data; and multiplying the first expression feature vector and the first normalized data to obtain a third expression feature vector, and multiplying the second expression feature vector and the second normalized data to obtain a fourth expression feature vector.
In some embodiments, the identifying module 402 is specifically configured to perform a loss calculation on the label corresponding to the mixed vector and the first image sample to obtain a first loss value, and perform a loss calculation on the label corresponding to the mixed vector and the second image sample to obtain a second loss value; a loss value is derived based on the first loss value and the second loss value.
In some embodiments, the identification module 402 is specifically configured to add the first loss value and the second loss value to obtain the loss value.
In some embodiments, the obtaining module 401 is further configured to pretrain the emotion recognition model through a facial recognition training set, where the facial recognition training set includes an expression image and a label corresponding to the expression image.
The apparatus provided by the embodiments of the present application can implement all the steps of the above method embodiments and achieve the same technical effects, which are not repeated here.
Fig. 5 is a schematic diagram of an electronic device 5 according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: a processor 501, a memory 502 and a computer program 503 stored in the memory 502 and executable on the processor 501. The steps of the various method embodiments described above are implemented by processor 501 when executing computer program 503. Alternatively, the processor 501, when executing the computer program 503, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 5 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 5 may include, but is not limited to, a processor 501 and a memory 502. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and is not limiting of the electronic device 5 and may include more or fewer components than shown, or different components.
The processor 501 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 502 may be an internal storage unit of the electronic device 5, for example, a hard disk or a memory of the electronic device 5. The memory 502 may also be an external storage device of the electronic device 5, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 5. Memory 502 may also include both internal storage units and external storage devices of electronic device 5. The memory 502 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units may be stored in a readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a readable storage medium, where the computer program may implement the steps of the method embodiments described above when executed by a processor. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. The content contained in the readable storage medium may be appropriately increased or decreased according to the requirements of the legislation and the patent practice.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. An expression recognition method, comprising:
acquiring an image to be identified;
recognizing the expression of the image to be identified through the trained expression recognition model to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
2. The expression recognition method of claim 1, wherein the expression recognition model comprises a backbone network module and an uncertainty learning module;
the training-obtained expression recognition model recognizes the expression of the image to be recognized, and before obtaining the expression recognition result, the training-obtained expression recognition model further comprises:
inputting the first image sample and the second image sample into the backbone network module to obtain a first expression feature vector and first uncertainty data of the first image sample output by the backbone network module, and obtaining a second expression feature vector and second uncertainty data of the second image sample output by the backbone network module;
inputting the first expression feature vector, the first uncertainty data, the second expression feature vector and the second uncertainty data into the uncertainty learning module to obtain the loss value output by the uncertainty learning module;
and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
3. The expression recognition method according to claim 2, wherein the inputting the first expression feature vector, first uncertainty data, second expression feature vector, and second uncertainty data into the uncertainty learning module, to obtain the loss value output by the uncertainty learning module, includes:
obtaining a third expression feature vector based on the first expression feature vector and the first uncertainty data through the uncertainty learning module, obtaining a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, and carrying out mixing processing on the third expression feature vector and the fourth expression feature vector to obtain a mixed vector;
and obtaining the loss value based on the mixed vector through the uncertainty learning module.
4. The expression recognition method of claim 3, wherein the obtaining, by the uncertainty learning module, a third expression feature vector based on the first expression feature vector and the first uncertainty data, and a fourth expression feature vector based on the second expression feature vector and the second uncertainty data, comprises:
averaging the first uncertainty data and the second uncertainty data according to an output channel of a backbone network module to obtain first overall uncertainty data of the first image sample and second overall uncertainty data of the second image sample;
normalizing the first overall uncertainty data and the second overall uncertainty data to obtain first normalized data and second normalized data;
and multiplying the first expression feature vector with the first normalization data to obtain the third expression feature vector, and multiplying the second expression feature vector with the second normalization data to obtain the fourth expression feature vector.
5. The expression recognition method of claim 3, wherein the deriving, by the uncertainty learning module, the loss value based on the post-mixing vector, comprises:
performing loss calculation on the mixed vector and the label corresponding to the first image sample to obtain a first loss value, and performing loss calculation on the mixed vector and the label corresponding to the second image sample to obtain a second loss value;
the loss value is derived based on the first loss value and the second loss value.
6. The expression recognition method of claim 5, wherein the deriving the loss value based on the first loss value and the second loss value comprises:
and adding the first loss value and the second loss value to obtain the loss value.
7. The expression recognition method according to claim 1, wherein before the image to be recognized is acquired, further comprising:
and pre-training the expression recognition model through a facial recognition training set, wherein the facial recognition training set comprises an expression image and a label corresponding to the expression image.
8. An expression recognition apparatus, characterized by comprising:
the acquisition module is used for acquiring the image to be identified;
the recognition module is used for recognizing the expression of the image to be recognized through the expression recognition model obtained through training to obtain an expression recognition result;
the expression recognition model is obtained based on training of a training set, the training set comprises a plurality of image sample pairs, and the loss value of the expression recognition model is obtained based on uncertainty of a first image sample and uncertainty of a second image sample in the image sample pairs.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202311168658.9A 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium Active CN116912921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311168658.9A CN116912921B (en) 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311168658.9A CN116912921B (en) 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN116912921A true CN116912921A (en) 2023-10-20
CN116912921B CN116912921B (en) 2024-02-20

Family

ID=88367145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311168658.9A Active CN116912921B (en) 2023-09-12 2023-09-12 Expression recognition method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116912921B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401193A (en) * 2020-03-10 2020-07-10 海尔优家智能科技(北京)有限公司 Method and device for obtaining expression recognition model and expression recognition method and device
CN111539452A (en) * 2020-03-26 2020-08-14 深圳云天励飞技术有限公司 Image recognition method and device for multitask attributes, electronic equipment and storage medium
CN113222872A (en) * 2021-05-28 2021-08-06 平安科技(深圳)有限公司 Image processing method, image processing apparatus, electronic device, and medium
CN113239814A (en) * 2021-05-17 2021-08-10 平安科技(深圳)有限公司 Facial expression recognition method, device, equipment and medium based on optical flow reconstruction
CN114170654A (en) * 2021-11-26 2022-03-11 深圳数联天下智能科技有限公司 Training method of age identification model, face age identification method and related device
CN116206345A (en) * 2022-12-09 2023-06-02 支付宝(杭州)信息技术有限公司 Expression recognition model training method, expression recognition method, related device and medium

Also Published As

Publication number Publication date
CN116912921B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN111860573B (en) Model training method, image category detection method and device and electronic equipment
CN107944020B (en) Face image searching method and device, computer device and storage medium
CN110267119B (en) Video precision and chroma evaluation method and related equipment
CN106570464B (en) Face recognition method and device for rapidly processing face shielding
CN111582150B (en) Face quality assessment method, device and computer storage medium
JP2020522077A (en) Acquisition of image features
CN111783532B (en) Cross-age face recognition method based on online learning
CN109919252B (en) Method for generating classifier by using few labeled images
CN112016315B (en) Model training method, text recognition method, model training device, text recognition device, electronic equipment and storage medium
CN110532950B (en) Video feature extraction method and micro-expression identification method based on micro-expression video
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN111401105B (en) Video expression recognition method, device and equipment
CN110705600A (en) Cross-correlation entropy based multi-depth learning model fusion method, terminal device and readable storage medium
CN111680757A (en) Zero sample image recognition algorithm and system based on self-encoder
CN111401343B (en) Method for identifying attributes of people in image and training method and device for identification model
CN114241505A (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN115984930A (en) Micro expression recognition method and device and micro expression recognition model training method
CN114118259A (en) Target detection method and device
Dong et al. A supervised dictionary learning and discriminative weighting model for action recognition
CN112183946A (en) Multimedia content evaluation method, device and training method thereof
CN116912921B (en) Expression recognition method and device, electronic equipment and readable storage medium
CN111242114A (en) Character recognition method and device
Song et al. Text Siamese network for video textual keyframe detection
CN115546554A (en) Sensitive image identification method, device, equipment and computer readable storage medium
Kanjanawattana et al. Deep Learning-Based Emotion Recognition through Facial Expressions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant