CN112613416A - Facial expression recognition method and related device - Google Patents

Facial expression recognition method and related device Download PDF

Info

Publication number
CN112613416A
Authority
CN
China
Prior art keywords
feature
features
face image
facial expression
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011569124.3A
Other languages
Chinese (zh)
Inventor
高磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202011569124.3A
Publication of CN112613416A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a facial expression recognition method and a related device. Texture features, geometric features and semantic features corresponding to a face image are acquired, and the texture features, the geometric features and the semantic features are then respectively encoded to obtain a first feature corresponding to the texture features, a second feature corresponding to the geometric features and a third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.

Description

Facial expression recognition method and related device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a facial expression recognition method and a related device.
Background
Facial expression recognition belongs to the category of emotion recognition and refers to assigning to a given facial image an associated emotion category, such as happiness, sadness, fear, surprise, anger or disgust.
Facial expression recognition is an important research topic in the field of computer vision, and with the development of artificial intelligence research it has broad application prospects in fields such as human-computer interaction, network monitoring and virtual reality. Facial expression is a reflection of human mental activity, so accurately recognizing facial expressions enables accurate analysis of human emotion and improves the human-computer interaction experience.
However, related facial expression recognition methods have poor interference resistance and are susceptible to environmental influences, such as illumination, occlusion, skin color and face shape, which easily leads to facial expressions being unrecognized or misrecognized.
Disclosure of Invention
In order to solve the technical problems in the related art, the application provides a facial expression recognition method and a related device, which reduce the influence of the environment and improve the accuracy of facial expression recognition.
In one aspect, an embodiment of the present application provides a facial expression recognition method, where the method includes:
acquiring texture features, geometric features and semantic features corresponding to the face image;
respectively encoding the texture features, the geometric features and the semantic features to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image;
decoding the fusion features to obtain fourth features corresponding to the face image;
and determining the facial expression corresponding to the facial image according to the fourth characteristic.
In a possible implementation manner, the encoding the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature includes:
respectively encoding the texture features, the geometric features and the semantic features by using an encoding module in a self-encoding neural network model to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
the fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image comprises:
fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
the decoding the fusion feature to obtain a fourth feature corresponding to the face image includes:
and decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
In one possible implementation, the method further includes:
acquiring a face image to be recognized and a block image corresponding to the face image; the block images are used for identifying human face parts included in the human face images;
acquiring the texture features and the geometric features corresponding to the face images according to the block images;
and acquiring semantic features corresponding to the face image according to the face image.
In one possible implementation, the block image is obtained by:
carrying out feature point positioning on the face image to obtain a plurality of feature points of a face included in the face image;
and dividing the face image according to the positioning identifications corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
In one possible implementation, the self-coding neural network model is trained according to the following method:
acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
On the other hand, an embodiment of the present application provides a facial expression recognition apparatus, where the apparatus includes an obtaining unit, a coding unit, a fusing unit, a decoding unit, and a determining unit:
the acquisition unit is used for acquiring texture features, geometric features and semantic features corresponding to the face image;
the encoding unit is configured to encode the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature;
the fusion unit is configured to fuse the first feature, the second feature and the third feature to obtain a fusion feature corresponding to the face image;
the decoding unit is used for decoding the fusion features to obtain fourth features corresponding to the face image;
and the determining unit is used for determining the facial expression corresponding to the facial image according to the fourth feature.
In a possible implementation manner, the encoding unit is configured to encode the texture feature, the geometric feature, and the semantic feature respectively by using an encoding module in a self-encoding neural network model, so as to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit is used for fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
and the decoding unit is used for decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
In a possible implementation manner, the obtaining unit is further configured to:
acquire a face image to be recognized and a block image corresponding to the face image, the block image being used for identifying the face parts included in the face image;
acquire the texture features and the geometric features corresponding to the face image according to the block image;
and acquire the semantic features corresponding to the face image according to the face image.
In a possible implementation manner, the apparatus further includes a positioning unit and a dividing unit;
the positioning unit is used for positioning the characteristic points of the face image to obtain a plurality of characteristic points of the face included in the face image;
and the dividing unit is used for dividing the face image according to the positioning identifiers corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
In one possible implementation, the apparatus further includes a training unit:
the training unit is used for acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
According to the technical scheme, the texture features, geometric features and semantic features corresponding to the face image are acquired, and the texture features, the geometric features and the semantic features are then respectively encoded to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a facial expression recognition method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a face image preprocessing provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a face block image according to an embodiment of the present application;
fig. 4 is a schematic flow chart of texture feature extraction according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of semantic feature extraction according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a self-coding neural network model according to an embodiment of the present disclosure;
fig. 7 is a schematic flowchart of another facial expression recognition method according to an embodiment of the present application;
fig. 8 is a schematic flowchart of another facial expression recognition method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of another facial expression recognition apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present application.
The recognition accuracy of facial expressions is affected by the shooting environment of the face image and the state of the object, for example, when the object in the face image is in a dimly lit environment or is wearing a hat, a mask or the like. In order to improve the accuracy of facial expression recognition, the embodiments of the present application provide a facial expression recognition method and a related device.
The facial expression recognition method provided by the application can be applied to facial expression recognition equipment with data processing capacity, such as terminal equipment and a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
For convenience of understanding, the facial expression recognition method provided by the embodiment of the present application is described below with a terminal device having an image capturing function as a facial expression recognition device.
Referring to fig. 1, fig. 1 is a schematic flow chart of a facial expression recognition method according to an embodiment of the present application. As shown in fig. 1, the facial expression recognition method includes the following steps:
s101: and acquiring texture features, geometric features and semantic features corresponding to the face image.
The face image may be captured in real time by a camera for the object to be recognized; for example, in a human-computer interaction scenario, the terminal device captures the face image of the object to be recognized through a front camera and then recognizes the facial expression of the object based on the face image. The face image may also be an image to be recognized that is pre-stored in a database; this may be determined according to the actual scenario and is not limited herein.
In related facial expression recognition technology, only a single type of feature is extracted, so the facial expression information cannot be fully captured, resulting in low efficiency and low accuracy of facial expression recognition.
In view of this, in the embodiment of the present application, expression features are extracted from the face image in three different dimensions, respectively obtaining the texture features, geometric features and semantic features corresponding to the face image. Expression feature extraction means extracting data representing expression features from the face image through a suitable algorithm.
The texture features are used for identifying the texture information of local human faces in the human face image, such as frowning and wrinkles. The geometric features are used for identifying shape information of the face image, such as size, angle and the like. The semantic features are used for identifying attribute information of a face structure included in the face image, such as eyes, a nose, a mouth and the like, and the face image is described through image information, particularly high-level information.
Extracting texture, geometric and semantic features from the face image captures the facial expression information in three different aspects. Compared with a single feature, the facial expression information is used more fully, providing rich data for subsequently recognizing the facial expression from the three features and thereby improving the accuracy of facial expression recognition.
As for acquiring the texture features, geometric features and semantic features corresponding to the face image, in a possible implementation manner, the face image of the object to be recognized and the block images corresponding to the face image are obtained first; feature extraction is then performed on the block images to obtain the texture features and geometric features corresponding to the face image, and on the face image itself to obtain the semantic features corresponding to the face image.
The embodiment of the application provides a method for blocking a face image, namely, feature point positioning is carried out on the face image to obtain a plurality of feature points of a face included in the face image, and then the face image is divided according to positioning marks corresponding to the plurality of feature points to obtain a blocking image corresponding to the face image.
In practical application, the collected face image may be preprocessed: the face image is converted from RGB space to grayscale space, and its size is then normalized to obtain a normalized face grayscale image. Feature points in the face grayscale image may then be located using an Active Appearance Model (AAM), and each feature point is marked with a positioning identifier. The number of feature points may be set according to the actual scenario, and the positioning identifier may be a number, a code or the like, which is not limited herein. For example, 68 feature points of a human face are marked with numeric labels. In the specific marking process, the marking may be carried out according to the structural characteristics of the human face; for example, numbers 0 to 16 mark the face contour, and numbers 17 to 26 mark the eyebrows. The marked feature points are then connected according to the shapes of the parts of the human face to form a plurality of closed irregular polygonal feature blocks as the block images, as sketched below.
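A minimal Python sketch of this preprocessing and landmark-driven blocking follows. It is illustrative only: dlib's 68-point landmark predictor is used here as a stand-in for the AAM-based localization described above, and the model file path, image size and point groupings are assumptions rather than values fixed by the embodiment.

```python
import cv2
import dlib
import numpy as np

# Stand-in for the AAM localization of the embodiment: dlib's 68-point
# landmark predictor (the model file path is an assumption).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def preprocess(image_bgr, size=128):
    # Convert from color space to grayscale, then normalize the size.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (size, size))

def locate_feature_points(gray):
    # Locate the face and return the 68 marked feature points,
    # indexed 0-67 (e.g. 0-16 contour, 17-26 eyebrows).
    face = detector(gray, 1)[0]
    shape = predictor(gray, face)
    return np.array([(p.x, p.y) for p in shape.parts()])

def block_images(gray, points, groups):
    # Connect marked feature points into closed polygons and cut out
    # one block image per face part; `groups` (the index groups per
    # part) is an assumed input.
    blocks = []
    for idx in groups:
        mask = np.zeros_like(gray)
        cv2.fillPoly(mask, [points[list(idx)].astype(np.int32)], 255)
        blocks.append(cv2.bitwise_and(gray, mask))
    return blocks
```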
For better understanding, the image processing procedure described above is described below with reference to fig. 2. As shown in fig. 2, the image processing process includes a model preparation process and an image blocking process. Wherein the image segmentation process is based on a model preparation process.
In the model preparation stage, the AAM model is established, trained and generated. The image blocking stage includes AAM model matching, face detection and model fitting. Fig. 3 is a schematic diagram of the block images obtained by applying the image processing shown in fig. 2 to a face image.
Blocking the face image by locating facial feature points, compared with traditional grid-style blocking, not only enables better analysis of local face information but also better conforms to the physiological structure of the human face, better captures the deformation information of each part of the face, and lays a good foundation for subsequent feature extraction.
In the embodiment of the application, the texture feature, the geometric feature and the semantic feature are expression feature vectors, and the expression feature vectors are one-dimensional vectors formed by data capable of effectively expressing the facial expression.
Texture features may be extracted with the Local Binary Pattern (LBP), the Histogram of Oriented Gradients (HOG), the Local Directional Number pattern (LDN) or the like. As shown in fig. 4, a pixel-difference local directional number pattern (PD-LDN) is used to extract the texture features of the block images; the per-block features are concatenated and counted to form a feature histogram, and the vector formed by the histogram data is used as the texture feature vector of the face image. An illustrative sketch of this step is given below.
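Since PD-LDN is specific to this embodiment, the sketch below substitutes the standard LBP operator from scikit-image to illustrate the same histogram-building pattern; the operator choice and its parameters are assumptions, not the embodiment's PD-LDN.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def texture_feature(blocks, n_points=8, radius=1):
    # Per block image: compute an LBP code map (stand-in for PD-LDN),
    # count the codes into a histogram, then concatenate all block
    # histograms into one one-dimensional texture feature vector.
    n_bins = n_points + 2  # number of distinct 'uniform' LBP codes
    hists = []
    for block in blocks:
        codes = local_binary_pattern(block, n_points, radius, method="uniform")
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        hists.append(hist)
    return np.concatenate(hists)
```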
The positions of the facial feature points vary somewhat between different expressions, and the degree of variation in the positions of the feature points varies for different expressions. The position change of the feature points can cause the size and the shape of each block image to be changed considerably, so that the moment features of different expressions can be greatly different for the same block image.
Therefore, the geometric feature extraction process of the present application may extract, for each block image, n geometric moments as the geometric features of the block, where n is an integer greater than or equal to 1. In the embodiment of the present application, n = 7 is taken, i.e., seven moment features are used to identify the change in shape and size of each block image. The seven moment features reflect the differences in geometric information among different expressions and have good discriminability and representational power.
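The embodiment fixes only n = 7; OpenCV's seven Hu invariant moments are one natural concrete choice, sketched below under that assumption.

```python
import cv2
import numpy as np

def geometric_feature(blocks):
    # Seven geometric moments per block image. Using the Hu invariant
    # moments here is an assumption; the embodiment only fixes n = 7.
    feats = []
    for block in blocks:
        hu = cv2.HuMoments(cv2.moments(block)).flatten()
        # Log-scale the moments, which span many orders of magnitude.
        feats.append(-np.sign(hu) * np.log10(np.abs(hu) + 1e-12))
    return np.concatenate(feats)  # one-dimensional geometric feature vector
```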
In the semantic feature extraction process, key facial expression regions may be selected from the face image, and semantic features are then extracted using the Dense Scale-Invariant Feature Transform (DSIFT) algorithm: k-means clustering is performed on the DSIFT features of the key regions to generate a dictionary, and a statistical histogram is obtained as the semantic features of the face image, as shown in fig. 5.
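A minimal sketch of this bag-of-words pipeline follows, using OpenCV's SIFT descriptor on a dense grid and scikit-learn's k-means; the grid step, descriptor size and dictionary size k are assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def dsift(gray, step=8, size=8):
    # Dense SIFT: describe a regular grid of keypoints instead of
    # detected interest points.
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for y in range(step, gray.shape[0] - step, step)
                 for x in range(step, gray.shape[1] - step, step)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

def build_dictionary(key_regions, k=64):
    # Generate the dictionary: k-means over the DSIFT features of the
    # key expression regions (k = 64 is an assumption).
    all_desc = np.vstack([dsift(region) for region in key_regions])
    return KMeans(n_clusters=k, n_init=10).fit(all_desc)

def semantic_feature(gray, dictionary):
    # Statistical histogram of visual-word assignments, used as the
    # semantic feature vector of the face image.
    words = dictionary.predict(dsift(gray))
    k = dictionary.n_clusters
    hist, _ = np.histogram(words, bins=k, range=(0, k), density=True)
    return hist
```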
It should be noted that, the above described possible implementation manners for obtaining the texture feature, the geometric feature and the semantic feature corresponding to the face image, which are provided by the embodiments of the present application, may be set according to an actual scene in an actual application, and are not limited herein.
S102: and respectively coding the texture features, the geometric features and the semantic features to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features.
Because related facial expression recognition technology mainly recognizes the facial expression directly from the features extracted from the face image, the discriminability of the features is low.
Therefore, the application provides a possible implementation manner, that is, the texture features, the geometric features and the semantic features obtained in the step S101 are subjected to deep fusion, so that the accuracy of facial expression recognition is improved.
First, the texture features, the geometric features and the semantic features may be respectively encoded to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition.
S103: and fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image.
The first feature, the second feature and the third feature are then fused to obtain the fusion feature corresponding to the face image; the fusion feature identifies the texture, geometric and semantic information included in the face image.
The fusion feature combines features of three different dimensions of the face image, and the strengths and weaknesses of the different features complement one another, so that recognizing the facial expression based on the fusion feature maintains a better expression recognition rate across different scenarios and improves resistance to interference from external factors.
S104: and decoding the fusion features to obtain a fourth feature corresponding to the face image.
Because the face image features are encoded and compressed in S102, the fusion feature needs to be correspondingly decoded to obtain the fourth feature corresponding to the face image.
The process of encoding, fusing and decoding the features realizes the deep fusion of the shallow features, i.e., the feature deep-fusion process. It strengthens the relations among the texture, geometric and semantic features in the face image, enriches the data on which facial expression recognition is based, and provides deeper information for subsequently recognizing the facial expression using the decoded fourth feature.
S105: and determining the facial expression corresponding to the facial image according to the fourth characteristic.
In practical applications, according to the fourth feature, the classifier is used to determine a facial expression corresponding to the facial image, such as happiness, anger, sadness, and the like.
Fusing the three encoded features realizes the reconstruction of the shallow features across three dimensions of the face image. Decoding the fusion feature obtained after fusion further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
For the process of recognizing facial expressions using the texture features, geometric features and semantic features, the embodiment of the application provides a three-channel self-coding neural network model, which comprises an encoding module, a fusion module and a decoding module.
For a better understanding, reference is made to fig. 6 below. As shown in fig. 6, the self-coding neural network model includes three layers, i.e., an input layer, a hidden layer, and an output layer.
In the application process, the texture feature x_p, the geometric feature x_g and the semantic feature x_b are respectively encoded by the encoding module to obtain the first feature h_p, the second feature h_g and the third feature h_b. The fusion module then fuses h_p, h_g and h_b to obtain the fusion feature, and the decoding module decodes the fusion feature to obtain the fourth feature h_f, which is input into a softmax classifier to determine the expression category corresponding to the face image. In this embodiment, the encoding module is a matrix that encodes the features, such as the encoding matrices W_p, W_g and W_b shown in fig. 6; the fusion module and the decoding module are matrices that fuse and decode the features, such as the fusion-decoding matrix W_f shown in fig. 6.
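The sketch below renders this three-channel structure in PyTorch. Fusion by concatenation, the sigmoid activations, the hidden width and the six expression classes are all assumptions; the embodiment specifies only the encode-fuse-decode-softmax topology and the matrices W_p, W_g, W_b and W_f.

```python
import torch
import torch.nn as nn

class ThreeChannelAutoencoder(nn.Module):
    # Sketch of fig. 6: one encoding matrix per channel (Wp, Wg, Wb),
    # one fusion/decoding matrix (Wf), then a softmax classifier.
    def __init__(self, dim_p, dim_g, dim_b, hidden=128, n_classes=6):
        super().__init__()
        self.enc_p = nn.Linear(dim_p, hidden)       # Wp: texture x_p -> h_p
        self.enc_g = nn.Linear(dim_g, hidden)       # Wg: geometry x_g -> h_g
        self.enc_b = nn.Linear(dim_b, hidden)       # Wb: semantics x_b -> h_b
        self.dec_f = nn.Linear(3 * hidden, hidden)  # Wf: fuse + decode -> h_f
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x_p, x_g, x_b):
        h_p = torch.sigmoid(self.enc_p(x_p))
        h_g = torch.sigmoid(self.enc_g(x_g))
        h_b = torch.sigmoid(self.enc_b(x_b))
        fused = torch.cat([h_p, h_g, h_b], dim=-1)  # fusion (assumed: concat)
        h_f = torch.sigmoid(self.dec_f(fused))      # decoded fourth feature
        return self.classifier(h_f)                 # logits for softmax
```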
Therefore, the facial expression recognition method provided by the embodiment of the application is realized based on the artificial intelligence technology, and particularly relates to deep learning in artificial intelligence. Initial model parameters of the self-coding neural network model may be adjusted using the training samples.
In the application process, a training sample corresponding to the self-coding neural network model can be obtained, and the training sample comprises a sample face image and a face expression label corresponding to the sample face image. Wherein the facial expression labels identify expression categories of the sample facial images. Then, the self-coding neural network model is trained by using the training samples.
In the training process, the sample facial expression of the sample facial image is determined based on the output of the self-coding neural network model, and then the parameters of the self-coding neural network model are adjusted according to the loss between the sample facial expression and the facial expression label.
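A minimal training sketch under these assumptions follows: cross-entropy between the model output and the expression label drives the parameter adjustment. The feature dimensions, the optimizer, the learning rate and the `train_loader` yielding (x_p, x_g, x_b, label) batches are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for the three shallow feature vectors.
model = ThreeChannelAutoencoder(dim_p=590, dim_g=140, dim_b=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # softmax + negative log-likelihood

for epoch in range(50):
    for x_p, x_g, x_b, label in train_loader:  # assumed DataLoader
        logits = model(x_p, x_g, x_b)          # sample facial expression
        loss = criterion(logits, label)        # loss vs. expression label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                       # adjust model parameters
```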
The facial expression recognition method provided in the above embodiment acquires the texture features, geometric features and semantic features corresponding to the face image, and then encodes the texture features, the geometric features and the semantic features respectively to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
In order to better understand the facial expression recognition method provided by the above embodiment, an application process of the facial expression recognition method is described below with reference to fig. 7 and 8.
First, the face image, i.e., the expression image, is preprocessed to obtain the block images. Texture features and geometric features are then extracted from the block images, and semantic features are extracted from the face image, i.e., the PD-LDN features and the seven moment features are extracted, and the semantic features are extracted using the bag-of-words model shown in fig. 8. The texture, geometric and semantic features are then respectively encoded and fused, and the fusion feature is decoded, i.e., the feature deep-fusion processing is performed; finally, a classifier determines the expression category of the face image, such as happiness, surprise or sadness.
By extracting the three kinds of features from the face image and performing deep feature fusion on the three shallow features, interference from information irrelevant to the facial expression is reduced and deep information associating the features of different dimensions is added, further improving the anti-interference performance and the accuracy of facial expression recognition.
Aiming at the facial expression recognition method provided by the embodiment, the embodiment of the application also provides a facial expression recognition device.
Referring to fig. 9, fig. 9 is a facial expression recognition apparatus according to an embodiment of the present application. As shown in fig. 9, the facial expression recognition apparatus 900 includes an acquisition unit 901, an encoding unit 902, a fusion unit 903, a decoding unit 904, and a determination unit 905:
the acquiring unit 901 is configured to acquire texture features, geometric features and semantic features corresponding to the face image;
the encoding unit 902 is configured to encode the texture feature, the geometric feature, and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit 903 is configured to fuse the first feature, the second feature, and the third feature to obtain a fusion feature corresponding to the face image;
the decoding unit 904 is configured to decode the fusion feature to obtain a fourth feature corresponding to the face image;
the determining unit 905 is configured to determine, according to the fourth feature, a facial expression corresponding to the facial image.
In a possible implementation manner, the encoding unit 902 is configured to encode the texture feature, the geometric feature, and the semantic feature respectively by using an encoding module in a self-encoding neural network model, so as to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit 903 is configured to fuse the first feature, the second feature, and the third feature by using a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
the decoding unit 904 is configured to decode the fusion feature by using a decoding module of the self-coding neural network model, so as to obtain a fourth feature of the face image.
In a possible implementation manner, the obtaining unit 901 is further configured to:
acquire a face image to be recognized and a block image corresponding to the face image, the block image being used for identifying the face parts included in the face image;
acquire the texture features and the geometric features corresponding to the face image according to the block image;
and acquire the semantic features corresponding to the face image according to the face image.
In a possible implementation manner, the apparatus further includes a positioning unit and a dividing unit;
the positioning unit is used for positioning the characteristic points of the face image to obtain a plurality of characteristic points of the face included in the face image;
and the dividing unit is used for dividing the face image according to the positioning identifiers corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
In one possible implementation, the apparatus further includes a training unit:
the training unit is used for acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
The facial expression recognition device provided in the above embodiment acquires the texture features, geometric features and semantic features corresponding to the face image, and then encodes the texture features, the geometric features and the semantic features respectively to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
It will be understood by those skilled in the art that all or part of the steps of implementing the above method embodiments may be implemented by hardware associated with program instructions, and that the program may be stored in a computer readable storage medium, and when executed, performs the steps including the above method embodiments; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A facial expression recognition method, the method comprising:
acquiring texture features, geometric features and semantic features corresponding to the face image;
respectively encoding the texture features, the geometric features and the semantic features to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image;
decoding the fusion features to obtain fourth features corresponding to the face image;
and determining the facial expression corresponding to the facial image according to the fourth characteristic.
2. The method according to claim 1, wherein the encoding the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature includes:
respectively encoding the texture features, the geometric features and the semantic features by using an encoding module in a self-encoding neural network model to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
the fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image comprises:
fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
the decoding the fusion feature to obtain a fourth feature corresponding to the face image includes:
and decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
3. The method of claim 1, further comprising:
acquiring a face image to be recognized and a block image corresponding to the face image; the block images are used for identifying human face parts included in the human face images;
acquiring the texture features and the geometric features corresponding to the face images according to the block images;
and acquiring semantic features corresponding to the face image according to the face image.
4. The method of claim 3, wherein the block image is obtained by:
carrying out feature point positioning on the face image to obtain a plurality of feature points of a face included in the face image;
and dividing the face image according to the positioning identifications corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
5. The method of claim 2, wherein the self-coding neural network model is trained according to:
acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
6. A facial expression recognition device is characterized by comprising an acquisition unit, a coding unit, a fusion unit, a decoding unit and a determination unit:
the acquisition unit is used for acquiring texture features, geometric features and semantic features corresponding to the face image;
the encoding unit is configured to encode the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature;
the fusion unit is configured to fuse the first feature, the second feature and the third feature to obtain a fusion feature corresponding to the face image;
the decoding unit is used for decoding the fusion features to obtain fourth features corresponding to the face image;
and the determining unit is used for determining the facial expression corresponding to the facial image according to the fourth feature.
7. The apparatus according to claim 6, wherein the encoding unit is configured to encode the texture feature, the geometric feature, and the semantic feature respectively by using an encoding module in a self-encoding neural network model, so as to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit is used for fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
and the decoding unit is used for decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
8. The apparatus of claim 6, wherein the obtaining unit is further configured to:
acquire a face image to be recognized and a block image corresponding to the face image; the block image is used for identifying the face parts included in the face image;
acquire the texture features and the geometric features corresponding to the face image according to the block image;
and acquire the semantic features corresponding to the face image according to the face image.
9. The apparatus of claim 8, further comprising a positioning unit and a dividing unit;
the positioning unit is used for positioning the characteristic points of the face image to obtain a plurality of characteristic points of the face included in the face image;
and the dividing unit is used for dividing the face image according to the positioning identifiers corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
10. The apparatus of claim 6, further comprising a training unit:
the training unit is used for acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
CN202011569124.3A 2020-12-26 2020-12-26 Facial expression recognition method and related device Pending CN112613416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011569124.3A CN112613416A (en) 2020-12-26 2020-12-26 Facial expression recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011569124.3A CN112613416A (en) 2020-12-26 2020-12-26 Facial expression recognition method and related device

Publications (1)

Publication Number Publication Date
CN112613416A 2021-04-06

Family

ID=75248146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011569124.3A Pending CN112613416A (en) 2020-12-26 2020-12-26 Facial expression recognition method and related device

Country Status (1)

Country Link
CN (1) CN112613416A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420747A (en) * 2021-08-25 2021-09-21 成方金融科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN113517064A (en) * 2021-04-14 2021-10-19 华南师范大学 Depression degree evaluation method, system, device and storage medium
CN113657197A (en) * 2021-07-27 2021-11-16 浙江大华技术股份有限公司 Image recognition method, training method of image recognition model and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770649A (en) * 2008-12-30 2010-07-07 中国科学院自动化研究所 Automatic synthesis method for facial image
CN107644209A (en) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 Method for detecting human face and device
CN107729835A (en) * 2017-10-10 2018-02-23 浙江大学 A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features
CN109446980A (en) * 2018-10-25 2019-03-08 华中师范大学 Expression recognition method and device
CN109753950A (en) * 2019-02-11 2019-05-14 河北工业大学 Dynamic human face expression recognition method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770649A (en) * 2008-12-30 2010-07-07 中国科学院自动化研究所 Automatic synthesis method for facial image
CN107644209A (en) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 Method for detecting human face and device
CN107729835A (en) * 2017-10-10 2018-02-23 浙江大学 A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features
CN109446980A (en) * 2018-10-25 2019-03-08 华中师范大学 Expression recognition method and device
CN109753950A (en) * 2019-02-11 2019-05-14 河北工业大学 Dynamic human face expression recognition method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113517064A (en) * 2021-04-14 2021-10-19 华南师范大学 Depression degree evaluation method, system, device and storage medium
CN113657197A (en) * 2021-07-27 2021-11-16 浙江大华技术股份有限公司 Image recognition method, training method of image recognition model and related device
CN113420747A (en) * 2021-08-25 2021-09-21 成方金融科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN113420747B (en) * 2021-08-25 2021-11-23 成方金融科技有限公司 Face recognition method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Yuan et al. Fingerprint liveness detection using an improved CNN with image scale equalization
CN112613416A (en) Facial expression recognition method and related device
CN107330408B (en) Video processing method and device, electronic equipment and storage medium
CN109657554B (en) Image identification method and device based on micro expression and related equipment
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN107145842A (en) With reference to LBP characteristic patterns and the face identification method of convolutional neural networks
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN110555896B (en) Image generation method and device and storage medium
CN111753802B (en) Identification method and device
CN110738153B (en) Heterogeneous face image conversion method and device, electronic equipment and storage medium
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN107911643B (en) Method and device for showing scene special effect in video communication
CN112801054B (en) Face recognition model processing method, face recognition method and device
WO2024109374A1 (en) Training method and apparatus for face swapping model, and device, storage medium and program product
CN111160264A (en) Cartoon figure identity recognition method based on generation of confrontation network
CN111178130A (en) Face recognition method, system and readable storage medium based on deep learning
CN112836589A (en) Method for recognizing facial expressions in video based on feature fusion
CN113642481A (en) Recognition method, training method, device, electronic equipment and storage medium
CN112906520A (en) Gesture coding-based action recognition method and device
CN111353385B (en) Pedestrian re-identification method and device based on mask alignment and attention mechanism
CN114218543A (en) Encryption and unlocking system and method based on multi-scene expression recognition
CN108174141A (en) A kind of method of video communication and a kind of mobile device
CN116386102A (en) Face emotion recognition method based on improved residual convolution network acceptance block structure
CN112070744B (en) Face recognition method, system, device and readable storage medium
CN113706550A (en) Image scene recognition and model training method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination