CN112613416A - Facial expression recognition method and related device - Google Patents

Facial expression recognition method and related device Download PDF

Info

Publication number
CN112613416A
Authority
CN
China
Prior art keywords
feature
features
face image
facial expression
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011569124.3A
Other languages
Chinese (zh)
Inventor
高磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202011569124.3A
Publication of CN112613416A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a facial expression recognition method and a related device. Texture features, geometric features and semantic features corresponding to a face image are acquired, and the texture features, the geometric features and the semantic features are then respectively encoded to obtain a first feature corresponding to the texture features, a second feature corresponding to the geometric features and a third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.

Description

Facial expression recognition method and related device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a facial expression recognition method and a related device.
Background
Facial expression recognition belongs to the category of emotion recognition and refers to assigning to a given facial image an associated emotion category, such as happiness, sadness, fear, surprise, anger or disgust.
Facial expression recognition is an important research topic in the field of computer vision, and with the development of artificial intelligence research it has broad application prospects in fields such as human-computer interaction, network monitoring and virtual reality. Facial expression is a reflection of human mental activity, so accurately recognizing facial expressions enables accurate analysis of human emotion and improves the human-computer interaction experience.
However, related facial expression recognition methods have poor interference resistance and are susceptible to environmental influences, such as illumination, occlusion, skin color and face shape, which easily leads to facial expressions being unrecognized or misrecognized.
Disclosure of Invention
In order to solve the technical problems in the related art, the application provides a facial expression recognition method and a related device, which reduce the influence of the environment and improve the accuracy of facial expression recognition.
In one aspect, an embodiment of the present application provides a facial expression recognition method, where the method includes:
acquiring texture features, geometric features and semantic features corresponding to the face image;
respectively encoding the texture features, the geometric features and the semantic features to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image;
decoding the fusion features to obtain fourth features corresponding to the face image;
and determining the facial expression corresponding to the facial image according to the fourth characteristic.
In a possible implementation manner, the encoding the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature includes:
respectively encoding the texture features, the geometric features and the semantic features by using an encoding module in a self-encoding neural network model to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
the fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image comprises:
fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
the decoding the fusion feature to obtain a fourth feature corresponding to the face image includes:
and decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
In one possible implementation, the method further includes:
acquiring a face image to be recognized and a block image corresponding to the face image; the block images are used for identifying human face parts included in the human face images;
acquiring the texture features and the geometric features corresponding to the face images according to the block images;
and acquiring semantic features corresponding to the face image according to the face image.
In one possible implementation, the block image is obtained by:
carrying out feature point positioning on the face image to obtain a plurality of feature points of a face included in the face image;
and dividing the face image according to the positioning identifications corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
In one possible implementation, the self-coding neural network model is trained according to the following method:
acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
On the other hand, an embodiment of the present application provides a facial expression recognition apparatus, where the apparatus includes an obtaining unit, a coding unit, a fusing unit, a decoding unit, and a determining unit:
the acquisition unit is used for acquiring texture features, geometric features and semantic features corresponding to the face image;
the encoding unit is configured to encode the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature;
the fusion unit is configured to fuse the first feature, the second feature and the third feature to obtain a fusion feature corresponding to the face image;
the decoding unit is used for decoding the fusion features to obtain fourth features corresponding to the face image;
and the determining unit is used for determining the facial expression corresponding to the facial image according to the fourth feature.
In a possible implementation manner, the encoding unit is configured to encode the texture feature, the geometric feature, and the semantic feature respectively by using an encoding module in a self-encoding neural network model, so as to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit is used for fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
and the decoding unit is used for decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
In a possible implementation manner, the obtaining unit is further configured to:
acquire a face image to be recognized and a block image corresponding to the face image, the block image being used for identifying the face parts included in the face image;
acquire the texture features and the geometric features corresponding to the face image according to the block image;
and acquire the semantic features corresponding to the face image according to the face image.
In a possible implementation manner, the apparatus further includes a positioning unit and a dividing unit;
the positioning unit is used for positioning the characteristic points of the face image to obtain a plurality of characteristic points of the face included in the face image;
and the dividing unit is used for dividing the face image according to the positioning identifiers corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
In one possible implementation, the apparatus further includes a training unit:
the training unit is used for acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
According to the technical scheme, the texture features, geometric features and semantic features corresponding to the face image are acquired, and the texture features, the geometric features and the semantic features are then respectively encoded to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a facial expression recognition method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a face image preprocessing provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a face block image according to an embodiment of the present application;
fig. 4 is a schematic flow chart of texture feature extraction according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of semantic feature extraction according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a self-coding neural network model according to an embodiment of the present disclosure;
fig. 7 is a schematic flowchart of another facial expression recognition method according to an embodiment of the present application;
fig. 8 is a schematic flowchart of another facial expression recognition method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of another facial expression recognition apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present application.
The recognition accuracy of facial expressions is affected by the shooting environment of the face image and the state of the object, for example, when the object in the face image is in a dimly lit environment or is wearing a hat, a mask or the like. In order to improve the accuracy of facial expression recognition, the embodiments of the present application provide a facial expression recognition method and a related device.
The facial expression recognition method provided by the application can be applied to facial expression recognition equipment with data processing capacity, such as terminal equipment and a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
For convenience of understanding, the facial expression recognition method provided by the embodiment of the present application is described below with a terminal device having an image capturing function as a facial expression recognition device.
Referring to fig. 1, fig. 1 is a schematic flow chart of a facial expression recognition method according to an embodiment of the present application. As shown in fig. 1, the facial expression recognition method includes the following steps:
s101: and acquiring texture features, geometric features and semantic features corresponding to the face image.
The face image may be captured in real time by a camera for the object to be recognized; for example, in a human-computer interaction scenario, the terminal device captures the face image of the object to be recognized through a front camera and then recognizes the facial expression of the object based on the face image. The face image may also be an image to be recognized that is pre-stored in a database; this may be determined according to the actual scenario and is not limited herein.
In related facial expression recognition technology, only a single type of feature is extracted, so the facial expression information cannot be fully captured, resulting in low efficiency and low accuracy of facial expression recognition.
In view of this, in the embodiment of the present application, expression features are extracted from the face image in three different dimensions, respectively obtaining the texture features, geometric features and semantic features corresponding to the face image. Expression feature extraction means extracting data representing expression features from the face image through a suitable algorithm.
The texture features are used for identifying the texture information of local human faces in the human face image, such as frowning and wrinkles. The geometric features are used for identifying shape information of the face image, such as size, angle and the like. The semantic features are used for identifying attribute information of a face structure included in the face image, such as eyes, a nose, a mouth and the like, and the face image is described through image information, particularly high-level information.
Extracting texture, geometric and semantic features from the face image captures the facial expression information in three different aspects. Compared with a single feature, the facial expression information is used more fully, providing rich data for subsequently recognizing the facial expression from the three features and thereby improving the accuracy of facial expression recognition.
As for acquiring the texture features, geometric features and semantic features corresponding to the face image, in a possible implementation manner, the face image of the object to be recognized and the block images corresponding to the face image are obtained first; feature extraction is then performed on the block images to obtain the texture features and geometric features corresponding to the face image, and on the face image itself to obtain the semantic features corresponding to the face image.
The embodiment of the application provides a method for blocking a face image, namely, feature point positioning is carried out on the face image to obtain a plurality of feature points of a face included in the face image, and then the face image is divided according to positioning marks corresponding to the plurality of feature points to obtain a blocking image corresponding to the face image.
In practical application, the collected face image may be preprocessed: the face image is converted from RGB space to grayscale space, and its size is then normalized to obtain a normalized face grayscale image. Feature points in the face grayscale image may then be located using an Active Appearance Model (AAM), and each feature point is marked with a positioning identifier. The number of feature points may be set according to the actual scenario, and the positioning identifier may be a number, a code or the like, which is not limited herein. For example, 68 feature points of a human face are marked with numeric labels. In the specific marking process, the marking may be carried out according to the structural characteristics of the human face; for example, numbers 0 to 16 mark the face contour, and numbers 17 to 26 mark the eyebrows. The marked feature points are then connected according to the shapes of the parts of the human face to form a plurality of closed irregular polygonal feature blocks as the block images, as sketched below.
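A minimal Python sketch of this preprocessing and landmark-driven blocking follows. It is illustrative only: dlib's 68-point landmark predictor is used here as a stand-in for the AAM-based localization described above, and the model file path, image size and point groupings are assumptions rather than values fixed by the embodiment.

```python
import cv2
import dlib
import numpy as np

# Stand-in for the AAM localization of the embodiment: dlib's 68-point
# landmark predictor (the model file path is an assumption).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def preprocess(image_bgr, size=128):
    # Convert from color space to grayscale, then normalize the size.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (size, size))

def locate_feature_points(gray):
    # Locate the face and return the 68 marked feature points,
    # indexed 0-67 (e.g. 0-16 contour, 17-26 eyebrows).
    face = detector(gray, 1)[0]
    shape = predictor(gray, face)
    return np.array([(p.x, p.y) for p in shape.parts()])

def block_images(gray, points, groups):
    # Connect marked feature points into closed polygons and cut out
    # one block image per face part; `groups` (the index groups per
    # part) is an assumed input.
    blocks = []
    for idx in groups:
        mask = np.zeros_like(gray)
        cv2.fillPoly(mask, [points[list(idx)].astype(np.int32)], 255)
        blocks.append(cv2.bitwise_and(gray, mask))
    return blocks
```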
For better understanding, the image processing procedure described above is described below with reference to fig. 2. As shown in fig. 2, the image processing process includes a model preparation process and an image blocking process. Wherein the image segmentation process is based on a model preparation process.
In the model preparation stage, the AAM model is established, trained and generated. The image blocking stage includes AAM model matching, face detection and model fitting. Fig. 3 is a schematic diagram of the block images obtained by applying the image processing shown in fig. 2 to a face image.
Blocking the face image by locating facial feature points, compared with traditional grid-style blocking, not only enables better analysis of local face information but also better conforms to the physiological structure of the human face, better captures the deformation information of each part of the face, and lays a good foundation for subsequent feature extraction.
In the embodiment of the application, the texture feature, the geometric feature and the semantic feature are expression feature vectors, and the expression feature vectors are one-dimensional vectors formed by data capable of effectively expressing the facial expression.
Texture features may be extracted with the Local Binary Pattern (LBP), the Histogram of Oriented Gradients (HOG), the Local Directional Number pattern (LDN) or the like. As shown in fig. 4, a pixel-difference local directional number pattern (PD-LDN) is used to extract the texture features of the block images; the per-block features are concatenated and counted to form a feature histogram, and the vector formed by the histogram data is used as the texture feature vector of the face image. An illustrative sketch of this step is given below.
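Since PD-LDN is specific to this embodiment, the sketch below substitutes the standard LBP operator from scikit-image to illustrate the same histogram-building pattern; the operator choice and its parameters are assumptions, not the embodiment's PD-LDN.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def texture_feature(blocks, n_points=8, radius=1):
    # Per block image: compute an LBP code map (stand-in for PD-LDN),
    # count the codes into a histogram, then concatenate all block
    # histograms into one one-dimensional texture feature vector.
    n_bins = n_points + 2  # number of distinct 'uniform' LBP codes
    hists = []
    for block in blocks:
        codes = local_binary_pattern(block, n_points, radius, method="uniform")
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        hists.append(hist)
    return np.concatenate(hists)
```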
The positions of the facial feature points vary somewhat between different expressions, and the degree of variation in the positions of the feature points varies for different expressions. The position change of the feature points can cause the size and the shape of each block image to be changed considerably, so that the moment features of different expressions can be greatly different for the same block image.
Therefore, the geometric feature extraction process of the present application may extract, for each block image, n geometric moments as the geometric features of the block, where n is an integer greater than or equal to 1. In the embodiment of the present application, n = 7 is taken, i.e., seven moment features are used to identify the change in shape and size of each block image. The seven moment features reflect the differences in geometric information among different expressions and have good discriminability and representational power.
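The embodiment fixes only n = 7; OpenCV's seven Hu invariant moments are one natural concrete choice, sketched below under that assumption.

```python
import cv2
import numpy as np

def geometric_feature(blocks):
    # Seven geometric moments per block image. Using the Hu invariant
    # moments here is an assumption; the embodiment only fixes n = 7.
    feats = []
    for block in blocks:
        hu = cv2.HuMoments(cv2.moments(block)).flatten()
        # Log-scale the moments, which span many orders of magnitude.
        feats.append(-np.sign(hu) * np.log10(np.abs(hu) + 1e-12))
    return np.concatenate(feats)  # one-dimensional geometric feature vector
```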
In the semantic feature extraction process, key facial expression regions may be selected from the face image, and semantic features are then extracted using the Dense Scale-Invariant Feature Transform (DSIFT) algorithm: k-means clustering is performed on the DSIFT features of the key regions to generate a dictionary, and a statistical histogram is obtained as the semantic features of the face image, as shown in fig. 5.
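A minimal sketch of this bag-of-words pipeline follows, using OpenCV's SIFT descriptor on a dense grid and scikit-learn's k-means; the grid step, descriptor size and dictionary size k are assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def dsift(gray, step=8, size=8):
    # Dense SIFT: describe a regular grid of keypoints instead of
    # detected interest points.
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for y in range(step, gray.shape[0] - step, step)
                 for x in range(step, gray.shape[1] - step, step)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

def build_dictionary(key_regions, k=64):
    # Generate the dictionary: k-means over the DSIFT features of the
    # key expression regions (k = 64 is an assumption).
    all_desc = np.vstack([dsift(region) for region in key_regions])
    return KMeans(n_clusters=k, n_init=10).fit(all_desc)

def semantic_feature(gray, dictionary):
    # Statistical histogram of visual-word assignments, used as the
    # semantic feature vector of the face image.
    words = dictionary.predict(dsift(gray))
    k = dictionary.n_clusters
    hist, _ = np.histogram(words, bins=k, range=(0, k), density=True)
    return hist
```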
It should be noted that, the above described possible implementation manners for obtaining the texture feature, the geometric feature and the semantic feature corresponding to the face image, which are provided by the embodiments of the present application, may be set according to an actual scene in an actual application, and are not limited herein.
S102: and respectively coding the texture features, the geometric features and the semantic features to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features.
Because related facial expression recognition technology mainly recognizes the facial expression directly from the features extracted from the face image, the discriminability of the features is low.
Therefore, the application provides a possible implementation manner, that is, the texture features, the geometric features and the semantic features obtained in the step S101 are subjected to deep fusion, so that the accuracy of facial expression recognition is improved.
First, the texture features, the geometric features and the semantic features may be respectively encoded to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition.
S103: and fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image.
The first feature, the second feature and the third feature are then fused to obtain the fusion feature corresponding to the face image; the fusion feature identifies the texture, geometric and semantic information included in the face image.
The fusion feature combines features of three different dimensions of the face image, and the strengths and weaknesses of the different features complement one another, so that recognizing the facial expression based on the fusion feature maintains a better expression recognition rate across different scenarios and improves resistance to interference from external factors.
S104: and decoding the fusion features to obtain a fourth feature corresponding to the face image.
Because the face image features are encoded and compressed in S102, the fusion feature needs to be correspondingly decoded to obtain the fourth feature corresponding to the face image.
The process of encoding, fusing and decoding the features realizes the deep fusion of the shallow features, i.e., the feature deep-fusion process. It strengthens the relations among the texture, geometric and semantic features in the face image, enriches the data on which facial expression recognition is based, and provides deeper information for subsequently recognizing the facial expression using the decoded fourth feature.
S105: and determining the facial expression corresponding to the facial image according to the fourth characteristic.
In practical applications, according to the fourth feature, the classifier is used to determine a facial expression corresponding to the facial image, such as happiness, anger, sadness, and the like.
Fusing the three encoded features realizes the reconstruction of the shallow features across three dimensions of the face image. Decoding the fusion feature obtained after fusion further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
For the process of recognizing facial expressions using the texture features, geometric features and semantic features, the embodiment of the application provides a three-channel self-coding neural network model, which comprises an encoding module, a fusion module and a decoding module.
For a better understanding, reference is made to fig. 6 below. As shown in fig. 6, the self-coding neural network model includes three layers, i.e., an input layer, a hidden layer, and an output layer.
In the application process, the texture feature x_p, the geometric feature x_g and the semantic feature x_b are respectively encoded by the encoding module to obtain the first feature h_p, the second feature h_g and the third feature h_b. The fusion module then fuses h_p, h_g and h_b to obtain the fusion feature, and the decoding module decodes the fusion feature to obtain the fourth feature h_f, which is input into a softmax classifier to determine the expression category corresponding to the face image. In this embodiment, the encoding module is a matrix that encodes the features, such as the encoding matrices W_p, W_g and W_b shown in fig. 6; the fusion module and the decoding module are matrices that fuse and decode the features, such as the fusion-decoding matrix W_f shown in fig. 6.
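The sketch below renders this three-channel structure in PyTorch. Fusion by concatenation, the sigmoid activations, the hidden width and the six expression classes are all assumptions; the embodiment specifies only the encode-fuse-decode-softmax topology and the matrices W_p, W_g, W_b and W_f.

```python
import torch
import torch.nn as nn

class ThreeChannelAutoencoder(nn.Module):
    # Sketch of fig. 6: one encoding matrix per channel (Wp, Wg, Wb),
    # one fusion/decoding matrix (Wf), then a softmax classifier.
    def __init__(self, dim_p, dim_g, dim_b, hidden=128, n_classes=6):
        super().__init__()
        self.enc_p = nn.Linear(dim_p, hidden)       # Wp: texture x_p -> h_p
        self.enc_g = nn.Linear(dim_g, hidden)       # Wg: geometry x_g -> h_g
        self.enc_b = nn.Linear(dim_b, hidden)       # Wb: semantics x_b -> h_b
        self.dec_f = nn.Linear(3 * hidden, hidden)  # Wf: fuse + decode -> h_f
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x_p, x_g, x_b):
        h_p = torch.sigmoid(self.enc_p(x_p))
        h_g = torch.sigmoid(self.enc_g(x_g))
        h_b = torch.sigmoid(self.enc_b(x_b))
        fused = torch.cat([h_p, h_g, h_b], dim=-1)  # fusion (assumed: concat)
        h_f = torch.sigmoid(self.dec_f(fused))      # decoded fourth feature
        return self.classifier(h_f)                 # logits for softmax
```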
Therefore, the facial expression recognition method provided by the embodiment of the application is realized based on the artificial intelligence technology, and particularly relates to deep learning in artificial intelligence. Initial model parameters of the self-coding neural network model may be adjusted using the training samples.
In the application process, a training sample corresponding to the self-coding neural network model can be obtained, and the training sample comprises a sample face image and a face expression label corresponding to the sample face image. Wherein the facial expression labels identify expression categories of the sample facial images. Then, the self-coding neural network model is trained by using the training samples.
In the training process, the sample facial expression of the sample facial image is determined based on the output of the self-coding neural network model, and then the parameters of the self-coding neural network model are adjusted according to the loss between the sample facial expression and the facial expression label.
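A minimal training sketch under these assumptions follows: cross-entropy between the model output and the expression label drives the parameter adjustment. The feature dimensions, the optimizer, the learning rate and the `train_loader` yielding (x_p, x_g, x_b, label) batches are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for the three shallow feature vectors.
model = ThreeChannelAutoencoder(dim_p=590, dim_g=140, dim_b=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # softmax + negative log-likelihood

for epoch in range(50):
    for x_p, x_g, x_b, label in train_loader:  # assumed DataLoader
        logits = model(x_p, x_g, x_b)          # sample facial expression
        loss = criterion(logits, label)        # loss vs. expression label
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                       # adjust model parameters
```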
The facial expression recognition method provided in the above embodiment acquires the texture features, geometric features and semantic features corresponding to the face image, and then encodes the texture features, the geometric features and the semantic features respectively to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
In order to better understand the facial expression recognition method provided by the above embodiment, an application process of the facial expression recognition method is described below with reference to fig. 7 and 8.
First, the face image, i.e., the expression image, is preprocessed to obtain the block images. Texture features and geometric features are then extracted from the block images, and semantic features are extracted from the face image, i.e., the PD-LDN features and the seven moment features are extracted, and the semantic features are extracted using the bag-of-words model shown in fig. 8. The texture, geometric and semantic features are then respectively encoded and fused, and the fusion feature is decoded, i.e., the feature deep-fusion processing is performed; finally, a classifier determines the expression category of the face image, such as happiness, surprise or sadness.
By extracting the three kinds of features from the face image and performing deep feature fusion on the three shallow features, interference from information irrelevant to the facial expression is reduced and deep information associating the features of different dimensions is added, further improving the anti-interference performance and the accuracy of facial expression recognition.
Aiming at the facial expression recognition method provided by the embodiment, the embodiment of the application also provides a facial expression recognition device.
Referring to fig. 9, fig. 9 is a facial expression recognition apparatus according to an embodiment of the present application. As shown in fig. 9, the facial expression recognition apparatus 900 includes an acquisition unit 901, an encoding unit 902, a fusion unit 903, a decoding unit 904, and a determination unit 905:
the acquiring unit 901 is configured to acquire texture features, geometric features and semantic features corresponding to the face image;
the encoding unit 902 is configured to encode the texture feature, the geometric feature, and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit 903 is configured to fuse the first feature, the second feature, and the third feature to obtain a fusion feature corresponding to the face image;
the decoding unit 904 is configured to decode the fusion feature to obtain a fourth feature corresponding to the face image;
the determining unit 905 is configured to determine, according to the fourth feature, a facial expression corresponding to the facial image.
In a possible implementation manner, the encoding unit 902 is configured to encode the texture feature, the geometric feature, and the semantic feature respectively by using an encoding module in a self-encoding neural network model, so as to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit 903 is configured to fuse the first feature, the second feature, and the third feature by using a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
the decoding unit 904 is configured to decode the fusion feature by using a decoding module of the self-coding neural network model, so as to obtain a fourth feature of the face image.
In a possible implementation manner, the obtaining unit 901 is further configured to:
acquire a face image to be recognized and a block image corresponding to the face image, the block image being used for identifying the face parts included in the face image;
acquire the texture features and the geometric features corresponding to the face image according to the block image;
and acquire the semantic features corresponding to the face image according to the face image.
In a possible implementation manner, the apparatus further includes a positioning unit and a dividing unit;
the positioning unit is used for positioning the characteristic points of the face image to obtain a plurality of characteristic points of the face included in the face image;
and the dividing unit is used for dividing the face image according to the positioning identifiers corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
In one possible implementation, the apparatus further includes a training unit:
the training unit is used for acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
The facial expression recognition device provided in the above embodiment acquires the texture features, geometric features and semantic features corresponding to the face image, and then encodes the texture features, the geometric features and the semantic features respectively to obtain the first feature corresponding to the texture features, the second feature corresponding to the geometric features and the third feature corresponding to the semantic features. Because encoding lossily compresses the data, encoding the features is equivalent to filtering out the data irrelevant to facial expression recognition, reducing the influence of interference information on facial expression recognition. The first feature, the second feature and the third feature are then fused, realizing a reconstruction of the shallow features across three dimensions of the face image. Decoding the fused feature further mines the depth information in the face image related to facial expression recognition, so that the facial expression corresponding to the face image is determined from the decoded fourth feature, improving the accuracy of facial expression recognition.
It will be understood by those skilled in the art that all or part of the steps of implementing the above method embodiments may be implemented by hardware associated with program instructions, and that the program may be stored in a computer readable storage medium, and when executed, performs the steps including the above method embodiments; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A facial expression recognition method, the method comprising:
acquiring texture features, geometric features and semantic features corresponding to the face image;
respectively encoding the texture features, the geometric features and the semantic features to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image;
decoding the fusion features to obtain fourth features corresponding to the face image;
and determining the facial expression corresponding to the facial image according to the fourth characteristic.
2. The method according to claim 1, wherein the encoding the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature includes:
respectively encoding the texture features, the geometric features and the semantic features by using an encoding module in a self-encoding neural network model to obtain first features corresponding to the texture features, second features corresponding to the geometric features and third features corresponding to the semantic features;
the fusing the first feature, the second feature and the third feature to obtain a fused feature corresponding to the face image comprises:
fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
the decoding the fusion feature to obtain a fourth feature corresponding to the face image includes:
and decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
3. The method of claim 1, further comprising:
acquiring a face image to be recognized and a block image corresponding to the face image; the block images are used for identifying human face parts included in the human face images;
acquiring the texture features and the geometric features corresponding to the face images according to the block images;
and acquiring semantic features corresponding to the face image according to the face image.
4. The method of claim 3, wherein the block image is obtained by:
carrying out feature point positioning on the face image to obtain a plurality of feature points of a face included in the face image;
and dividing the face image according to the positioning identifications corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
5. The method of claim 2, wherein the self-coding neural network model is trained according to:
acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
6. A facial expression recognition device is characterized by comprising an acquisition unit, a coding unit, a fusion unit, a decoding unit and a determination unit:
the acquisition unit is used for acquiring texture features, geometric features and semantic features corresponding to the face image;
the encoding unit is configured to encode the texture feature, the geometric feature and the semantic feature respectively to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature and a third feature corresponding to the semantic feature;
the fusion unit is configured to fuse the first feature, the second feature and the third feature to obtain a fusion feature corresponding to the face image;
the decoding unit is used for decoding the fusion features to obtain fourth features corresponding to the face image;
and the determining unit is used for determining the facial expression corresponding to the facial image according to the fourth feature.
7. The apparatus according to claim 6, wherein the encoding unit is configured to encode the texture feature, the geometric feature, and the semantic feature respectively by using an encoding module in a self-encoding neural network model, so as to obtain a first feature corresponding to the texture feature, a second feature corresponding to the geometric feature, and a third feature corresponding to the semantic feature;
the fusion unit is used for fusing the first feature, the second feature and the third feature by utilizing a fusion module in the self-coding neural network model to obtain a fusion feature corresponding to the face image;
and the decoding unit is used for decoding the fusion features by using a decoding module of the self-coding neural network model to obtain fourth features of the face image.
8. The apparatus of claim 6, wherein the obtaining unit is further configured to:
acquire a face image to be recognized and a block image corresponding to the face image; the block image is used for identifying the face parts included in the face image;
acquire the texture features and the geometric features corresponding to the face image according to the block image;
and acquire the semantic features corresponding to the face image according to the face image.
9. The apparatus of claim 8, further comprising a positioning unit and a dividing unit;
the positioning unit is used for positioning the characteristic points of the face image to obtain a plurality of characteristic points of the face included in the face image;
and the dividing unit is used for dividing the face image according to the positioning identifiers corresponding to the plurality of feature points to obtain the block images corresponding to the face image.
10. The apparatus of claim 6, further comprising a training unit:
the training unit is used for acquiring a training sample corresponding to the self-coding neural network model; the training sample comprises a sample face image and a face expression label corresponding to the sample face image;
training the self-coding neural network model by using the training samples;
in the training process, determining the sample facial expression of the sample facial image based on the output of the self-coding neural network model; and adjusting parameters of the self-coding neural network model according to the sample facial expression and the facial expression label.
CN202011569124.3A 2020-12-26 2020-12-26 Facial expression recognition method and related device Pending CN112613416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011569124.3A CN112613416A (en) 2020-12-26 2020-12-26 Facial expression recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011569124.3A CN112613416A (en) 2020-12-26 2020-12-26 Facial expression recognition method and related device

Publications (1)

Publication Number Publication Date
CN112613416A 2021-04-06

Family

ID=75248146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011569124.3A Pending CN112613416A (en) 2020-12-26 2020-12-26 Facial expression recognition method and related device

Country Status (1)

Country Link
CN (1) CN112613416A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420747A (en) * 2021-08-25 2021-09-21 成方金融科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN113517064A (en) * 2021-04-14 2021-10-19 华南师范大学 Depression degree evaluation method, system, device and storage medium
CN113657197A (en) * 2021-07-27 2021-11-16 浙江大华技术股份有限公司 Image recognition method, training method of image recognition model and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770649A (en) * 2008-12-30 2010-07-07 中国科学院自动化研究所 Automatic synthesis method for facial image
CN107644209A (en) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 Method for detecting human face and device
CN107729835A (en) * 2017-10-10 2018-02-23 浙江大学 A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features
CN109446980A (en) * 2018-10-25 2019-03-08 华中师范大学 Expression recognition method and device
CN109753950A (en) * 2019-02-11 2019-05-14 河北工业大学 Dynamic human face expression recognition method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770649A (en) * 2008-12-30 2010-07-07 中国科学院自动化研究所 Automatic synthesis method for facial image
CN107644209A (en) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 Method for detecting human face and device
CN107729835A (en) * 2017-10-10 2018-02-23 浙江大学 A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features
CN109446980A (en) * 2018-10-25 2019-03-08 华中师范大学 Expression recognition method and device
CN109753950A (en) * 2019-02-11 2019-05-14 河北工业大学 Dynamic human face expression recognition method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113517064A (en) * 2021-04-14 2021-10-19 华南师范大学 Depression degree evaluation method, system, device and storage medium
CN113657197A (en) * 2021-07-27 2021-11-16 浙江大华技术股份有限公司 Image recognition method, training method of image recognition model and related device
CN113420747A (en) * 2021-08-25 2021-09-21 成方金融科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN113420747B (en) * 2021-08-25 2021-11-23 成方金融科技有限公司 Face recognition method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Yuan et al. Fingerprint liveness detection using an improved CNN with image scale equalization
CN112613416A (en) Facial expression recognition method and related device
CN107330408B (en) Video processing method and device, electronic equipment and storage medium
CN109657554B (en) Image identification method and device based on micro expression and related equipment
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN107145842A (en) With reference to LBP characteristic patterns and the face identification method of convolutional neural networks
CN107333071A (en) Video processing method and device, electronic equipment and storage medium
CN110555896B (en) Image generation method and device and storage medium
CN111753802B (en) Identification method and device
CN110738153B (en) Heterogeneous face image conversion method and device, electronic equipment and storage medium
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN107911643B (en) Method and device for showing scene special effect in video communication
CN112801054B (en) Face recognition model processing method, face recognition method and device
WO2024109374A1 (en) Training method and apparatus for face swapping model, and device, storage medium and program product
CN111160264A (en) Cartoon figure identity recognition method based on generation of confrontation network
CN111178130A (en) Face recognition method, system and readable storage medium based on deep learning
CN112836589A (en) Method for recognizing facial expressions in video based on feature fusion
CN113642481A (en) Recognition method, training method, device, electronic equipment and storage medium
CN112906520A (en) Gesture coding-based action recognition method and device
CN111353385B (en) Pedestrian re-identification method and device based on mask alignment and attention mechanism
CN114218543A (en) Encryption and unlocking system and method based on multi-scene expression recognition
CN108174141A (en) A kind of method of video communication and a kind of mobile device
CN116386102A (en) Face emotion recognition method based on improved residual convolution network acceptance block structure
CN112070744B (en) Face recognition method, system, device and readable storage medium
CN113706550A (en) Image scene recognition and model training method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination