CN111368663A

CN111368663A - Method, device, medium and equipment for recognizing static facial expressions in natural scene

Info

Publication number: CN111368663A
Application number: CN202010115562.6A
Authority: CN
Inventors: 朱亮; 邢晓芬; 徐向民; 郭锴凌; 晋建秀
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2020-07-03
Anticipated expiration: 2040-02-25
Also published as: CN111368663B

Abstract

The invention provides a method, a device, a medium and equipment for identifying static facial expressions in a natural scene; the method comprises a pretreatment process, a global feature extraction process, a local feature extraction process and a feature fusion process which are sequentially executed; the preprocessing procedure is used for removing the background of a natural scene in the picture and only reserving a face area; the global feature extraction process is to adopt a convolutional neural network to extract the features of the whole picture after the preprocessing process and convert the features into feature vectors; the local feature extraction process is to extract a local area with the most information quantity of the picture by a target detection method and convert the local area into a feature vector; the feature fusion process is to integrate the feature vectors extracted in the global feature extraction process and the feature vectors extracted in the local feature extraction process into a feature matrix, and then obtain the facial expression probability through a full connection layer and softmax. The facial expression recognition method is high in accuracy rate, high in recognition efficiency and good in recognition effect.

Description

Method, device, medium and equipment for recognizing static facial expressions in natural scene

Technical Field

The invention relates to the technical field of computer vision, in particular to a method, a device, a medium and equipment for recognizing static facial expressions in natural scenes.

Background

With the development of artificial intelligence technology, the application of facial expression recognition attracts a wide range of concerns, such as: the facial expression recognition method can be applied to the fields of fatigue driving, life entertainment, criminal investigation, medical diagnosis, virtual reality, intelligent education and the like. Particularly, with the development of emotion calculation research, the research on facial expressions is more hot, and according to the research of related researchers, the information contained in the facial expressions can reach 55%, so that the research on facial expression recognition is of great importance to emotion calculation.

For a long time, most researchers have studied facial expression data sets under laboratory conditions; with the development of research and the publication of data sets under natural conditions, the current research focus gradually turns to the recognition of facial expressions under natural conditions. Compared with the expression image under the natural condition, the expression image under the natural condition has the problems of multiple postures, shielding and the like under the laboratory condition, so that the traditional manual characteristic method and the shallow convolutional neural network method are difficult to obtain better identification accuracy on the expression data set under the natural condition. Because the expressions are formed by combining the movements of facial muscles, the characteristics of the expressions are finer compared with other images, the expressions of the same category have larger difference, and the different expressions are often local characteristic changes, so that a model capable of better extracting the characteristics with discrimination and the local characteristics is required to be used, and the accuracy of facial expression recognition can be improved.

Disclosure of Invention

The invention aims to overcome the defects in the prior art, and provides a method, a device, a medium and equipment for identifying static facial expressions in natural scenes, which have the advantages of high accuracy and high identification efficiency and have good identification effect.

In order to achieve the purpose, the invention is realized by the following technical scheme: a method for recognizing static facial expressions in natural scenes is characterized by comprising the following steps: the method comprises a pretreatment process, a global feature extraction process, a local feature extraction process and a feature fusion process which are sequentially executed;

the preprocessing procedure is used for removing the background of a natural scene in the picture and only reserving a face area;

the global feature extraction process is to extract the features of the whole picture after the preprocessing process by adopting a convolutional neural network and convert the features into feature vectors;

the local feature extraction process is to extract a local area with the most information quantity of the picture by a target detection method and convert the local area into a feature vector;

the feature fusion process is to integrate the feature vectors extracted by the global feature extraction process and the feature vectors extracted by the local feature extraction process into a feature matrix, and then obtain the facial expression probability through a full connection layer and softmax.

Preferably, the preprocessing procedure refers to performing face detection, face alignment and face cropping operations on the picture.

Preferably, the local feature extraction process includes the following sub-steps:

s1, generating a plurality of target detection frames by adopting a target detection method;

s2, converting the local area feature map of the target detection box in the S1 into a score of a corresponding expression category by using a convolutional neural network, and expressing the score as follows by using a formula:

S:A→[0,1]

wherein S represents a score set of local regions, A is a set of all local regions, and R ∈ A, the score of the local region is set as C (R), the information content of the local region is set as I (R), and the information content of all local regions is ordered:

R₁,R₂∈A,ifC(R₁)>C(R₂),I(R₁)>I(R₂)

s3, selecting the first K local areas with the largest information quantity;

and S4, extracting K local area features screened in the S3 by adopting a convolutional neural network, and converting the K local area features into feature vectors.

Preferably, in steps S2 and S4 of the global feature extraction process and the local feature extraction process, the convolutional neural network uses resnet50 as a base network.

Preferably, in step S4 of the global feature extraction step and the local feature extraction step, the feature vectors are both M × 1-sized vectors;

in the feature fusion step, the size of the feature matrix is M × (K + 1).

An apparatus for recognizing facial expressions based on static states in a natural scene, comprising:

the preprocessing module is used for removing the background of a natural scene in the picture and only reserving a face area;

the global feature extraction module is used for extracting the features of the whole picture processed by the preprocessing module by adopting a convolutional neural network and converting the features into feature vectors;

the local feature extraction module is used for extracting a local area with the most information quantity of the picture by a target detection method and converting the local area into a feature vector;

and the feature fusion module is used for integrating the feature vectors extracted by the global feature extraction module and the feature vectors extracted by the local feature extraction module into a feature matrix, and then obtaining the facial expression probability through a full connection layer and softmax.

A storage medium, wherein the storage medium stores a computer program that, when executed by a processor, causes the processor to execute the above-described static facial expression recognition method in a natural scene.

A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the method for recognizing static facial expressions in natural scenes when executing the program stored in the memory.

Compared with the prior art, the invention has the following advantages and beneficial effects:

according to the method, a data set in a natural scene is preprocessed, and global and local features are extracted to finally fuse the global and local features to obtain a model for recognizing the expression, so that the static facial expression is accurately recognized; the facial expression recognition accuracy is high, the recognition efficiency is high, and a good recognition effect is achieved; the method can be used for human emotion calculation and a human-computer interaction system, and can improve the working efficiency of a facial expression correlation system.

Drawings

FIG. 1 is a schematic flow chart of a method for recognizing static facial expressions in a natural scene according to the present invention;

fig. 2 is a flow chart of local feature extraction in the static facial expression recognition method in natural scenes according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

Example one

As shown in fig. 1, the method for recognizing static facial expressions in a natural scene in the present embodiment is characterized in that: the method comprises a preprocessing procedure, a global feature extraction procedure, a local feature extraction procedure and a feature fusion procedure which are sequentially executed.

The preprocessing procedure is used for removing the background of a natural scene in the picture and only reserving a face area; the method refers to the operations of face detection, face alignment and face cutting of a picture.

The global feature extraction process is to extract the features of the whole picture after the preprocessing process by adopting a convolutional neural network and convert the features into feature vectors.

The local feature extraction process is to extract a local area with the most information quantity of the picture by a target detection method and convert the local area into a feature vector.

The local feature extraction process includes the following sub-steps, as shown in fig. 2:

S:A→[0,1]

R₁,R₂∈A,ifC(R₁)>C(R₂),I(R₁)>I(R₂)

s3, selecting the first K local areas with the largest information quantity;

When the score corresponding to the local area feature is obtained, the feature needs to be extracted through a resnet50 network, a final score value is obtained through a classifier, the area with the larger score value represents the area with the most information quantity, the area is screened out to be used as a local area module, the specific area number K is used as a hyper-parameter of the model, and manual adjustment is needed.

The feature fusion process is to integrate the feature vectors extracted in the global feature extraction process and the feature vectors extracted in the local feature extraction process into a feature matrix, and then obtain the facial expression probability through a full connection layer and softmax.

In steps S2 and S4 of the global feature extraction step and the local feature extraction step, the convolutional neural network uses resnet50 as a base network.

In step S4 of the global feature extraction step and the local feature extraction step, the feature vectors are all M × 1 vectors, and in the feature fusion step, the feature matrix is M × (K + 1).

Convolutional neural network training may employ existing techniques. Specifically, the convolutional neural network is essentially function fitting, for example, simple y ═ wx, a weight w is obtained by training a model, first, w is initialized, after data is input each time, y is obtained, a difference between y and an expected target value is calculated, y is continuously approximated, and finally w is obtained, that is, the convolutional neural network.

The following description will be made of specific embodiments.

Based on the public facial expression data RAF-DB, a static facial expression recognition method in a natural scene is provided, an accurate solution is provided for facial expression recognition in the natural scene, the working efficiency of a facial expression related system is improved, and an emotion man-machine interaction system is perfected.

In this embodiment, the pictures of the data set are all processed into pictures with the size of 224 × 224 through a preprocessing procedure, and are classified into expressions, and the output categories of the classifier are mainly classified into seven categories such as anger (anger), depression (dismost), fear (fear), happy (happy), neutral (netural), sadness (sad), surprise (surpride) and the like according to the labels of the data set, so that the number of the output categories of the classifier is set to be 7.

The method for recognizing the static facial expression in the natural scene comprises the following steps:

the method comprises the following steps of preprocessing, wherein the preprocessing comprises the operations of face detection, face alignment, face cutting and the like, five key point coordinates in a face are detected by adopting a multitask cascade (MTCNN) face detection model, the left eye coordinates and the right eye coordinates are connected to calculate the included angle theta between the left eye coordinates and the right eye coordinates and the horizontal direction, the image is rotated clockwise by the angle theta according to the left eye coordinates as an axis to obtain a new image and new key point coordinates, the image is cut according to the new key point coordinates, and the cut image is stored as a 224-size face image with the size of 224 × 224.

And a global feature extraction process, wherein a convolutional neural network is adopted to extract the features of the whole picture after the preprocessing process and convert the features into feature vectors, and the convolutional neural network adopts the resnet50 without a full connection layer as a feature extraction network and converts the feature matrix into the feature vectors of 2048 × 1.

The local feature extraction process mainly adopts an extraction frame in target detection to extract the features of local areas, and selects an area containing the maximum information amount from all the local areas as the local area features of expression recognition, and the detailed steps are as follows:

S:A→[0,1]

wherein S represents the fraction of the local area, A is the set of all the local areas, and R ∈ A, the fraction of the local area is set as C (R), the information content of the local area is set as I (R), the information content of all the local areas is ordered:

R₁,R₂∈A,ifC(R₁)>C(R₂),I(R₁)>I(R₂)

s3, selecting the first K local areas with the largest information quantity;

and S4, extracting K local region features screened in S3 by adopting a convolutional neural network, and converting the K local region features into K2048 × 1 feature vectors.

And a feature fusion process, namely integrating the feature vectors extracted in the global feature extraction process and the feature vectors extracted in the local feature extraction process into a 2048 × (K +1) feature matrix, and then obtaining the facial expression probability through a full connection layer and softmax.

The method is implemented by using a public data set RAF-DB, a training set and a test set of the RAF-DB are preprocessed according to the preprocessing mode, then a local feature and global feature fusion model is constructed, the training set training model is adopted, finally, the test set is only used for testing, and finally, the expression recognition accuracy rate obtained by the model is obtained.

The effect of the method of the invention on the RAF-DB data set is compared with that of other methods through the following table, and the comparison shows that the method of the invention has better identification effect compared with other methods.

Method of producing a composite material	Year of year	Rate of accuracy
			DLP-CNN(baseline)	2019	84.13
gACNN	2018	85.07
			The method of the invention	2020	85.75

The facial expression recognition method mainly comprises the steps of extracting similar features of the same type of expressions and constructing a model for distinguishing different expressions according to different features of different types of expressions. By combining related theories of psychology, graphics and information science, practical and valuable references are provided for human emotion calculation and human-computer interaction system research. With the development of artificial intelligence technology, the method is widely applied to the fields of psychological disease treatment, emotional interaction and the like. However, at present, the expression recognition in a natural scene is a problem with a larger challenge, in the embodiment, the disclosed expression data set in the natural scene is preprocessed, the local features and the global features are extracted, and finally the global features and the local features are fused to obtain a model for the expression recognition, so that the accurate recognition of the static facial expression is realized.

Example two

In order to implement the method for recognizing static facial expression in natural scene according to the first embodiment, the present embodiment provides a device for recognizing static facial expression in natural scene, including:

EXAMPLE III

The present embodiment is a storage medium storing a computer program, which when executed by a processor causes the processor to execute the method for recognizing a static facial expression in a natural scene according to the first embodiment.

Example four

The embodiment of the invention relates to a computing device, which comprises a processor and a memory for storing a program executable by the processor, and is characterized in that when the processor executes the program stored in the memory, the method for recognizing the static facial expression in the natural scene, as described in the first embodiment, is implemented.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A method for recognizing static facial expressions in natural scenes is characterized by comprising the following steps: the method comprises a pretreatment process, a global feature extraction process, a local feature extraction process and a feature fusion process which are sequentially executed;

2. The method of recognizing static facial expressions in natural scenes according to claim 1, characterized in that: the preprocessing procedure refers to the operations of face detection, face alignment and face cutting of the picture.

3. The method of recognizing static facial expressions in natural scenes according to claim 1, characterized in that: the local feature extraction process comprises the following sub-steps:

s2, converting the local area feature map of the target detection box in S1 into scores of corresponding expression classes by using a convolutional neural network, and formulating as follows:

S:A→[0,1]

R₁,R₂∈A,ifC(R₁)>C(R₂),I(R₁)>I(R₂)

s3, selecting the first K local areas with the largest information quantity;

s4, extracting K local area features screened in S3 by adopting a convolutional neural network, and converting the K local area features into feature vectors.

4. A method of recognizing static facial expressions in natural scenes according to claim 3, characterized in that: in steps S2 and S4 of the global feature extraction step and the local feature extraction step, the convolutional neural network uses resnet50 as a base network.

5. The method of claim 3, wherein in step S4 of the global feature extraction step and the local feature extraction step, the feature vectors are all M × 1 vectors;

in the feature fusion step, the size of the feature matrix is M × (K + 1).

6. An apparatus for recognizing facial expressions based on static states in a natural scene, comprising:

7. A storage medium, wherein the storage medium stores a computer program which, when executed by a processor, causes the processor to execute the method of static facial expression recognition in natural scenes according to any one of claims 1 to 5.

8. A computing device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the method of static facial expression recognition in natural scenes according to any one of claims 1 to 5.