CN112183213A - Facial expression recognition method based on Intra-Class Gap GAN - Google Patents
Facial expression recognition method based on Intra-Class Gap GAN
- Publication number: CN112183213A
- Application number: CN202010905875.1A
- Authority
- CN
- China
- Prior art keywords
- output
- convolution
- facial expression
- image
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V40/174 — Facial expression recognition
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06V40/168 — Human faces: feature extraction; face representation
- G06V40/172 — Human faces: classification, e.g. identification
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks: combinations of networks
Abstract
A facial expression recognition method based on an Intra-Class Gap GAN, in which a recognition model is constructed through the following steps: (1) acquire real-time facial images of different sources and different expressions; (2) input the images into the Intra-Class Gap GAN neural network model for recognition; (3) output the recognition result. Compared with traditional methods that extract expression features by hand, this generative-adversarial method extracts facial expression features automatically; compared with earlier neural-network approaches to facial expression recognition, it improves the recognition rate and therefore recognizes expressions more accurately.
Description
Technical Field
The invention relates to the fields of image processing and deep learning for facial expression recognition, and in particular to a facial expression recognition method based on a generative adversarial network.
Background
China's huge floating population places great pressure on urban infrastructure and public services. With frequent violent incidents in recent years and growing concern over public security, urban management and service systems lag seriously behind and urgently need improvement; strengthening urban surveillance and recognizing the facial expressions of potential offenders has therefore become increasingly important. An expression is an emotional state conveyed by changes in the facial muscles. By recognizing facial expressions, abnormal psychological states can be judged and extreme emotions inferred; observing the facial expressions of pedestrians in complex environments provides technical support for further assessing their state of mind, roughly identifying suspicious persons, and stopping certain criminal activities in time. Traditional facial expression recognition relies mainly on template matching and neural networks. Moreover, traditional methods require human intervention in feature selection: the feature-extraction algorithm must be finely hand-crafted, sufficient computing power is lacking, training is difficult, accuracy is low, and original expression information is easily lost.
Disclosure of Invention
The purpose of the invention is as follows:
In view of the intra-class differences of facial expression recognition under real-world conditions, and of the technical problems that security inspection in complex environments is difficult and that intra-class differences prevent the required facial expression recognition rate from being met, a facial expression recognition method based on a generative adversarial network is provided.
The technical scheme is as follows:
a facial expression recognition method based on Intra-Class Gap GAN,
the identification model construction comprises the following steps:
(1) acquiring real-time images of different sources and different expressions of the human face;
(2) inputting the image into an Intra-Class Gap GAN neural network model for identification;
(3) outputting the identification result;
the method for constructing the Intra-Class Gap GAN neural network model in the step (2) is as follows:
(2.1) acquiring historical images of different sources and different expressions of the human face;
(2.2) preprocessing the collected face image to construct a facial expression data set;
(2.3) constructing an Intra-Class Gap GAN neural network model aiming at the problem of facial expression recognition of intra-Class differences in the data set in the step (2.2);
(2.4) training the generator and the discriminator of the network simultaneously by combining the pixel difference and the latent-vector difference between the input image and the reconstructed image, ensuring that the difference between the reconstructed image and the input image is minimized.
(2.2) the method for constructing the human face expression data set in the step is as follows:
S11: based on the Multi-PIE and JAFFE expression data sets, and on facial expression pictures downloaded from the network in step (2.1), a self-made facial expression data set is built. Five facial expressions — disgust, happy, neutral, anxious, and surprise-and-fear — of people of different countries, age groups, and professions are selected for the experiment, adding a large number of facial expression features with intra-class differences and increasing the complexity of the data set, which serves as the input image x for network training;
S12: geometrically normalizing the input image, and carrying out face detection on the normalized image;
s13: the images after the processing in step S12 are scale-normalized to unify the sizes of the images.
The step (2.4) is specifically as follows:
S14: training a facial expression recognition network model based on the proposed generative adversarial IC-GAN (Intra-Class Gap GAN) neural network, using the images processed in step S13;
s15: carrying out data enhancement and data expansion processing on the image;
S16: training the network model and storing the trained network model.
The step S12 includes the following steps:
S121: determining feature points [x, y] from the collected image and calibrating the feature points of the two eyes and the nose to obtain their coordinate values;
S122: rotating the image according to the coordinates of the eyes on the face to keep the face direction consistent, where the distance between the eyes is denoted d and the midpoint of the two eyes is denoted O;
S123: determining a frame containing the face from the calibrated feature points and the geometric model, cropping a distance d to the left and to the right of O, and cropping 0.5d upward and 1.5d downward.
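The alignment and cropping geometry of steps S121–S123 can be sketched as follows. This is a minimal NumPy sketch, in which `align_and_crop_box` is a hypothetical helper (not named in the patent) that returns the rotation angle that levels the eyes and the crop box of d to each side of the eye midpoint O, with 0.5d above and 1.5d below, assuming image coordinates with y increasing downward:

```python
import numpy as np

def align_and_crop_box(left_eye, right_eye):
    """Given the two calibrated eye coordinates, return the rotation angle
    (degrees) that levels the eyes and the face crop box per S121-S123."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    delta = right_eye - left_eye
    # Angle of the eye line; rotating the image by -angle makes the eyes level.
    angle = np.degrees(np.arctan2(delta[1], delta[0]))
    d = np.linalg.norm(delta)            # inter-ocular distance d
    O = (left_eye + right_eye) / 2.0     # midpoint O of the two eyes
    x0, x1 = O[0] - d, O[0] + d          # distance d to the left and right of O
    y0, y1 = O[1] - 0.5 * d, O[1] + 1.5 * d  # 0.5d upward, 1.5d downward
    return angle, (x0, y0, x1, y1)
```

The resulting box is 2d × 2d, which is then rescaled to 256 × 256 in step S131.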
The step S13 includes the following steps:
S131: scale-normalizing the images cropped in step S123 and resizing them uniformly to 256 × 256 pixels, completing the geometric normalization of the images.
The step S14 includes the following steps:
S141: constructing the proposed IC-GAN (Intra-Class Gap GAN) neural network with the PyTorch deep learning framework. First, the picture processed in step S13 is input into the first convolution layer, where the input image is convolved with a 4 × 4 convolution kernel; the output is 128 × 64. A LeakyReLU activation function then performs a nonlinear operation on the convolution output, giving an output of 128 × 64. The LeakyReLU activation function is f(x) = x for x > 0 and f(x) = αx for x ≤ 0, where α is a small positive slope;
S142: continuing to convolve the output of the previous layer with a 4 × 4 kernel, with output 64 × 128; then normalizing the output with a batchnorm layer and applying a LeakyReLU nonlinear operation, with output 64 × 128;
S143: continuing to perform convolution, batchnorm and LeakyReLu operations on the output of the previous layer by using the method in the step S142, wherein the output is 4 × 100;
S144: performing a transposed (reverse) convolution with a 4 × 4 kernel on the output of S143 to obtain an output of 29 × 1, applying batch normalization with batchnorm, and applying a ReLU activation to obtain an output of 32 × 128. The ReLU activation function is f(x) = max(0, x);
S145: performing the convolution, batchnorm and ReLU operations of step S144 again on the output of the previous layer, outputting 64 × 64;
S146: applying a ReLU nonlinear operation to the output of the previous layer, performing a reverse convolution with a 4 × 4 kernel, and applying a Tanh activation, outputting 128 × 128. The Tanh activation function is tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x});
s147: performing the operation in the S141-S143 process again on the output of the previous layer, wherein the output is 1 x 5;
s148: inputting the image subjected to scale normalization in the step S13 and the output of the step S147 into a 4 × 4 convolution layer, performing convolution operation, and then performing nonlinear activation by using a nonlinear activation function LeakyReLu to output 128 × 64;
s149: performing convolution operation on the output of the previous layer by using 4 × 4 convolution kernel, performing batch normalization operation by using batchnorm, and performing LeakyReLu nonlinear activation;
s1491: continuing to perform convolution, batchnorm and nonlinear operation on the output of the previous layer by adopting the process of S142, wherein the output is 4 x 1;
S1492: finally, applying Softmax to the output of the previous layer and outputting the probability that the input is judged to be real;
S1493: performing a fully connected operation on the output of the S147 process, and finally training the 5 expressions with a Softmax classifier, where the 5 expressions are 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear, thereby realizing facial expression recognition.
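The activation functions named in steps S141–S146 can be written out directly. A small NumPy sketch follows; the LeakyReLU slope α = 0.2 is an assumed value (the patent does not state it):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """LeakyReLU: passes positive values, scales negatives by a small slope.
    alpha=0.2 is a common GAN choice; the patent leaves it unspecified."""
    return np.where(x > 0, x, alpha * x)

def relu(x):
    """ReLU: f(x) = max(0, x), used in the decoder layers S144-S146."""
    return np.maximum(0.0, x)

def tanh(x):
    """Tanh: (e^x - e^-x) / (e^x + e^-x), used for the final generator output."""
    return np.tanh(x)
```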
Step S15 includes
S151: dividing the network loss function into four parts, and reducing the difference between an original image and a reconstructed image on a pixel level for the generated network of the first part, wherein the reconstruction error loss is as follows:
Lcon=Ex~pX||x-G(x)||1;
pX represents data allocation; x is the input image g (X) is the image generated by the generator in the network;
The feature matching method proposed by Salimans et al. is used to reduce training instability and optimize at the image feature level; the feature matching error of the second part, the discriminator network, is:

L_adv = E_{x~pX} ||f(x) − f(G(x))||_2

where f(·) denotes the discriminator model transformation;
The third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which prevents facial-expression information from being disturbed by picture-independent information during network decoding:

L_p = E_{x~pX} ||z − ẑ||_2, with z = h(x) and ẑ = h(G(x))

where h(·) denotes the encoding transformation;
The network loss of the fourth part is the cross-entropy loss of the Softmax layer:

L_s = k(y, ŷ)

where k(·) denotes the Softmax cross-entropy loss, y the true label, and ŷ the recognition result;
The overall network loss function is:

L = ω_adv·L_adv + ω_con·L_con + ω_p·L_p + ω_s·L_s

where ω_adv, ω_con, ω_p and ω_s are parameters that weight the individual losses;
S152: the Adam optimizer is selected, the learning rate is set to 0.0002, and the training samples are trained in batches of 16 pictures, with the number of epochs set to 100, 200, 300 and 400 respectively;
S153: in each round of training, 1 epoch of pictures is first obtained, the loss value is then calculated, and the Adam optimizer continually updates the network parameters to minimize the network loss.
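The four-part loss of S151 and its weighted sum can be sketched as follows. This is a NumPy sketch under stated assumptions: `f` and `h` stand in for the discriminator feature transform and the encoder, and the ω weights are illustrative placeholders, since the patent leaves their values unspecified:

```python
import numpy as np

def total_loss(x, Gx, f, h, y_true_onehot, y_pred_prob,
               w_adv=1.0, w_con=50.0, w_p=1.0, w_s=1.0):
    """Weighted sum L = w_adv*L_adv + w_con*L_con + w_p*L_p + w_s*L_s.
    The weight values here are assumptions, not taken from the patent."""
    L_con = np.abs(x - Gx).mean()            # pixel-level L1 reconstruction loss
    L_adv = np.square(f(x) - f(Gx)).mean()   # discriminator feature-matching loss
    L_p = np.square(h(x) - h(Gx)).mean()     # latent-vector (encoding) loss
    # Softmax cross entropy between the true one-hot label and the prediction.
    L_s = -(y_true_onehot * np.log(y_pred_prob + 1e-12)).sum()
    return w_adv * L_adv + w_con * L_con + w_p * L_p + w_s * L_s
```

When the reconstruction is perfect (Gx = x) every term but the cross entropy vanishes, matching the training criterion of step (2.4).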
In step (3), the pictures are input into the trained IC-GAN network model for recognition, the probability of each facial expression is output, and the expression category with the maximum output probability is the classification result. The probability calculation formula is:

z_i = Σ_j ω_ij·s_j + b,    y_i = e^{z_i} / Σ_k e^{z_k}

where z_i denotes the ith output of the network, ω_ij is the jth weight of the ith neuron, b is the bias, s_j is the output of the jth neuron of the previous layer, and y_i is the ith output value of the Softmax layer.
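The Softmax probability computation of step (3) can be sketched as below; a numerically stable NumPy version, with `classify` a hypothetical wrapper returning the class of maximum probability:

```python
import numpy as np

def softmax(z):
    """y_i = e^{z_i} / sum_k e^{z_k}, shifted by max(z) for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(z):
    """Return (class probabilities, index of the most probable expression)."""
    p = softmax(z)
    return p, int(np.argmax(p))
```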
The advantages and effects are as follows:
the invention designs a facial expression recognition method based on generation confrontation, which comprises a network training process and an off-line recognition process of facial expression recognition with intra-class difference; the offline identification process should include the following steps:
S11: collecting an input image x by downloading from the network, frame-skipping, and parsing video;
s12: geometrically normalizing the input image x and detecting an image x' after normalization processing;
s13: processing the detected and cut image x' to a uniform size;
s14: constructing a network model based on the generated confrontation facial expression recognition;
s15: carrying out data enhancement and data expansion processing on the image x' and unifying the image size;
s16: training the network model and storing the trained network model;
for the identification process, the following steps should be included:
S21: collecting an input image I by downloading from the network, frame-skipping, and parsing video;
S22: inputting the image I into the trained network model;
s23: and obtaining a recognition result.
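The recognition steps S21–S23 amount to preprocess → forward pass → argmax. In the hedged Python sketch below, `preprocess`, `model`, and the `EXPRESSIONS` label order (following the 1–5 numbering of S1493, with "disgust" and "anxious" as readings of the translated class names) are illustrative placeholders rather than names from the patent:

```python
import numpy as np

# Assumed label order per S1493 (1=happy ... 5=surprise-and-fear).
EXPRESSIONS = ["happy", "disgust", "neutral", "anxious", "surprise-and-fear"]

def recognize(image, model, preprocess):
    """S21-S23: preprocess the collected image, run the trained model on it,
    and return the expression label with the maximum output probability."""
    x = preprocess(image)   # geometric + scale normalization to 256 x 256
    probs = model(x)        # trained network outputs one probability per class
    return EXPRESSIONS[int(np.argmax(probs))]
```

With a real system, `model` would wrap the trained IC-GAN classifier and `preprocess` the normalization pipeline of S12–S13.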
The step S12 further includes the following steps:
S121: performing geometric normalization on the input image; the geometric normalization includes scale normalization, head-pose correction and face-tilt correction;
s122: performing face detection on the image after geometric normalization by using a face detection method in an OpenCV open source library, and then performing noise reduction on the detected image;
S123: obtaining the geometrically normalized image x'.
The step S13 further includes:
s131: determining the position of an image according to the coordinates of the face;
s132: using OpenCV to detect and obtain a face image;
s133: and adjusting the cut face image into a uniform size, and changing the cut face image into 256 × 256 size.
Further, step S14 should also include: S141: building the IC-GAN neural network with the PyTorch deep learning framework. First the picture is input into the conv_1 layer for convolution: the input image is convolved with a 4 × 4 kernel, and the output is 128 × 64; a LeakyReLU activation then performs a nonlinear operation, with output 128 × 64. The LeakyReLU activation function is f(x) = x for x > 0 and f(x) = αx for x ≤ 0;
S142: continuing to convolve the output of the previous layer with a 4 × 4 kernel, with output 64 × 128; then normalizing the output with the batchnorm layer and applying a LeakyReLU nonlinear operation, with output 64 × 128;
s143: continuing to perform convolution, batchnorm and LeakyReLu operations on the output of the previous layer by using the method of S142, wherein the output is 4 x 100;
s144: performing reverse convolution operation of a convolution kernel 4 × 4 on the output of the S143 to obtain an output of 29 × 1, performing batch normalization operation by using a batchnorm, and performing nonlinear operation on the output by using a ReLu activation function to obtain an output of 32 × 128;
S145: performing the convolution, batchnorm and ReLU operations described in S144 again on the output of the previous layer, outputting 64 × 64;
s146: performing nonlinear operation on the output of the previous layer by using a ReLu activation function, performing convolution operation on the previous layer by using reverse convolution with a convolution kernel of 4 × 4, and performing nonlinear operation by using a Tanh activation function to output 128 × 128;
s147: performing the operation in the S141-S143 process again on the output of the previous layer, wherein the output is 1 x 5;
s148: inputting the original image and the output of S147 into a 4 × 4 convolution layer, performing convolution operation, and then performing nonlinear activation by using a nonlinear activation function LeakyReLu, wherein the output is 128 × 64;
s149: performing convolution operation on the output of the previous layer by using 4 × 4 convolution kernel, performing batch normalization operation by using batchnorm, and performing LeakyReLu nonlinear activation;
S1491: continuing the convolution, batchnorm and nonlinear operations of the S142 process on the output of the previous layer, with output 4 × 1;
S1492: finally, applying Softmax to the output of the previous layer and outputting the probability that the input is judged to be real.
S1493: performing full-connection operation on the output of the S147 process, and finally realizing training of 5 expressions through a Softmax classifier, where the 5 expressions are 1 ═ happy, 2 ═ inhibition, 3 ═ neutral, 4 ═ excitation, and 5 ═ surpride and fear, so as to realize recognition of facial expressions;
Step S15 should also include: S151: according to the network structure and experimental characteristics, the network loss is likewise divided into four parts. For the generator network of the first part, the difference between the original image and the reconstructed image is reduced at the pixel level, and the reconstruction error loss is:

L_con = E_{x~pX} ||x − G(x)||_1;
the feature matching method proposed by Salimans et al is used herein to reduce training instability, the image feature level is optimized, and one feature matching error of the discriminator of the second partial network is:
Ladv=Ex~pX||f(x)-f(G(x))||2
where f (-) represents the discriminator model transformation.
The third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which prevents facial-expression information from being disturbed by picture-independent information during network decoding:

L_p = E_{x~pX} ||z − ẑ||_2, with z = h(x) and ẑ = h(G(x))

where h(·) denotes the encoding transformation.
The network loss of the fourth part is the cross-entropy loss of the Softmax layer:

L_s = k(y, ŷ)

where k(·) denotes the Softmax cross-entropy loss, y the true label, and ŷ the recognition result.
The overall network loss function is:

L = ω_adv·L_adv + ω_con·L_con + ω_p·L_p + ω_s·L_s

where ω_adv, ω_con, ω_p and ω_s are parameters that weight the individual losses.
S152: the Adam optimizer is selected, the learning rate is set to 0.0002, and the training samples are trained in batches of 16 pictures, with the number of epochs set to 100, 200, 300 and 400 respectively.
S153: in each round of training, 1 epoch of pictures is first obtained, the loss value is then calculated, and the Adam optimizer continually updates the network parameters to minimize the network loss.
Further, step S16 also includes: S161: collecting input images by downloading from the network, frame-skipping, and parsing video;
S162: performing geometric normalization, face detection, OpenCV processing and size unification on the input image;
S163: inputting the processed image into the trained IC-GAN network model for recognition and finally outputting the probability of each expression; the expression with the maximum probability is the expression recognized by the network.
Compared with the prior art, the invention has the advantages that:
compared with the traditional method for manually extracting expression characteristics, the facial expression recognition method based on the generation countermeasure realizes the automatic extraction of the facial expression characteristics, and compared with the slightly early neural network facial expression recognition, the facial expression recognition method based on the generation countermeasure realizes the improvement of the recognition rate, thereby accurately recognizing the expression.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the figures needed to describe the embodiments or the prior art are briefly introduced below. The following figures show some embodiments of the invention; those skilled in the field can derive other figures from them.
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a schematic diagram of the IC-GAN network model of the present invention.
Detailed Description
A facial expression recognition method based on Intra-Class Gap GAN,
the identification model construction comprises the following steps:
(1) acquiring real-time images of different sources and different expressions of the human face;
(2) inputting the image into an Intra-Class Gap GAN neural network model for identification;
(3) outputting the identification result;
the method for constructing the Intra-Class Gap GAN neural network model in the step (2) is as follows:
(2.1) acquiring historical images of different sources and different expressions of the human face;
(2.2) preprocessing the collected face image to construct a facial expression data set;
(2.3) constructing an Intra-Class Gap GAN neural network model for the facial recognition problem in which the data set of step (2.2) exhibits intra-class differences (differences within the same expression class are called intra-class differences: the same expression can take different forms, and a collected image can be affected by the external environment, occlusions and the shooting angle, so that, for example, the same smiling expression may show particularly large feature differences in a complex surrounding environment and be misrecognized as another class of expression, which ultimately affects recognition accuracy);
(2.4) training the generator and the discriminator of the network simultaneously by combining the pixel differences and the latent-vector differences between the input image (the training sample fed to the network during training) and the reconstructed image (an image generated during training and matched against the original; when the reconstructed image shows no difference from the input image, the network is considered trained to extract image features correctly), ensuring that the difference between the reconstructed image and the input image is minimized. (During training, the original input picture is compared with the picture generated by the network; when the generated picture is consistent with the input picture, the trained network has the strongest recognition capability.)
(2.2) the method for constructing the human face expression data set in the step is as follows:
S11: based on the Multi-PIE and JAFFE expression data sets, and on facial expression pictures downloaded from the network in step (2.1), the self-made facial expression data set required herein is built (sample expansion). Five facial expressions — disgust, happy, neutral, anxious, and surprise-and-fear — of people of different countries, age groups and professions are selected for the experiment, and a large number of similar expressions with large intra-class differences are added. (Forms of one expression class, such as smiles and laughter, presented by the same person under the same background environment are called intra-class; whenever these conditions are not met — different backgrounds, different expression forms, or different persons — intra-class differences exist or are large.) The resulting complexity of facial expression features makes this data set the input image x for network training;
S12: geometrically normalizing the input image and performing face detection on the normalized image (obtaining a suitable face image as in claim 3, obtaining sample data suitable for network training by processing, such as possibly requiring rotation to ensure consistency of face direction, etc.);
s13: the images after the processing in the step S12 are scale-normalized, unifying the sizes of the images (S12 and S13 are preprocessing procedures).
The step (2.4) is specifically as follows:
s14: based on the image processed in step S13, train a facial expression recognition network model built on the generative adversarial IC-GAN (Intra-Class Gap GAN) neural network;
s15: carrying out data enhancement and data expansion processing on the image;
s16: and training the network model and storing the trained network model.
The step S12 includes the following steps:
s121: determine the feature points [x, y] from the collected image, calibrating the feature points of the two eyes and the nose to obtain their coordinate values;
s122: rotate the image according to the coordinates of the eyes on the face to ensure a consistent face orientation (this step of face image preprocessing reflects the rotation invariance of the face in the image plane); denote the distance between the person's eyes by d and the midpoint of the two eyes by O;
s123: determine a frame containing the face from the calibrated feature points and the geometric model: cut a distance of d to the left and right of O, and cut 0.5d upward and 1.5d downward.
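The cropping geometry of steps S121 to S123 can be sketched in a few lines; this is a minimal illustration of the stated rule (rotate by the eye angle, then cut d left and right of O, 0.5d up and 1.5d down), and the function name is ours, not the patent's:

```python
import math

def align_crop_box(left_eye, right_eye):
    """Given calibrated eye coordinates, return the rotation angle (radians)
    that levels the eyes, plus the face crop box per steps S122-S123."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = math.atan2(ry - ly, rx - lx)          # rotate by -angle to level the eyes
    d = math.dist(left_eye, right_eye)            # inter-ocular distance d
    ox, oy = (lx + rx) / 2.0, (ly + ry) / 2.0     # midpoint O of the two eyes
    # cut d to the left/right of O, 0.5d above and 1.5d below: a 2d x 2d square
    box = (ox - d, oy - 0.5 * d, ox + d, oy + 1.5 * d)
    return angle, box

# example: level eyes at (100, 120) and (160, 120), so d = 60 and O = (130, 120)
angle, box = align_crop_box((100, 120), (160, 120))
```

Note that the crop is 2d wide and 2d tall, so a single resize to 256 × 256 in the later scale normalization step preserves the aspect ratio.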
The step S13 includes the following steps:
s131: scale-normalize the images cut in step S123 and resize them uniformly to 256 × 256 pixel images, completing the geometric normalization of the images.
The step S14 includes the following steps:
s141: construct the proposed IC-GAN (Intra-Class Gap GAN) neural network with the PyTorch deep learning framework. First input the picture processed in step S13 into the first convolution layer for a convolution operation, convolving the input image with a 4 × 4 convolution kernel, with output 128 × 64; apply the LeakyReLU activation function to the convolution as a nonlinear operation, with output 128 × 64; the LeakyReLU activation function is f(x) = x for x ≥ 0 and f(x) = αx for x < 0, where α is a small positive slope (for example 0.2);
S142: continue to convolve the output of the previous layer (the first convolution layer) with a 4 × 4 convolution kernel, with output 64 × 128; then normalize the output with a batchnorm layer and apply the LeakyReLU activation function as a nonlinear operation, with output 64 × 128;
S143: continuing to perform convolution, batchnorm and LeakyReLu operations on the output of the previous layer by using the method in the step S142, wherein the output is 4 × 100;
s144: perform a transposed convolution (deconvolution) with a 4 × 4 kernel on the output of S143 to obtain an output of 29 × 1, apply a batchnorm batch normalization operation, and apply the ReLU activation function as a nonlinear operation to obtain an output of 32 × 128; the ReLU activation function is f(x) = max(0, x);
s145: perform the convolution, batchnorm and ReLU operations of step S144 again on the output of the previous layer, with output 64 × 64;
s146: apply the ReLU activation function to the output of the previous layer as a nonlinear operation, perform a transposed convolution with a 4 × 4 kernel, and apply the Tanh activation function as a nonlinear operation, with output 128 × 128; the Tanh activation function is tanh(x) = (e^x - e^-x) / (e^x + e^-x);
s147: performing the operation in the S141-S143 process again on the output of the previous layer, wherein the output is 1 x 5;
s148: input the image scale-normalized in step S13, together with the output of step S147, into a 4 × 4 convolution layer for a convolution operation, then apply the nonlinear activation function LeakyReLU, with output 128 × 64;
s149: performing convolution operation on the output of the previous layer by using 4 × 4 convolution kernel, performing batch normalization operation by using batchnorm, and performing LeakyReLu nonlinear activation;
s1491: continuing to perform convolution, batchnorm and nonlinear operation on the output of the previous layer by adopting the process of S142, wherein the output is 4 x 1;
s1492: finally apply Softmax to the output of the previous layer and output the probability of the input being judged true;
s1493: perform a fully connected operation on the output of the S147 process, and finally train the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, and 5 = surprise and fear, thereby realizing recognition of the facial expressions.
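A compact PyTorch sketch of the encoder-decoder-encoder generator and the discriminator described in S141 to S1492 is given below. It is an illustrative reduction, not the patented implementation: the working resolution is scaled down to 64 × 64, the channel counts and the latent size nz = 100 are assumptions, and the helper names are ours:

```python
import torch
import torch.nn as nn

def down(cin, cout, bn=True):
    """Conv 4x4 stride 2 + (batchnorm) + LeakyReLU, as in S141-S143."""
    layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1, bias=False)]
    if bn:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

def up(cin, cout):
    """Transposed conv 4x4 stride 2 + batchnorm + ReLU, as in S144-S146."""
    return [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]

class Encoder(nn.Module):
    """64x64x3 image -> 1x1xnz latent vector."""
    def __init__(self, nz=100):
        super().__init__()
        self.net = nn.Sequential(*down(3, 64, bn=False), *down(64, 128),
                                 *down(128, 256), *down(256, 512),
                                 nn.Conv2d(512, nz, 4, stride=1, padding=0))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """1x1xnz latent vector -> 64x64x3 image, Tanh output as in S146."""
    def __init__(self, nz=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, 512, 4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            *up(512, 256), *up(256, 128), *up(128, 64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())
    def forward(self, z):
        return self.net(z)

class Generator(nn.Module):
    """Encoder-decoder-encoder: returns reconstruction, latent z, re-encoded z."""
    def __init__(self, nz=100):
        super().__init__()
        self.enc1, self.dec, self.enc2 = Encoder(nz), Decoder(nz), Encoder(nz)
    def forward(self, x):
        z = self.enc1(x)
        x_rec = self.dec(z)
        return x_rec, z, self.enc2(x_rec)

class Discriminator(nn.Module):
    """Returns the real/fake probability (S1492) and the feature map f(x)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(*down(3, 64, bn=False), *down(64, 128),
                                      *down(128, 256), *down(256, 512))
        self.head = nn.Conv2d(512, 1, 4, stride=1, padding=0)
    def forward(self, x):
        f = self.features(x)
        return torch.sigmoid(self.head(f)).view(-1), f
```

The second encoder re-encodes the reconstructed image so that the latent vectors z and ẑ can be compared in the encoding loss of S151.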
Step S15 includes
S151: according to the constructed IC-GAN network structure, the network loss function is divided into four parts. For the generator network of the first part, the difference between the original image and the reconstructed image is reduced at the pixel level; the reconstruction error loss is:
Lcon = E_{x~pX} ||x - G(x)||_1;
where pX denotes the data distribution, x is the input image, and G(x) is the image produced by the generator in the network;
the feature matching method proposed by Salimans et al. is used here to reduce training instability and to optimize at the image feature level; the feature matching error of the second part, the discriminator of the network, is:
Ladv = E_{x~pX} ||f(x) - f(G(x))||_2
where f (-) represents the discriminator model transformation.
The third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which prevents the facial expression information from being disturbed by picture-independent information during network decoding:
Lp = E_{x~pX} ||z - ẑ||_2, with z = h(x) and ẑ = h(G(x)),
where h(·) represents the encoding transformation.
The network loss of the fourth part is the cross-entropy loss of the Softmax layer:
Ls = k(y, ŷ),
where k(·) represents the Softmax cross-entropy loss, y represents the true result, and ŷ represents the recognition result.
The overall network loss function is as follows:
L = ω_adv·Ladv + ω_con·Lcon + ω_p·Lp + ω_s·Ls
where ω_adv, ω_con, ω_p and ω_s are parameters for weighting the losses.
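The four-part loss of S151 can be sketched as a single PyTorch function; the weight values are illustrative placeholders, not the ω values used in this application:

```python
import torch
import torch.nn.functional as F

def icgan_loss(x, x_rec, z, z_rec, f_real, f_fake, logits, labels,
               w_adv=1.0, w_con=50.0, w_p=1.0, w_s=1.0):
    """Weighted sum of the four losses of S151; the w_* weights are guesses."""
    l_con = F.l1_loss(x_rec, x)              # part 1: pixel reconstruction ||x - G(x)||_1
    l_adv = F.mse_loss(f_fake, f_real)       # part 2: discriminator feature matching
    l_p = F.mse_loss(z_rec, z)               # part 3: latent encoding loss ||z - z_hat||
    l_s = F.cross_entropy(logits, labels)    # part 4: Softmax cross entropy
    return w_adv * l_adv + w_con * l_con + w_p * l_p + w_s * l_s
```

The mean-reduced `l1_loss` / `mse_loss` calls stand in for the L1 and L2 norms of the formulas above.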
S152: the Optimizer selects an Adam Optimizer, the learning rate is set to 0.0002, training samples are trained in batches, 16 pictures are selected for each batch to be trained, and the epoch is set to 100, 200, 300 and 400 respectively.
S153: in each training, 1 epoch picture is firstly obtained, then the loss value is calculated, and then the Adam optimizer is used for continuously updating the parameters of the network to minimize the loss value of the network.
In the step (3), the picture is input into the trained IC-GAN network model for recognition, the probability of each facial expression is finally output, and the expression category with the maximum probability is output as the classification result.
In order to make the technical solution of the present invention more clearly understood, the technical solution will be described in detail and completely below with reference to the accompanying drawings of the embodiments; only some embodiments of the present invention are given herein. All other embodiments obtained by researchers on the basis of the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description and claims of the present invention are used to distinguish similar objects and do not describe a particular sequence or precedence. Data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated. In addition, the terms "comprising" and "having" and their variants are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of steps is not necessarily limited to the steps expressly listed.
As shown in FIGS. 1 and 2, the invention provides a generative-adversarial-based facial expression recognition method, which comprises a network training process and an offline recognition process for facial expression recognition with intra-class gaps.
As an embodiment, the offline identification process should include the following steps:
step S11: collect the input image x by downloading over the network, frame skipping, and parsing video;
step S12: geometrically normalizing the input image x and detecting an image x' after normalization processing;
step S13: processing the detected and cut image x' to a uniform size;
step S14: construct a network model for generative adversarial facial expression recognition;
step S15: carrying out data enhancement and data expansion processing on the image x' and unifying the image size;
step S16: training the network model and storing the trained network model;
in a specific embodiment, step S12 should further include the steps of:
step S121: perform geometric normalization processing on the input image; the geometric normalization comprises scale normalization, head-tilt correction and face-distortion correction;
step S122: performing face detection on the image after geometric normalization by using a face detection method in an OpenCV open source library, and then performing noise reduction on the detected image;
s23: obtain the geometrically normalized image x'.
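Step S122 relies on OpenCV face detection; when `cv2.CascadeClassifier.detectMultiScale` returns several candidate boxes, the usual heuristic is to keep the largest one. A small helper for that follow-up step can be sketched as below; the helper name and padding behaviour are our own assumptions:

```python
def pick_face(boxes, pad=0.1):
    """From (x, y, w, h) detections, keep the largest face and pad it by `pad`
    on each side, yielding the crop region for geometric normalization."""
    if not boxes:
        return None
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])  # largest area wins
    dx, dy = int(w * pad), int(h * pad)
    return (x - dx, y - dy, w + 2 * dx, h + 2 * dy)

# e.g. two candidate detections; the 100x100 one is kept and padded by 10%
region = pick_face([(10, 10, 40, 40), (60, 50, 100, 100)])
```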
As a preferred embodiment, step S23 further includes:
s131: determining the position of an image according to the coordinates of the face;
s132: using OpenCV to detect and obtain a face image;
s133: adjust the cut face image to a uniform size of 256 × 256.
Further, step S14 should further include: S141: build the IC-GAN neural network with the PyTorch deep learning framework; first input the picture into the conv_1 layer for a convolution operation, convolving the input image with a 4 × 4 convolution kernel, with output 128 × 64; apply the LeakyReLU activation function to the convolution as a nonlinear operation, with output 128 × 64; the LeakyReLU activation function is f(x) = x for x ≥ 0 and f(x) = αx for x < 0, where α is a small positive slope (for example 0.2);
s142: continue to convolve the output of the previous layer with a 4 × 4 convolution kernel, with output 64 × 128; then normalize the output with the batchnorm layer and apply the LeakyReLU activation function as a nonlinear operation, with output 64 × 128;
s143: continuing to perform convolution, batchnorm and LeakyReLu operations on the output of the previous layer by using the method of S142, wherein the output is 4 x 100;
s144: performing reverse convolution operation of a convolution kernel 4 × 4 on the output of the S143 to obtain an output of 29 × 1, performing batch normalization operation by using a batchnorm, and performing nonlinear operation on the output by using a ReLu activation function to obtain an output of 32 × 128;
s145: perform the convolution, batchnorm and ReLU operations described in S144 again on the output of the previous layer, with output 64 × 64;
s146: performing nonlinear operation on the output of the previous layer by using a ReLu activation function, performing convolution operation on the previous layer by using reverse convolution with a convolution kernel of 4 × 4, and performing nonlinear operation by using a Tanh activation function to output 128 × 128;
s147: performing the operation in the S141-S143 process again on the output of the previous layer, wherein the output is 1 x 5;
s148: inputting the original image and the output of S147 into a 4 × 4 convolution layer, performing convolution operation, and then performing nonlinear activation by using a nonlinear activation function LeakyReLu, wherein the output is 128 × 64;
s149: performing convolution operation on the output of the previous layer by using 4 × 4 convolution kernel, performing batch normalization operation by using batchnorm, and performing LeakyReLu nonlinear activation;
s1491: continue to perform convolution, batchnorm and nonlinear operations on the output of the previous layer by the process of S142, with output 4 × 1;
s1492: and finally, adopting Softmax to the output of the previous layer, and outputting the probability of judging to be true.
S1493: performing full-connection operation on the output of the S147 process, and finally realizing training of 5 expressions through a Softmax classifier, where the 5 expressions are 1 ═ happy, 2 ═ inhibition, 3 ═ neutral, 4 ═ excitation, and 5 ═ surpride and fear, so as to realize recognition of facial expressions;
as a preferred embodiment, the IC-GAN network is built with PyTorch and comprises an input layer, convolution layers, activation functions, pooling layers, a fully connected layer, BN layers and an output layer.
As a preferred embodiment, the sizes before and after a convolution layer can be described by the following formulas:
the input size of the convolution layer is W1 × H1 × D1, and the output size W2 × H2 × D2 satisfies:
W2 = (W1 - F + 2P)/S + 1
H2 = (H1 - F + 2P)/S + 1
D2 = K
in the above formulas, K is the number of convolution kernels, F is the size of the convolution kernels, S is the step size, and P is the boundary padding.
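These size formulas can be checked with a few lines of code; the function name is ours:

```python
def conv_output_size(w1, h1, d1, k, f, s, p):
    """Output size W2 x H2 x D2 of a convolution layer with K kernels of size F,
    stride S and padding P, applied to an input of size W1 x H1 x D1."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k  # the output depth D2 equals the number of kernels K

# first IC-GAN layer: 256x256x3 input, 64 kernels of size 4, stride 2, padding 1
size = conv_output_size(256, 256, 3, 64, 4, 2, 1)
```

This yields 128 × 128 × 64, i.e. each such stride-2 layer halves the spatial resolution.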
In a preferred embodiment, the mixed expression dataset of the present application contains 4455 images in total with 5 expression labels, 1 being happy, 2 being disgust, 3 being neutral, 4 being anxious, and 5 being surprise and fear. Since the distribution of this dataset is not uniform, the dataset is expanded using image affine transformation, image mirror transformation, contrast adjustment, brightness adjustment, and the like; the number of images in the expanded mixed expression dataset is shown in Table 1:
TABLE 1 number of expressions in the post-augmentation Mixed dataset
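The expansion operations just listed can be sketched with NumPy; these minimal versions (mirror, brightness, contrast; an affine warp would normally be done with OpenCV) are illustrations with made-up parameter values, not the application's exact settings:

```python
import numpy as np

def mirror(img):
    """Horizontal mirror transformation."""
    return img[:, ::-1]

def brightness(img, delta=20):
    """Brightness adjustment by an additive offset, clipped to [0, 255]."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def contrast(img, gain=1.2):
    """Contrast adjustment: scale pixel values around the image mean."""
    m = img.mean()
    return np.clip((img.astype(np.float32) - m) * gain + m, 0, 255).astype(np.uint8)

img = np.full((4, 4), 100, dtype=np.uint8)  # toy grayscale patch
augmented = [mirror(img), brightness(img), contrast(img)]
```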
As a most preferred method of the present application, step S15 further includes: s151: defining the loss of the network as 4 parts according to the network structure and the experimental characteristics;
s152: the Optimizer selects an Adam Optimizer, the learning rate is set to be 0.0002, training samples are trained in batches, 16 pictures are selected for each batch to be trained, and the epoch is respectively set to be 100, 200, 300 and 400;
s153: in each round of training, one epoch of pictures is first passed through the network, the loss value is calculated, and the Adam optimizer then updates the network parameters continuously so as to minimize the network loss.
Further, the step S16 further includes: S161: collect the input images by downloading over the network, frame skipping, and parsing video;
s162: perform geometric normalization, face detection, OpenCV processing and size unification on the input image;
s163: input the processed image into the trained IC-GAN network model for recognition, and finally output the probability of each expression; the expression with the maximum probability is the recognized expression.
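The final decision rule of S163 (softmax over the five expression scores, then take the most probable label) can be sketched as follows; the label table follows the numbering used in this application, and the logit values are made up:

```python
import math

LABELS = {1: "happy", 2: "disgust", 3: "neutral", 4: "anxious", 5: "surprise/fear"}

def predict(logits):
    """Softmax over the 5 expression scores; return (label, probability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]   # numerically stable softmax
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best + 1], probs[best]

label, prob = predict([0.1, 0.2, 0.3, 2.5, 0.4])  # hypothetical network output
```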
Compared with the prior art, the invention has the advantages that:
compared with traditional methods that extract expression features manually, the generative-adversarial facial expression recognition method of the present invention extracts facial expression features automatically; and compared with earlier neural-network facial expression recognition, it improves the recognition rate, so that expressions are recognized accurately.
The idea of model training is to crop the images with OpenCV open-source code before they are input into the network, unify them to a size of 256 × 256, and then train the IC-GAN network model with the preprocessed pictures as network input. The Softmax loss adopts the cross-entropy loss function; the Adam optimizer is adopted with the learning rate set to 0.0002; the training samples are trained in batches of 16 pictures; and the number of epochs is set to 100, 200, 300 and 400, respectively.
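The training recipe just described (Adam, learning rate 0.0002, batches of 16) can be sketched as a minimal loop; the stand-in linear model and the random data are placeholders for the IC-GAN and the expression dataset:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))  # stand-in for the IC-GAN
opt = torch.optim.Adam(model.parameters(), lr=0.0002)          # lr = 0.0002 as stated
loss_fn = nn.CrossEntropyLoss()                                # Softmax cross entropy

# toy dataset: 32 random 8x8 "images" with labels in {0..4}
x = torch.randn(32, 3, 8, 8)
y = torch.randint(0, 5, (32,))

for epoch in range(3):                   # the application trains 100-400 epochs
    for i in range(0, len(x), 16):       # batches of 16 pictures
        opt.zero_grad()
        loss = loss_fn(model(x[i:i + 16]), y[i:i + 16])
        loss.backward()                  # compute gradients of the loss
        opt.step()                       # Adam update minimizing the loss
```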
As a preferred embodiment of the present application, the identification process should comprise the following steps:
s21: collect the input image I by downloading over the network, frame skipping, and parsing video;
s22: input the image I into the trained network model;
s23: and obtaining a recognition result.
The above serial numbers of the embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
In the embodiments of the present invention, the descriptions of the respective embodiments have their respective emphases; where a part of one embodiment is not described in detail, reference may be made to the corresponding descriptions in other embodiments;
in several embodiments provided in the present application, the technical contents described can be implemented in other ways. All of the above description is intended to be illustrative only.
Claims (8)
1. A facial expression recognition method based on Intra-Class Gap GAN is characterized by comprising the following steps:
the identification model construction comprises the following steps:
(1) acquiring real-time images of different sources and different expressions of the human face;
(2) inputting the image into an Intra-Class Gap GAN neural network model for identification;
(3) outputting the identification result;
the method for constructing the Intra-Class Gap GAN neural network model in the step (2) is as follows:
(2.1) acquiring historical images of different sources and different expressions of the human face;
(2.2) preprocessing the collected face image to construct a facial expression data set;
(2.3) aiming at the facial expression recognition problem with the intra-Class Gap in the data set in the step (2.2), constructing an Intra-Class Gap GAN neural network model;
(2.4) training the generator and the discriminator of the network simultaneously by combining the pixel difference and the latent-vector difference between the input image and the reconstructed image, ensuring that the difference between the reconstructed image and the input image is minimized.
2. The method of claim 1, wherein the facial expression recognition method based on Intra-Class Gap GAN comprises:
(2.2) the method for constructing the human face expression data set in the step is as follows:
s11: based on the Multi-PIE and JAFFE expression datasets and facial expression pictures downloaded from the network in step (2.1), a self-made facial expression dataset is constructed; 5 facial expressions (disgust, happy, neutral, anxious, and surprise-and-fear) of people from different countries, age groups, professions and the like are selected for the experiment, and a large number of facial expressions with intra-class gaps are added; the resulting dataset, with its complex facial expression features, serves as the input image x of network training
S12: geometrically normalizing the input image, and carrying out face detection on the normalized image;
s13: the images after the processing in step S12 are scale-normalized to unify the sizes of the images.
3. The method of claim 2, wherein the facial expression recognition method based on Intra-Class Gap GAN comprises: the step (2.4) is specifically as follows:
s14: based on the image processed in step S13, train a facial expression recognition network model built on the generative adversarial IC-GAN (Intra-Class Gap GAN) neural network;
s15: carrying out data enhancement and data expansion processing on the image;
s16: and training the network model and storing the trained network model.
4. The method of claim 2, wherein the facial expression recognition method based on Intra-Class Gap GAN comprises: the step S12 includes the following steps:
s121: determining a characteristic point [ x, y ] according to the collected image, and calibrating the characteristic points of the two eyes and the nose to obtain coordinate values of the characteristic points;
s122: rotate the image according to the coordinates of the eyes on the face to ensure a consistent face orientation, where the distance between the person's eyes is d and the midpoint of the two eyes is O;
s123: determine a frame containing the face from the calibrated feature points and the geometric model: cut a distance of d to the left and right of O, and cut 0.5d upward and 1.5d downward.
5. The method of claim 4, wherein the facial expression recognition method based on Intra-Class Gap GAN comprises:
the step S13 includes the following steps:
s131: scale-normalize the images cut in step S123 and resize them uniformly to 256 × 256 pixel images, completing the geometric normalization of the images.
6. The method of claim 3, wherein the facial expression recognition method based on Intra-Class Gap GAN comprises:
the step S14 includes the following steps:
s141: construct the proposed IC-GAN (Intra-Class Gap GAN) neural network with the PyTorch deep learning framework; first input the picture processed in step S13 into the first convolution layer for a convolution operation, convolving the input image with a 4 × 4 convolution kernel, with output 128 × 64; apply the LeakyReLU activation function to the convolution as a nonlinear operation, with output 128 × 64; the LeakyReLU activation function is f(x) = x for x ≥ 0 and f(x) = αx for x < 0, where α is a small positive slope;
s142: continue to convolve the output of the previous layer with a 4 × 4 convolution kernel, with output 64 × 128; then normalize the output with a batchnorm layer and apply the LeakyReLU activation function as a nonlinear operation, with output 64 × 128;
S143: continuing to perform convolution, batchnorm and LeakyReLu operations on the output of the previous layer by using the method in the step S142, wherein the output is 4 × 100;
s144: perform a transposed convolution (deconvolution) with a 4 × 4 kernel on the output of S143 to obtain an output of 29 × 1, apply a batchnorm batch normalization operation, and apply the ReLU activation function as a nonlinear operation to obtain an output of 32 × 128; the ReLU activation function is f(x) = max(0, x);
s145: perform the convolution, batchnorm and ReLU operations in step S144 again on the output of the previous layer, with output 64 × 64;
s146: apply the ReLU activation function to the output of the previous layer as a nonlinear operation, perform a transposed convolution with a 4 × 4 kernel, and apply the Tanh activation function as a nonlinear operation, with output 128 × 128; the Tanh activation function is tanh(x) = (e^x - e^-x) / (e^x + e^-x);
s147: performing the operation in the S141-S143 process again on the output of the previous layer, wherein the output is 1 x 5;
s148: inputting the image subjected to scale normalization in the step S13 and the output of the step S147 into a 4 × 4 convolution layer, performing convolution operation, and then performing nonlinear activation by using a nonlinear activation function LeakyReLu to output 128 × 64;
s149: performing convolution operation on the output of the previous layer by using 4 × 4 convolution kernel, performing batch normalization operation by using batchnorm, and performing LeakyReLu nonlinear activation;
s1491: continuing to perform convolution, batchnorm and nonlinear operation on the output of the previous layer by adopting the process of S142, wherein the output is 4 x 1;
s1492: finally, adopting Softmax to the output of the upper layer, and outputting the probability of judging the output to be true;
s1493: perform a fully connected operation on the output of the S147 process, and finally train the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, and 5 = surprise and fear, thereby realizing recognition of the facial expressions.
7. The method of claim 3, wherein the facial expression recognition method based on Intra-Class Gap GAN comprises: step S15 includes
S151: the network loss function is divided into four parts; for the generator network of the first part, the difference between the original image and the reconstructed image is reduced at the pixel level, with the reconstruction error loss being:
Lcon = E_{x~pX} ||x - G(x)||_1;
where pX denotes the data distribution, x is the input image, and G(x) is the image produced by the generator in the network;
using the feature matching method proposed by Salimans et al. to reduce training instability and to optimize at the image feature level, the feature matching error of the discriminator of the second part of the network is:
Ladv = E_{x~pX} ||f(x) - f(G(x))||_2
wherein f (-) represents the discriminator model transformation;
the third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which prevents the facial expression information from being disturbed by picture-independent information during network decoding:
Lp = E_{x~pX} ||z - ẑ||_2, with z = h(x) and ẑ = h(G(x)),
where h(·) represents the encoding transformation;
the network loss of the fourth part is the cross-entropy loss of the Softmax layer:
Ls = k(y, ŷ),
where k(·) represents the Softmax cross-entropy loss, y represents the true result, and ŷ represents the recognition result;
the overall network loss function is as follows:
L = ω_adv·Ladv + ω_con·Lcon + ω_p·Lp + ω_s·Ls
where ω_adv, ω_con, ω_p and ω_s are parameters for weighting the losses;
s152: the Optimizer selects an Adam Optimizer, the learning rate is set to be 0.0002, training samples are trained in batches, 16 pictures are selected for each batch to be trained, and the epoch is respectively set to be 100, 200, 300 and 400;
s153: in each round of training, one epoch of pictures is first passed through the network, the loss value is calculated, and the Adam optimizer then updates the network parameters continuously so as to minimize the network loss.
8. The facial expression recognition method based on Intra-Class Gap GAN of claim 1, wherein:
and (3) inputting the picture into the trained IC-GAN network model for recognition, finally outputting the probability of each facial expression, and outputting the expression class with the maximum probability as the classification result.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019108222525 | 2019-09-02 | ||
CN201910822252 | 2019-09-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112183213A true CN112183213A (en) | 2021-01-05 |
CN112183213B CN112183213B (en) | 2024-02-02 |
Family
ID=73924606
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010905875.1A Active CN112183213B (en) | 2019-09-02 | 2020-09-01 | Facial expression recognition method based on Intra-Class Gap GAN
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112183213B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688799A (en) * | 2021-09-30 | 2021-11-23 | 合肥工业大学 | Facial expression recognition method for generating confrontation network based on improved deep convolution |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778506A (en) * | 2016-11-24 | 2017-05-31 | 重庆邮电大学 | A kind of expression recognition method for merging depth image and multi-channel feature |
US20180068198A1 (en) * | 2016-09-06 | 2018-03-08 | Carnegie Mellon University | Methods and Software for Detecting Objects in an Image Using Contextual Multiscale Fast Region-Based Convolutional Neural Network |
WO2018054283A1 (en) * | 2016-09-23 | 2018-03-29 | 北京眼神科技有限公司 | Face model training method and device, and face authentication method and device |
CN108304826A (en) * | 2018-03-01 | 2018-07-20 | 河海大学 | Facial expression recognizing method based on convolutional neural networks |
CN108615010A (en) * | 2018-04-24 | 2018-10-02 | 重庆邮电大学 | Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern |
CN109376625A (en) * | 2018-10-10 | 2019-02-22 | 东北大学 | A kind of human facial expression recognition method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
Ye Fangfang; Xu Li: "Facial expression recognition based on CMAC neural network", Computer Simulation, no. 08 *
Hu Min; Yu Shengnan; Wang Xiaohua: "Facial expression recognition method based on constrained cycle-consistent generative adversarial networks", Journal of Electronic Measurement and Instrumentation, no. 04 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||