CN115512422A - Convolutional neural network facial emotion recognition method and system based on attention mechanism - Google Patents
- Publication number
- CN115512422A (application CN202211273222.1A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- face image
- image data
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
- G06V40/168 — Feature extraction; Face representation
- G06V40/172 — Classification, e.g. identification
- G06V40/174 — Facial expression recognition
- G06N3/02, G06N3/08 — Neural networks; Learning methods
- G06V10/26 — Segmentation of patterns in the image field
- G06V10/774 — Generating sets of training patterns
- G06V10/82 — Image or video recognition using neural networks
Abstract
The invention discloses a convolutional neural network facial emotion recognition method and system based on an attention mechanism. The method comprises: collecting face image data and dividing it proportionally into a training set and a test set; inputting the face image data of the training set into a convolutional neural network model for feature learning to obtain facial emotion feature data; testing the convolutional neural network model on the test set and optimizing its parameters; and inputting the face image data to be recognized into the optimized convolutional neural network model to obtain an emotion classification result. The method can track changes in a user's emotion and improve the accuracy of emotion recognition; by detecting facial micro-expression actions with a convolutional neural network and using multiple convolution and pooling layers, it improves detection accuracy, avoids missing effective action units, and thereby improves the precision and reliability of facial recognition.
Description
Technical Field
The invention belongs to the technical field of facial emotion recognition, and particularly relates to a convolutional neural network facial emotion recognition method and system based on an attention mechanism.
Background
Emotions are a person's rapid responses to important internal or external events. Although an emotion lasts only briefly, it carries a great deal of complex biological information, including external behavior such as speech and expression, as well as brain-wave changes produced by the coordination of internal neural mechanisms. Emotions can broadly be classified into positive, calm, and negative states. Emotion recognition aims to enable a computer to identify a person's emotional state from such biological information. Research on emotion recognition can provide a scientific basis for assisting medical treatment, preventing fatigue driving, monitoring personal health, creating artistic works, and so on. The study of emotion recognition is therefore of great significance.
For processing brain-wave signals, many researchers compute statistical features of partial signals, but this kind of feature extraction destroys the correlation between brain-wave signals. Many researchers have also applied support vector machine classifiers to the emotion recognition problem, but the results depend heavily on the choice of kernel function, and different kernels can yield widely different results, so the approach has poor robustness. Compared with support vector machines, convolutional neural networks are more robust.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a convolutional neural network facial emotion recognition method and system based on an attention mechanism, which solves the technical problem of accurately recognizing changes in a user's emotion.
To solve the above technical problems, the invention adopts the following technical scheme: a convolutional neural network facial emotion recognition method based on an attention mechanism, comprising the following steps: collecting face image data and dividing it proportionally into a training set and a test set; inputting the face image data of the training set into a convolutional neural network model for feature learning to obtain facial emotion feature data; testing the convolutional neural network model on the test set and optimizing its parameters; and inputting the face image data to be recognized into the optimized convolutional neural network model to obtain an emotion classification result.
Preferably, inputting the face image data of the training set into the convolutional neural network model for feature learning to obtain facial emotion feature data specifically comprises: setting each weight and threshold to a small random value close to 0, and initializing an accuracy control parameter eps = 1e-15 and a learning rate of 0.05; taking an input pattern from the training set, feeding it to the network, and computing the dot product of the input pattern with each layer's weight matrix to obtain each layer's output and the final output vector; comparing the elements of the output vector with those of the target vector, and computing the residuals, weights, and thresholds of each layer; and saving the model once its average accuracy reaches 95%.
Preferably, before inputting the face image data to be recognized into the optimized convolutional neural network model to obtain the facial emotion feature data, the method comprises: collecting the face image data to be recognized; performing gray-level conversion on the face image, and correcting the length and width of the converted image to adjust it to a preset size; and segmenting the face image according to preset facial-part rules and labeling the segmented parts.
Preferably, inputting the face image data to be recognized into the optimized convolutional neural network model to obtain the facial emotion feature data specifically comprises: inputting the segmented and labeled face image data into the trained convolutional neural network model; computing a score for each labeled part and combining the results of all parts to obtain a score for the emotional features under each emotion; and ranking the scores, the emotion with the highest score being the emotion classification result of the face image data to be recognized.
Preferably, the convolutional neural network model comprises two convolutional layers (convolutional layer 1 and convolutional layer 2), two pooling layers (pooling layer 1 and pooling layer 2), two fully connected layers (fully connected layer 1 and fully connected layer 2), and one Softmax layer.
Preferably, the number of convolution kernels of convolutional layer 1 is set to 16 with a 3 × 3 kernel template; convolutional layer 1 is followed by a bias layer, an activation function layer, and pooling layer 1, whose pooling kernel template is set to 2 × 2. The number of convolution kernels of convolutional layer 2 is set to 32 with a 3 × 3 kernel, and the pooling kernel of pooling layer 2 is set to 2 × 2. The feature vector dimension of fully connected layer 1 is set to 2048 and that of fully connected layer 2 to 512; the Softmax layer outputs 7 classes.
In another embodiment, a convolutional neural network facial emotion recognition system based on an attention mechanism comprises: a sample collection unit, which collects face image data and divides it proportionally into a training set and a test set; a convolution calculation unit, which inputs the face image data of the training set into a convolutional neural network model for feature learning to obtain facial emotion feature data; an optimization unit, which tests the convolutional neural network model on the test set and optimizes its parameters; and a recognition unit, which inputs the face image data to be recognized into the optimized convolutional neural network model to obtain an emotion classification result.
Preferably, the convolution calculation unit specifically comprises: an initialization unit, which sets each weight and threshold to a small random value close to 0 and initializes an accuracy control parameter eps = 1e-15 and a learning rate of 0.05; a first calculation unit, which takes an input pattern from the training set, feeds it to the network, and computes the dot product of the input pattern with each layer's weight matrix to obtain each layer's output and the final output vector; a second calculation unit, which compares the elements of the output vector with those of the target vector and computes the residuals, weights, and thresholds of each layer; and a storage unit, which saves the model once its average accuracy reaches 95%.
Preferably, the convolutional neural network facial emotion recognition system based on an attention mechanism comprises: an acquisition unit, which collects the face image data to be recognized; a processing unit, which performs gray-level conversion on the face image and corrects the length and width of the converted image to adjust it to a preset size; and a labeling unit, which segments the face image according to preset facial-part rules and labels the segmented parts.
Preferably, the recognition unit comprises: a segmentation unit, which inputs the segmented and labeled face image data into the trained convolutional neural network model; a third calculation unit, which computes a score for each labeled part and combines the results of all parts to obtain a score for the emotional features under each emotion; and a sorting unit, which ranks the scores, the emotion with the highest score being the emotion classification result of the face image data to be recognized.
Compared with the prior art, the invention has the following beneficial effects:
the invention relates to a method and a system for identifying facial emotion of a convolutional neural network based on an attention mechanism, which comprises the steps of collecting facial image data, and dividing the facial image data into a training set and a test set according to a proportion; inputting the face image data in the training set into a convolutional neural network model for feature learning to obtain face emotion feature data; testing a convolutional neural network model in a test set, and optimizing parameters of the convolutional neural network model; inputting the face image data to be recognized into the optimized convolutional neural network model to obtain emotion classification results; the method can realize the change of the emotion of the user, improve the emotion recognition accuracy of the user, improve the detection accuracy by using a plurality of convolution and pooling layers through the face micro-expression action detection of the convolutional neural network, avoid the omission of effective action units, improve the face recognition accuracy, and greatly increase the face recognition reliability.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings;
fig. 1 is a schematic flowchart of a method for identifying facial emotion in a convolutional neural network based on an attention mechanism according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a system for facial emotion recognition based on a convolutional neural network of an attention mechanism according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a convolutional neural network facial emotion recognition system based on an attention mechanism according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a convolutional neural network facial emotion recognition system based on an attention mechanism according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of a system for facial emotion recognition based on a convolutional neural network of an attention mechanism according to a fourth embodiment of the present invention;
in the figure: 10 is a sample collection unit, 20 is a convolution calculation unit, 201 is an initialization unit, 202 is a first calculation unit, 203 is a second calculation unit, 204 is a storage unit, 30 is an optimization unit, 40 is an identification unit, 401 is a segmentation unit, 402 is a third calculation unit, 403 is a sorting unit, 50 is a collection unit, 60 is a processing unit, and 70 is a marking unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention; all other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as being "connected" to another element, the element may be "directly connected" to the other element or "connected" to the other element through a third element.
The present application is described below with reference to preferred implementation steps, and fig. 1 is a schematic flowchart of a method for identifying facial emotion of a convolutional neural network based on an attention mechanism according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
s10, collecting face image data, and dividing the face image data into a training set and a test set according to a proportion;
s20, inputting the face image data in the training set into a convolutional neural network model for feature learning to obtain face emotion feature data;
s30, testing the convolutional neural network model in the test set, and optimizing parameters of the convolutional neural network model;
and S40, inputting the face image data to be recognized into the optimized convolutional neural network model to obtain an emotion classification result.
Specifically, face image data is collected and divided proportionally into a training set and a test set; the face image data of the training set is input into a convolutional neural network model for feature learning to obtain facial emotion feature data; the convolutional neural network model is tested on the test set and its parameters are optimized; and the face image data to be recognized is input into the optimized model to obtain an emotion classification result. This embodiment can track changes in a user's emotion and improve the accuracy of emotion recognition; by detecting facial micro-expression actions with a convolutional neural network and using multiple convolution and pooling layers, it improves detection accuracy, avoids missing effective action units, and thereby greatly improves the precision and reliability of facial recognition.
Further, inputting the face image data of the training set into the convolutional neural network model for feature learning to obtain facial emotion feature data specifically comprises: setting each weight and threshold to a small random value close to 0, and initializing an accuracy control parameter eps = 1e-15 and a learning rate of 0.05; taking an input pattern from the training set, feeding it to the network, and computing the dot product of the input pattern with each layer's weight matrix to obtain each layer's output and the final output vector; comparing the elements of the output vector with those of the target vector and computing the residuals, weights, and thresholds of each layer; and saving the model once its average accuracy reaches 95%.
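The training procedure above can be sketched in a few lines of numpy. This is a hypothetical softmax-regression stand-in, not the patent's convolutional model: the toy data, network size, and epoch limit are invented for illustration, while the small near-zero initialization, eps = 1e-15, learning rate 0.05, and the 95% accuracy save threshold come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS, LR = 1e-15, 0.05

# toy data: 200 samples, 8 features, 3 emotion classes (invented)
X = rng.normal(size=(200, 8))
y = np.argmax(X @ rng.normal(size=(8, 3)), axis=1)
T = np.eye(3)[y]                              # one-hot target vectors

W = rng.normal(scale=0.01, size=(8, 3))       # small random values near 0
b = np.zeros(3)
saved = None
for epoch in range(2000):
    logits = X @ W + b                        # dot product with the weight matrix
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)         # output vector per sample
    loss = -np.mean(np.sum(T * np.log(p + EPS), axis=1))   # eps guards the log
    resid = p - T                             # compare output vector to target vector
    W -= LR * X.T @ resid / len(X)            # update weights from the residuals
    b -= LR * resid.mean(axis=0)
    acc = float((np.argmax(p, axis=1) == y).mean())
    if acc >= 0.95:                           # save once average accuracy meets 95%
        saved = (W.copy(), b.copy())
        break
```

In the patent's model the same loop would run over convolutional and pooling layers rather than a single weight matrix, but the init / forward / residual / save structure is the one described.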
Further, before inputting the face image data to be recognized into the optimized convolutional neural network model to obtain the facial emotion feature data, the method comprises: collecting the face image data to be recognized; performing gray-level conversion on the face image, and correcting the length and width of the converted image to adjust it to a preset size; and segmenting the face image according to preset facial-part rules and labeling the segmented parts. Specifically, preprocessing of the face image to be recognized includes color processing, image resizing, rotation, flipping, noise reduction, and so on. Color processing includes gray-level conversion, which turns the face image into a gray-scale image. Resizing corrects the length and width of the face image to a preset size; the preset size is chosen from historical data so that subsequent processing achieves a better effect. The segmented parts are then labeled: for example, the eyes, lips, eyebrows, and mandible of the face are segmented, and the segmented eyes are labeled a, the lips b, the eyebrows c, and the mandible d.
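The preprocessing pipeline can be illustrated with plain numpy. This is a sketch only: the luminance weights, the 48 × 48 preset size, and the facial-part bounding boxes are assumptions for illustration, since the patent specifies none of them.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Gray-level conversion using the common BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nn(img: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour resize to the preset height and width."""
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]

def segment(img: np.ndarray) -> dict:
    """Split the face into labeled parts a-d (boxes are illustrative)."""
    h, w = img.shape
    return {
        "a_eyes":     img[h // 5: 2 * h // 5, :],
        "b_lips":     img[3 * h // 4:, w // 4: 3 * w // 4],
        "c_eyebrows": img[h // 8: h // 4, :],
        "d_mandible": img[4 * h // 5:, :],
    }

face = np.random.rand(120, 100, 3)          # stand-in RGB face image
gray = resize_nn(to_gray(face), 48, 48)     # preset size 48x48 assumed
parts = segment(gray)                       # labeled parts a, b, c, d
```

In practice the rotation, flipping, and noise-reduction steps mentioned above would slot in between the gray conversion and the segmentation.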
Further, inputting the face image data to be recognized into the optimized convolutional neural network model to obtain the facial emotion feature data specifically comprises: inputting the segmented and labeled face image data into the trained convolutional neural network model; computing a score for each labeled part and combining the results of all parts to obtain a score for the emotional features under each emotion; and ranking the score totals, the emotion with the highest total being the emotion classification result of the face image data to be recognized. Specifically, facial emotions are divided into three categories: neutral, positive, and negative. A neutral emotion is a person's facial appearance under normal conditions; a positive emotion is the facial expression of a person who is happy; negative emotions include anger, sadness, surprise, and the like. For example, the eyes, lips, eyebrows, and mandible of a face image to be recognized are labeled a, b, c, and d respectively, and the score of each part is computed under the neutral, positive, and negative emotions. Under the neutral emotion the eyes a score 5, the lips b score 4, the eyebrows c score 4, and the mandible d scores 3, for a total of 16; under the positive emotion the eyes a score 9, the lips b score 8, the eyebrows c score 7, and the mandible d scores 6, for a total of 30; under the negative emotion the eyes a score 2, the lips b score 1, the eyebrows c score 5, and the mandible d scores 4, for a total of 12. Ranking the totals shows that the emotion classification result of the face image data to be recognized is the positive emotion.
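The worked example above reduces to summing per-part scores for each emotion and taking the argmax. The scores below are exactly the ones given in the text; only the dictionary layout is ours.

```python
# Per-part scores for each emotion, as given in the worked example.
scores = {
    "neutral":  {"a_eyes": 5, "b_lips": 4, "c_eyebrows": 4, "d_mandible": 3},
    "positive": {"a_eyes": 9, "b_lips": 8, "c_eyebrows": 7, "d_mandible": 6},
    "negative": {"a_eyes": 2, "b_lips": 1, "c_eyebrows": 5, "d_mandible": 4},
}

# Combine the part results per emotion, then rank: highest total wins.
totals = {emotion: sum(parts.values()) for emotion, parts in scores.items()}
result = max(totals, key=totals.get)
# totals == {"neutral": 16, "positive": 30, "negative": 12}; result == "positive"
```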
Further, the convolutional neural network model comprises two convolutional layers (convolutional layer 1 and convolutional layer 2), two pooling layers (pooling layer 1 and pooling layer 2), two fully connected layers (fully connected layer 1 and fully connected layer 2), and one Softmax layer. With its multiple convolution and pooling layers, the model can extract higher-level, multi-level features from the whole face or from local regions, and has good classification performance on facial expression image features.
Furthermore, the number of convolution kernels of convolutional layer 1 is set to 16 with a 3 × 3 kernel template; convolutional layer 1 is followed by a bias layer, an activation function layer, and pooling layer 1, whose pooling kernel template is set to 2 × 2. The number of convolution kernels of convolutional layer 2 is set to 32 with a 3 × 3 kernel, and the pooling kernel of pooling layer 2 is set to 2 × 2. The feature vector dimension of fully connected layer 1 is set to 2048 and that of fully connected layer 2 to 512; the Softmax layer outputs 7 classes.
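A quick shape trace makes the layer sizes concrete. The patent does not state the input resolution, stride, or padding, so a 48 × 48 grayscale input with stride-1 "same" convolutions and 2 × 2 stride-2 pooling are assumptions here; the kernel counts (16, 32), FC widths (2048, 512), and 7 output classes are from the text.

```python
def conv_same(h, w, c_out):
    """3x3 conv, stride 1, 'same' padding: spatial size unchanged."""
    return h, w, c_out

def pool2(h, w, c):
    """2x2 pooling, stride 2: spatial size halved."""
    return h // 2, w // 2, c

h, w, c = 48, 48, 1                      # assumed grayscale input
h, w, c = pool2(*conv_same(h, w, 16))    # conv1 (16 kernels) + pool1 -> 24x24x16
h, w, c = pool2(*conv_same(h, w, 32))    # conv2 (32 kernels) + pool2 -> 12x12x32
flat = h * w * c                         # 12 * 12 * 32 = 4608 features
# FC1: flat -> 2048, FC2: 2048 -> 512, Softmax: 512 -> 7 classes
```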
Fig. 2 is a schematic structural diagram of a convolutional neural network facial emotion recognition system based on an attention mechanism according to an embodiment of the present invention, and as shown in fig. 2, the convolutional neural network facial emotion recognition system based on the attention mechanism includes:
the sample collection unit 10: collecting face image data, and dividing the face image data into a training set and a test set according to a proportion; convolution calculation unit 20: inputting the face image data in the training set into a convolutional neural network model for feature learning to obtain face emotion feature data; the optimization unit 30: testing a convolutional neural network model in a test set, and optimizing parameters of the convolutional neural network model; the recognition unit 40: and inputting the face image data to be recognized into the optimized convolutional neural network model to obtain emotion classification results.
Specifically, the sample collection unit 10 collects face image data and divides it proportionally into a training set and a test set; the convolution calculation unit 20 inputs the face image data of the training set into a convolutional neural network model for feature learning to obtain facial emotion feature data; the optimization unit 30 tests the convolutional neural network model on the test set and optimizes its parameters; and the recognition unit 40 inputs the face image data to be recognized into the optimized model to obtain an emotion classification result. This embodiment can track changes in a user's emotion and improve the accuracy of emotion recognition; by detecting facial micro-expression actions with a convolutional neural network and using multiple convolution and pooling layers, it improves detection accuracy, avoids missing effective action units, and thereby greatly improves the precision and reliability of facial recognition.
Fig. 3 is a schematic structural diagram of a convolutional neural network facial emotion recognition system based on an attention mechanism according to a second embodiment of the present invention. As shown in Fig. 3, the convolution calculation unit 20 specifically includes: the initialization unit 201: sets each weight value and threshold value to small random values close to 0, and initializes the accuracy control parameter eps = 1e-15 and the learning rate to 0.05; the first calculation unit 202: takes an input pattern from the training set, feeds it to the network, and performs a dot-product operation between the input pattern and the weight matrix of each layer, thereby calculating the output of each layer and producing the final output vector; the second calculation unit 203: compares the elements of the output vector with the elements of the target vector, and calculates the residuals, weights, and thresholds of each layer; the storage unit 204: saves the model when the average accuracy of the model reaches 95%.
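Units 201-204 describe a standard initialize/forward/compare loop. A minimal sketch follows, assuming plain fully connected layers and the stated constants (weights near 0, eps = 1e-15, learning rate 0.05); all helper names are hypothetical:

```python
import random

EPS = 1e-15          # accuracy control parameter from unit 201
LEARNING_RATE = 0.05

def init_layer(n_in, n_out, scale=0.01):
    """Unit 201: weights and thresholds as small random values close to 0."""
    weights = [[random.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    thresholds = [random.uniform(-scale, scale) for _ in range(n_out)]
    return weights, thresholds

def forward(pattern, layers):
    """Unit 202: dot product of the input pattern with each layer's weight matrix."""
    x = pattern
    for weights, thresholds in layers:
        x = [sum(w * xi for w, xi in zip(row, x)) + t
             for row, t in zip(weights, thresholds)]
    return x  # output vector of the final layer

def residuals(output, target):
    """Unit 203: element-wise comparison of output vector and target vector."""
    return [t - o for o, t in zip(output, target)]

def maybe_save(model, accuracy, path="model.ckpt"):
    """Unit 204: persist the model once average accuracy reaches 95%."""
    return path if accuracy >= 0.95 else None
```

The residuals would drive the weight and threshold updates of each layer; that backward pass is omitted here for brevity.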
Fig. 4 is a schematic structural diagram of a convolutional neural network facial emotion recognition system based on an attention mechanism according to a third embodiment of the present invention. As shown in Fig. 4, the system further includes: the acquisition unit 50: collects the face image data to be recognized; the processing unit 60: performs gray-level conversion on the face image, then corrects the length and width of the converted image to adjust it to a preset size; the marking unit 70: segments the face image according to a preset face-part rule and marks the segmented parts.
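The processing and marking steps can be sketched on plain nested lists. The luma coefficients and the thirds-based part rule below are illustrative assumptions, not taken from the patent:

```python
def to_gray(rgb_image):
    """Gray-level conversion using the common ITU-R BT.601 luma weights."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]

def resize_nearest(image, out_h, out_w):
    """Length/width correction to a preset size via nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    return [[image[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def segment_parts(image):
    """Hypothetical 'preset face part rule': horizontal thirds, marked by name."""
    h = len(image)
    return {"eyes": image[: h // 3],
            "nose": image[h // 3 : 2 * h // 3],
            "mouth": image[2 * h // 3 :]}
```

A production system would use a face-landmark detector for the part rule; the thirds split only illustrates the segment-and-mark data flow.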
Fig. 5 is a schematic structural diagram of a convolutional neural network facial emotion recognition system based on an attention mechanism according to a fourth embodiment of the present invention. As shown in Fig. 5, the recognition unit 40 includes: the division unit 401: inputs the segmented and marked face image data into the trained convolutional neural network model; the third calculation unit 402: computes results for the marked parts and integrates the calculation results of all parts to obtain scores of the facial emotion features under different emotions; the sorting unit 403: sorts the scores, and the emotion corresponding to the highest score is taken as the facial emotion feature data of the face image.
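Units 402-403 reduce to summing per-part scores and taking the top-scoring emotion. The 7-label set below follows the common facial-expression convention and is an assumption at this point in the text:

```python
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def classify(part_scores):
    """part_scores maps a marked part name to its 7 per-emotion scores.
    Integrate the scores of all parts (unit 402), then pick the emotion
    with the highest total (unit 403)."""
    totals = [sum(scores[i] for scores in part_scores.values())
              for i in range(len(EMOTIONS))]
    best = max(range(len(totals)), key=totals.__getitem__)
    return EMOTIONS[best], totals
```

Summing per-part scores assumes equal weighting of the parts; a weighted integration would be a straightforward variation.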
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one type of logical-function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, or the part thereof that contributes to the prior art, may be embodied in whole or in part in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A convolutional neural network facial emotion recognition method based on an attention mechanism, characterized in that the method comprises the following steps:
collecting face image data, and dividing the face image data into a training set and a test set according to a proportion;
inputting the face image data in the training set into a convolutional neural network model for feature learning to obtain face emotion feature data;
testing a convolutional neural network model in a test set, and optimizing parameters of the convolutional neural network model;
and inputting the face image data to be recognized into the optimized convolutional neural network model to obtain an emotion classification result.
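The first step of the claim, dividing the collected data into a training set and a test set according to a proportion, can be sketched as follows; the shuffle step and the 0.8 default ratio are illustrative assumptions:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=None):
    """Divide face image data into a training set and a test set by proportion."""
    order = list(range(len(samples)))
    if seed is not None:
        random.Random(seed).shuffle(order)  # reproducible shuffle before splitting
    k = int(len(order) * train_ratio)
    train = [samples[i] for i in order[:k]]
    test = [samples[i] for i in order[k:]]
    return train, test
```

Shuffling before the split avoids ordering bias when the collected images are grouped by subject or by emotion.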
2. The attention mechanism-based convolutional neural network facial emotion recognition method of claim 1, wherein the step of inputting the face image data in the training set into a convolutional neural network model for feature learning to obtain facial emotion feature data specifically comprises:
setting each weight value and threshold value to small random values close to 0, and initializing the accuracy control parameter eps = 1e-15 and the learning rate to 0.05;
taking an input pattern from the training set, feeding it to the network, and performing a dot-product operation between the input pattern and the weight matrix of each layer, thereby calculating the output of each layer and obtaining the final output vector;
comparing elements in the output vector with elements in the target vector, and calculating residual errors, weights and thresholds of all layers;
and saving the model when the average accuracy of the model reaches 95%.
3. The attention mechanism-based convolutional neural network facial emotion recognition method of claim 1, wherein, before the face image data to be recognized is input into the optimized convolutional neural network model, the method further comprises:
collecting human face image data to be recognized;
carrying out gray level conversion on the face image, and correcting the length and width of the face image after the gray level conversion to adjust the length and width of the face image to a preset size;
and segmenting the face image according to a preset face part rule, and marking the segmented parts.
4. The attention mechanism-based convolutional neural network facial emotion recognition method of claim 3, wherein the step of inputting the face image data to be recognized into the optimized convolutional neural network model to obtain the emotion classification result specifically comprises:
inputting the segmented and marked face image data into a trained convolutional neural network model;
calculating the marked parts, and integrating the calculation results of all the parts to obtain scores of the facial emotion features under different emotions;
and sorting the scores, wherein the emotion corresponding to the highest score is the emotion classification result of the face image data to be recognized.
5. The attention mechanism-based convolutional neural network facial emotion recognition method of claim 1, wherein the convolutional neural network model comprises 2 convolutional layers (convolutional layer 1 and convolutional layer 2), 2 pooling layers (pooling layer 1 and pooling layer 2), 2 fully connected layers (fully connected layer 1 and fully connected layer 2), and 1 Softmax layer.
6. The attention mechanism-based convolutional neural network facial emotion recognition method of claim 5, wherein: the number of convolution kernels of convolutional layer 1 is set to 16 with a kernel size of 3 × 3; convolutional layer 1 is followed by a bias layer, an activation function layer, and pooling layer 1, whose pooling kernel size is set to 2 × 2; the number of convolution kernels of convolutional layer 2 is set to 32 with a kernel size of 3 × 3; the pooling kernel size of pooling layer 2 is set to 2 × 2; the feature vector dimension of fully connected layer 1 is set to 2048 and that of fully connected layer 2 to 512; and the output dimension of the Softmax layer is 7, one per emotion category.
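The claimed hyperparameters fix the feature-map sizes once an input resolution is chosen. A quick trace, assuming (hypothetically, since the claim does not state it) a 48 × 48 gray input, stride-1 "valid" convolutions, and stride-2 pooling:

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    """Output side length of a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output side length of a pooling layer."""
    return (size - kernel) // stride + 1

side = 48                        # assumed input resolution (not stated in the claim)
side = pool_out(conv_out(side))  # conv layer 1 (16 kernels, 3x3) + pool layer 1
side = pool_out(conv_out(side))  # conv layer 2 (32 kernels, 3x3) + pool layer 2
flat = side * side * 32          # flattened features feeding fully connected layer 1
```

The flattened vector then passes through the fully connected layers of 2048 and 512 units and the 7-way Softmax; with padding or a different input size the intermediate dimensions change accordingly.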
7. A convolutional neural network facial emotion recognition system based on an attention mechanism, characterized in that the system comprises:
a sample collection unit: collecting face image data, and dividing the face image data into a training set and a test set according to a proportion;
a convolution calculation unit: inputting the face image data in the training set into a convolutional neural network model for feature learning to obtain face emotion feature data;
an optimization unit: testing a convolutional neural network model in a test set, and optimizing parameters of the convolutional neural network model;
an identification unit: and inputting the face image data to be recognized into the optimized convolutional neural network model to obtain an emotion classification result.
8. The attention-based convolutional neural network facial emotion recognition system of claim 7, wherein: the convolution calculating unit specifically includes:
an initialization unit: setting each weight value and threshold value to small random values close to 0, and initializing the accuracy control parameter eps = 1e-15 and the learning rate to 0.05;
a first calculation unit: taking an input pattern from the training set, feeding it to the network, and performing a dot-product operation between the input pattern and the weight matrix of each layer, thereby calculating the output of each layer and obtaining the final output vector;
a second calculation unit: comparing elements in the output vector with elements in the target vector, and calculating residual errors, weights and thresholds of all layers;
a storage unit: saving the model when the average accuracy of the model reaches 95%.
9. The attention mechanism-based convolutional neural network facial emotion recognition system of claim 7, wherein the system further comprises:
a collecting unit: collecting human face image data to be recognized;
a processing unit: carrying out gray level conversion on the face image, and correcting the length and width of the face image after the gray level conversion to adjust the length and width of the face image to a preset size;
a marking unit: and segmenting the face image according to a preset face part rule, and marking the segmented parts.
10. The attention mechanism-based convolutional neural network facial emotion recognition system of claim 9, wherein: the identification unit includes:
a dividing unit: inputting the segmented and marked face image data into a trained convolutional neural network model;
a third calculation unit: calculating the marked parts, and integrating the calculation results of all the parts to obtain scores of the facial emotion features under different emotions;
a sorting unit: and sorting the scores, wherein the emotion corresponding to the highest score is the emotion classification result of the face image data to be recognized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211273222.1A CN115512422A (en) | 2022-10-18 | 2022-10-18 | Convolutional neural network facial emotion recognition method and system based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115512422A true CN115512422A (en) | 2022-12-23 |
Family
ID=84509463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211273222.1A Pending CN115512422A (en) | 2022-10-18 | 2022-10-18 | Convolutional neural network facial emotion recognition method and system based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115512422A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583419A (en) * | 2018-12-13 | 2019-04-05 | 深圳市淘米科技有限公司 | A kind of emotional prediction system based on depth convolutional network |
CN109784153A (en) * | 2018-12-10 | 2019-05-21 | 平安科技(深圳)有限公司 | Emotion identification method, apparatus, computer equipment and storage medium |
CN111967359A (en) * | 2020-08-06 | 2020-11-20 | 南昌大学 | Human face expression recognition method based on attention mechanism module |
CN112328797A (en) * | 2020-11-24 | 2021-02-05 | 山东师范大学 | Emotion classification method and system based on neural network and attention mechanism |
CN113657168A (en) * | 2021-07-19 | 2021-11-16 | 西安理工大学 | Convolutional neural network-based student learning emotion recognition method |
CN114943997A (en) * | 2022-05-18 | 2022-08-26 | 上海大学 | Cerebral apoplexy patient expression classification algorithm and system based on attention and neural network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117633606A (en) * | 2024-01-26 | 2024-03-01 | 浙江大学医学院附属第一医院(浙江省第一医院) | Consciousness detection method, equipment and medium based on olfactory stimulus and facial expression |
CN117633606B (en) * | 2024-01-26 | 2024-04-19 | 浙江大学医学院附属第一医院(浙江省第一医院) | Consciousness detection method, equipment and medium based on olfactory stimulus and facial expression |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273845B (en) | Facial expression recognition method based on confidence region and multi-feature weighted fusion | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN110188272B (en) | Community question-answering website label recommendation method based on user background | |
CN108737406A (en) | A kind of detection method and system of abnormal flow data | |
Salman et al. | Classification of real and fake human faces using deep learning | |
CN110399821B (en) | Customer satisfaction acquisition method based on facial expression recognition | |
CN111652066A (en) | Medical behavior identification method based on multi-self-attention mechanism deep learning | |
CN107145857A (en) | Face character recognition methods, device and method for establishing model | |
Singh et al. | Deep learning and machine learning based facial emotion detection using CNN | |
CN114970605A (en) | Multi-mode feature fusion neural network refrigeration equipment fault diagnosis method | |
Rattani et al. | Convolutional neural network for age classification from smart-phone based ocular images | |
CN110309850A (en) | Vision question and answer prediction technique and system based on language priori problem identification and alleviation | |
CN112037179B (en) | Method, system and equipment for generating brain disease diagnosis model | |
Kayadibi et al. | An eye state recognition system using transfer learning: AlexNet-based deep convolutional neural network | |
CN115512422A (en) | Convolutional neural network facial emotion recognition method and system based on attention mechanism | |
Pratama et al. | Deep convolutional neural network for hand sign language recognition using model E | |
CN109508640A (en) | A kind of crowd's sentiment analysis method, apparatus and storage medium | |
Kusrini et al. | The effect of Gaussian filter and data preprocessing on the classification of Punakawan puppet images with the convolutional neural network algorithm | |
CN110032948A (en) | A kind of sketch gesture identification method based on interaction timing information | |
Li et al. | Disease identification in potato leaves using swin transformer | |
CN113935413A (en) | Distribution network wave recording file waveform identification method based on convolutional neural network | |
Jadhav et al. | Content based facial emotion recognition model using machine learning algorithm | |
Jeczmionek et al. | Input reduction of convolutional neural networks with global sensitivity analysis as a data-centric approach | |
CN112801283B (en) | Neural network model, action recognition method, device and storage medium | |
Peng | Research on Emotion Recognition Based on Deep Learning for Mental Health |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||