CN107292256B - Auxiliary task-based deep convolution wavelet neural network expression recognition method - Google Patents


Info

Publication number
CN107292256B
CN107292256B (application CN201710446076.0A)
Authority
CN
China
Prior art keywords
layer
convolution
expression
network
image
Prior art date
Legal status
Active
Application number
CN201710446076.0A
Other languages
Chinese (zh)
Other versions
CN107292256A (en)
Inventor
白静
陈科雯
张景森
焦李成
缑水平
张向荣
Current Assignee
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology
Priority to CN201710446076.0A
Publication of CN107292256A
Application granted
Publication of CN107292256B

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention discloses an auxiliary-task-based deep convolution wavelet neural network expression recognition method, which solves two problems of existing techniques: feature selection operators cannot learn expression features efficiently, and too few image expression classification features are extracted. The invention is realized as follows: build a deep convolution wavelet neural network; establish a facial expression image set and a corresponding expression-sensitive-region image set; input the facial expression images to the network; train the deep convolution wavelet neural network; back-propagate the network error; update each convolution kernel and bias vector of the network; input the expression-sensitive-region images to the trained network; learn the weighting proportion of the auxiliary task; obtain the network's global classification labels; and compute the recognition accuracy from the global labels. The method captures both the abstract and the detail information of expression images, strengthens the influence of the expression-sensitive regions in expression feature learning, markedly improves the accuracy of expression recognition, and can be applied to expression recognition of facial expression images.

Description

Auxiliary task-based deep convolution wavelet neural network expression recognition method
Technical Field
The invention belongs to the technical field of image processing, relates generally to computer vision recognition, and specifically to an auxiliary-task-based deep convolution wavelet neural network expression recognition method. The method can be applied to learning and classifying expression features in facial expression recognition.
Background
Facial expression recognition is a leading technology in the fields of image processing and computer vision. It is a key step from image processing to image analysis, and the quality of its result directly influences subsequent image analysis and understanding. The purpose of facial expression recognition is to study coding models of facial expressions, to learn and extract their characteristic expression patterns, and to let computers synthesize, track and recognize facial expressions automatically.
Current research on facial expression recognition focuses mainly on two aspects: feature extraction and classification algorithms. Deep-learning-based methods have been adopted by researchers in recent years; in particular, the deep convolutional neural network, which excels at processing two-dimensional images, has been applied to expression recognition. However, the deep convolutional neural network concentrates on the abstract mapping of an image from low layers to high layers in order to obtain a high-level feature expression pattern, and in doing so ignores the texture and detail information of the expression image. Moreover, the commonly used deep network is generally a single-task network, which cannot effectively highlight the main contribution of the expression-sensitive regions when learning expression features.
Existing expression recognition techniques mainly select features first and then classify, but in the feature selection step the available feature selection operators cannot learn expression features efficiently, so the subsequent classification cannot reach an ideal result. In addition, Lu Yadan et al. adopt a deep self-coding network as the classifier, but do not avoid the feature selection step, so the final classification effect improves little.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an auxiliary-task-based deep convolution wavelet neural network expression recognition method.
The auxiliary-task-based deep convolution wavelet neural network expression recognition method of the invention comprises the following steps (a code sketch of the full network follows the step list):
(1) building a deep convolution wavelet network consisting of three convolutional layers, two pooling layers, a multi-scale transform layer, a fully connected layer and a softmax output layer; the bias weight matrices of the convolutional layers are initialized to 0 matrices, and the Sigmoid function is selected as the activation function of the network;
(2) establishing a facial expression image set and an expression sensitive area image set, wherein the expression sensitive area image set is obtained by cutting eyebrow parts and mouth parts of the facial expression image set, a part of images in the facial expression image data set are used as a training image set of a network, and the rest of images are used as a testing image set;
(3) inputting a training image into a deep convolution wavelet network, wherein the size of the input image is 96 × 96;
(4) the first layer of the deep convolution wavelet network is a convolutional layer, which performs a convolution operation on each input facial expression training image; the number of convolution kernels is Q1 and the kernel size is 7 × 7:
(4a) a random initialization method configures the convolution kernel weights as near-zero numbers in [-0.5, 0.5];
(4b) each convolution kernel convolves the facial expression image, giving Q1 convolved feature maps; the feature map of each kernel is 90 × 90;
(5) the second layer of the network is a pooling layer, which takes the Q1 feature maps from the previous convolutional layer as input and performs the pooling operation:
the pooling layer selects the maximum value in non-overlapping 2 × 2 regions, yielding Q1 pooled feature maps of size 45 × 45;
(6) the third layer of the network is a convolutional layer, which takes the Q1 feature maps from the previous pooling layer as input and performs a convolution operation; the number of convolution kernels is Q2 and the kernel size is 6 × 6:
(6a) a random initialization method configures the convolution kernel weights as near-zero numbers in [-0.5, 0.5];
(6b) each convolution kernel convolves the Q1 feature maps; the Q1 convolution results, plus the bias, are passed through the activation function and averaged to obtain that kernel's feature map, of size 40 × 40;
(7) the fourth layer of the network is a pooling layer, which takes the Q2 feature maps from the previous convolutional layer as input and performs the pooling operation:
the pooling layer selects the maximum value in non-overlapping 2 × 2 regions, yielding Q2 pooled feature maps of size 20 × 20;
(8) the fifth layer of the network is a convolutional layer, which takes the Q2 feature maps from the previous pooling layer as input and performs a convolution operation; the number of convolution kernels is Q3 and the kernel size is 5 × 5:
(8a) a random initialization method configures the convolution kernel weights as near-zero numbers in [-0.5, 0.5];
(8b) each convolution kernel convolves the Q2 feature maps; the Q2 convolution results, plus the bias, are passed through the activation function and averaged to obtain that kernel's feature map, of size 16 × 16;
(9) the sixth layer of the network is a wavelet pooling layer, which takes the Q3 feature maps from the previous convolutional layer as input and performs a one-level wavelet decomposition:
the wavelet basis function is the 'haar' function; each feature map yields an 8 × 8 low-frequency subband and three 8 × 8 high-frequency subbands, and the three high-frequency subbands are fused into a new high-frequency subband by taking the maximum at corresponding positions;
(10) the seventh layer of the network is a fully connected layer, which takes the Q3 8 × 8 low-frequency subbands and Q3 8 × 8 high-frequency subbands from the sixth-layer wavelet pooling as input to form a 128-dimensional fully connected feature vector;
(11) steps (3) to (10) are repeated in units of n randomly selected facial expression images, giving the 128-dimensional feature vector of each of the n images;
(12) the eighth layer of the network is a Softmax output layer; the n 128-dimensional feature vectors are taken as input to train a Softmax classifier with a 7-class probability-distribution output, yielding classification labels;
(13) the error between the classification labels of the Softmax output layer and the true labels is computed, and the weight matrices are updated once by the BP back-propagation algorithm;
(14) training steps (3) to (13) are repeated until the weight matrices have been updated m times, giving the trained deep convolution wavelet neural network;
(15) the facial expression test image set is fed into the trained deep convolution wavelet neural network to obtain a classification label z1 at the output layer, and the expression-sensitive-region image set corresponding to the test set is fed into the same network to obtain a classification label z2; the final classification label is obtained from the two as z3 = z1 + λ·z2, where λ is the weighting proportion of the auxiliary task;
(16) the facial expression recognition accuracy is output according to the classification label z3 of the test set, completing the auxiliary-task-based deep convolution wavelet neural network facial expression recognition.
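For concreteness, the following is a minimal sketch of steps (1)-(12) in PyTorch (an assumption; the invention is not tied to any framework, and the simulations below use Matlab). Standard summed multi-channel convolutions stand in for the per-map averaging of steps (6b) and (8b), the Haar decomposition is written out by hand, and the kernel counts Q1/Q2/Q3 follow Example 1 below (4, 6, 12). The comments trace the stated feature-map sizes 96 → 90 → 45 → 40 → 20 → 16 → 8.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt2(x):
    """One-level 2-D Haar decomposition of an (N, C, H, W) tensor."""
    a = x[..., 0::2, 0::2]; b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]; d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency subband
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

class DCWNN(nn.Module):
    def __init__(self, q1=4, q2=6, q3=12, n_classes=7):
        super().__init__()
        self.conv1 = nn.Conv2d(1,  q1, 7)    # step (4): 96 -> 90
        self.conv2 = nn.Conv2d(q1, q2, 6)    # step (6): 45 -> 40
        self.conv3 = nn.Conv2d(q2, q3, 5)    # step (8): 20 -> 16
        self.fc = nn.Linear(128, n_classes)  # 64 low-freq + 64 high-freq dims

    def forward(self, x):                    # x: (N, 1, 96, 96)
        x = F.max_pool2d(torch.sigmoid(self.conv1(x)), 2)  # step (5): 90 -> 45
        x = F.max_pool2d(torch.sigmoid(self.conv2(x)), 2)  # step (7): 40 -> 20
        x = torch.sigmoid(self.conv3(x))                   # step (8): 16 x 16
        ll, lh, hl, hh = haar_dwt2(x)                      # step (9): 8 x 8 subbands
        high = torch.clamp(torch.max(torch.max(lh, hl), hh), min=0)  # Maxf(0, HH, HL, LH)
        low = torch.clamp(ll, min=0)
        feat = torch.cat([low.max(dim=1).values.flatten(1),           # step (10):
                          high.max(dim=1).values.flatten(1)], dim=1)  # (N, 128)
        return self.fc(feat)   # step (12): logits; softmax via CrossEntropyLoss
```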
The method learns expression features with the auxiliary-task deep convolution wavelet neural network and requires no separate feature selection, so it learns both the abstract and the local detail information of facial expressions well, strengthens the influence of the expression-sensitive regions on feature extraction, and markedly improves the accuracy of facial expression recognition.
Compared with the prior art, the invention has the following advantages:
firstly, the invention takes into account the special characterization capability of the expression-sensitive regions when the deep convolutional neural network learns expression features. A main-task DCNN is first trained to obtain a shared feature weight matrix; local images of the eyebrow/eye and mouth poses in the expression-sensitive regions are then merged into an auxiliary-task estimation branch, whose classification result is obtained by mapping through the shared feature weight matrix; finally, the auxiliary-task classification result optimizes the classification performance of the main task, improving the generalization ability of the deep convolutional network in expression recognition;
secondly, the invention avoids two defects of ordinary convolutional neural networks: features learned by an upper convolutional layer can be lost through the simple down-sampling of a pooling layer, and the output of the fully connected layer contains only abstract information, losing many shallow local features. By combining multi-scale wavelet transforms with the deep convolutional neural network architecture, the network both ensures that features learned by the convolutional layers are transmitted intact through the pooling stage and expands, in the fully connected layer, the local expression features obtained in shallow-layer learning, so the whole network structure describes the expression features better and the recognition result improves markedly.
Description of the figures
FIG. 1 is a portion of an image in a raw database as employed by the present invention;
FIG. 2 is a block flow diagram of the present invention;
FIG. 3 is a schematic diagram of the network structure of the present invention, wherein FIG. 3(a) is a structural diagram of the deep convolution wavelet neural network of the present invention, and FIG. 3(b) is a structural diagram of the auxiliary-task deep convolution wavelet neural network of the present invention;
fig. 4 is a portion of an expression sensitive area image of the present invention.
Detailed Description
The invention is described in detail below with reference to the attached drawing figures:
example 1
Facial expression recognition is an indispensable component of machine learning research and has broad application value in a society where human-computer interaction is increasingly widespread: it enables real-time automatic recognition of facial expressions in human-computer interfaces such as mobile terminals and personal computers, and in some settings expressions are retrieved, tracked and identified from video. Breakthroughs in facial expression recognition methods also carry great reference significance for intelligent computing and brain-inspired research.
In the existing expression recognition technology, a method of firstly selecting features and then classifying is mainly adopted, but in the feature selection step, the existing feature selection operator cannot efficiently learn expression features, so that the subsequent classification cannot obtain an ideal result. In addition, the method of adopting the deep network as the classifier does not avoid feature selection, so that the classification effect is improved to a limited extent.
In view of this situation, the invention develops research and exploration and proposes an auxiliary-task-based deep convolution wavelet neural network expression recognition method. Referring to fig. 2, the invention realizes facial expression recognition through the following steps:
(1) A deep convolution wavelet network is built, consisting of three convolutional layers, two pooling layers, a multi-scale transform layer, a fully connected layer and a softmax output layer; the bias weight matrices of the convolutional layers are initialized to 0 matrices, and the Sigmoid function is selected as the activation function of the network. From input to output, the network built by the invention comprises in sequence: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a multi-scale transform layer (the wavelet pooling layer), a fully connected layer and a softmax output layer, together forming the deep convolution wavelet neural network.
(2) A facial expression image set and an expression-sensitive-region image set are established, where the expression-sensitive-region image set is obtained by cropping the eyebrow parts and mouth parts of the facial expression images; part of the facial expression image data set serves as the network's training image set and the rest as the test image set. For example, the facial expression image data set in this example has 20000 samples, of which 15000 images are used as the training image set and the remaining 5000 images as the test image set; the expression-sensitive-region image set corresponds one-to-one with the facial expression image data set.
(3) A training image is input into the deep convolution wavelet network; the input image size is 96 × 96. In this embodiment the training image is input directly, without further preprocessing such as removing the influence of complex backgrounds or illumination, which simplifies the image recognition procedure.
(4) The first layer of the deep convolution wavelet network is a convolutional layer, which performs a convolution operation on each input facial expression training image; the number of convolution kernels is Q1 and the kernel size is 7 × 7. The number of convolution kernels is chosen according to the computing environment and the software and hardware conditions; in this example Q1 = 4.
(4a) A random initialization method configures the convolution kernel weights as near-zero numbers in [-0.5, 0.5]. The initial weights are near zero in the invention to accelerate the convergence of the network.
(4b) Each convolution kernel convolves the facial expression image, giving Q1 convolved feature maps; the feature map of each kernel is 90 × 90. The feature map size is determined by the convolution kernel size.
(4c) The bias weight matrix of the convolutional layer is initially set to a 0 matrix; in this example the bias weight matrix is a one-dimensional vector whose dimension equals the number of convolution kernels Q1.
(4d) The activation function of the network is the Sigmoid function:

f(x) = 1 / (1 + e^(-x))

where f(x) is the activation value of the function, x is the input of the activation function (in this network, the convolution result plus the bias weight), and e is the base of the natural logarithm.
(5) The second layer of the network is a pooling layer, which takes the Q1 feature maps from the previous (first) convolutional layer as input and performs the pooling operation:
the pooling layer selects the maximum value in non-overlapping 2 × 2 regions, yielding Q1 pooled feature maps of size 45 × 45, as the toy example below illustrates.
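A toy illustration of this non-overlapping 2 × 2 maximum (a sketch using numpy, not part of the patent):

```python
import numpy as np

fm = np.arange(16, dtype=float).reshape(4, 4)       # toy 4x4 feature map
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))    # non-overlapping 2x2 maxima
print(pooled)   # [[ 5.  7.]
                #  [13. 15.]]
```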
(6) The third layer of the network is a convolutional layer, which takes the Q1 feature maps from the previous pooling layer as input and performs a convolution operation; the number of convolution kernels is Q2 and the kernel size is 6 × 6. In this example Q2 = 6.
(6a) A random initialization method configures the convolution kernel weights as near-zero numbers in [-0.5, 0.5];
(6b) each convolution kernel convolves the Q1 feature maps; the Q1 convolution results, plus the bias, are passed through the activation function and averaged to obtain that kernel's feature map, of size 40 × 40;
(6c) the bias weight matrix of the convolutional layer is initially set to a 0 matrix. In this example the bias weight matrix is a one-dimensional vector whose dimension equals the number of convolution kernels Q2.
(6d) The activation function of the network is the Sigmoid function.
(7) The fourth layer of the network is a pooling layer, which takes the Q2 feature maps from the previous convolutional layer as input and performs the pooling operation:
the pooling layer selects the maximum value in non-overlapping 2 × 2 regions, yielding Q2 pooled feature maps of size 20 × 20;
(8) the fifth layer of the network is a convolutional layer, which takes the Q2 feature maps from the previous pooling layer as input and performs a convolution operation; the number of convolution kernels is Q3 and the kernel size is 5 × 5. In this example Q3 = 12.
(8a) A random initialization method configures the convolution kernel weights as near-zero numbers in [-0.5, 0.5];
(8b) each convolution kernel convolves the Q2 feature maps; the Q2 convolution results, plus the bias, are passed through the activation function and averaged to obtain that kernel's feature map, of size 16 × 16;
(8c) the bias weight matrix of the convolutional layer is initially set to a 0 matrix;
(8d) the activation function of the network is the Sigmoid function.
(9) The sixth layer of the network is a wavelet pooling layer, which takes the Q3 feature maps from the previous convolutional layer as input and performs a one-level wavelet decomposition:
the wavelet basis function is the 'haar' function; each feature map yields an 8 × 8 low-frequency subband and three 8 × 8 high-frequency subbands, and the three high-frequency subbands are fused into a new high-frequency subband by taking the maximum at corresponding positions.
(10) The seventh layer of the network is a fully connected layer, which takes the Q3 8 × 8 low-frequency subbands and Q3 8 × 8 high-frequency subbands from the sixth-layer wavelet pooling as input to form a 128-dimensional fully connected feature vector.
(11) Steps (3) to (10) are repeated in units of n randomly selected facial expression images, giving the 128-dimensional feature vector of each of the n images.
(12) The eighth layer of the network is a Softmax output layer; the n obtained 128-dimensional feature vectors are taken as input to train a Softmax classifier with a 7-class probability-distribution output, yielding classification labels.
(13) The error between the classification labels of the Softmax output layer and the true labels is computed, and the weight matrices are updated once by the BP back-propagation algorithm. The updated weight matrices in this example include the values of the convolution kernels and of the bias weight vectors.
(14) Training steps (3) to (13) are repeated until the weight matrices have been updated m times, giving the trained deep convolution wavelet neural network. In the invention m, the number of updates, is determined by the image scale and the convergence rate of the network.
(15) The facial expression test image set is fed into the trained deep convolution wavelet neural network to obtain a classification label z1 at the output layer, and the expression-sensitive-region image set corresponding to the test set is fed into the same trained network to obtain a classification label z2; the final classification label is obtained from the two as z3 = z1 + λ·z2, where λ is the weighting proportion of the auxiliary task.
(16) And outputting the facial expression recognition accuracy according to the classification label z3 of the test set, and completing the auxiliary task-based deep convolution wavelet neural network facial expression recognition.
The method takes into account the special characterization capability of the expression-sensitive regions when the deep convolutional neural network learns expression features: it first trains a main-task DCNN to obtain a shared feature weight matrix, then merges local images of the eyebrow/eye and mouth poses in the expression-sensitive regions into an auxiliary-task estimation branch, obtains the auxiliary classification result by mapping through the shared feature weight matrix, and finally uses that result to optimize the classification performance of the main task, improving the generalization ability of the deep convolutional network in expression recognition.
Example 2
The auxiliary task-based deep convolution wavelet neural network expression recognition method is the same as that in the embodiment 1, the facial expression image set and the expression sensitive area image set are established in the step (2), and the method is carried out according to the following steps:
2.1 the facial expression image set is obtained as follows:
An appropriate number of labeled original images are randomly selected from the JAFFE expression image library; the library used by the invention contains 213 images in total, as shown in fig. 1, covering seven classes of expressions: anger, sadness, happiness, neutrality, disgust, surprise and fear. The original image size is 256 × 256. Fig. 1 lists partial images of different expressions of four persons: the first row shows angry expressions, the second disgusted, the third surprised, the fourth happy and the fifth neutral. The original images are expanded by flipping, rotating, and selecting image blocks with a sliding frame: each image is first flipped, then rotated by several small angles, and finally expression images are selected by sliding the frame up and down about the image center. The invention identifies the face region of the expanded images by combining haar features with the Adaboost algorithm and rescales the facial expression images, finally obtaining a facial expression image set of tens of thousands of samples.
2.2 expression sensitive region image sets were obtained as follows:
the expression sensitive area refers to areas of several parts sensitive to expressions in the human face area, including the eye eyebrow area and the mouth area; and (3) cutting the facial expression image set obtained in the step (2.1), obtaining two left and right eyebrow and eye image blocks and obtaining a mouth position image block by adopting a proper cutting frame, splicing the three image blocks to obtain an expression sensitive area image, and finally obtaining the expression sensitive area image set of the same tens of thousands of samples. Referring to fig. 4, the sensitive region images of seven expressions of one person are listed in fig. 4.
2.3 A label file of the facial expression image set is made from the original labels of the JAFFE expression image library. The label of a single image is a 1 × k-dimensional binary vector, where the expressions are divided into k categories (k = 2, 3, 4, 5, 6, …), the value of k being determined by the actual expression classification problem. The dimension whose value is 1 indicates the expression category the image belongs to, and all other dimensions are 0; for example, if the first dimension represents the happy category in a 5-class problem, the label vector of a happy image is [1, 0, 0, 0, 0]. In the invention the expression image data set and the sensitive-region image data set correspond to each other, so they can share the label file.
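A minimal sketch of such a label vector (the helper name is hypothetical):

```python
import numpy as np

def make_label(class_idx: int, k: int = 7) -> np.ndarray:
    """1 x k binary label vector as in step 2.3; k = 7 for the JAFFE classes."""
    v = np.zeros(k, dtype=np.int8)
    v[class_idx] = 1
    return v

print(make_label(0, k=5))   # a happy image when happiness is the first of 5 classes
                            # -> [1 0 0 0 0]
```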
Example 3
The auxiliary-task-based deep convolution wavelet neural network expression recognition method is the same as in embodiments 1-2. The wavelet pooling layer of step (9) obtains low-frequency and high-frequency subbands; referring to fig. 3(a), the conventional down-sampling pooling layer is replaced there by the wavelet pooling layer, which avoids the information loss caused by simple down-sampling, retains the high-frequency information and enhances the local information of the expression features. It proceeds as follows:
9.1 A one-level down-sampling wavelet decomposition is performed on the feature maps from the previous convolutional layer, with the Haar function as the wavelet basis; each feature map yields a low-frequency subband, a horizontal high-frequency subband, a vertical high-frequency subband, and a diagonal high-frequency subband containing both horizontal and vertical detail. The number of wavelet decomposition levels in the invention can be chosen according to the size the network requires in a practical application.
9.2 The three high-frequency subbands are fused into a new high-frequency subband according to the following formula:
xWH = Maxf(0, xHH, xHL, xLH)
where xHH, xHL and xLH are the three high-frequency subbands obtained by the wavelet decomposition, xWH is the fused high-frequency subband, and the function Maxf(A, B) is defined to take the larger value at each corresponding position of matrices A and B;
and 9.3, taking the obtained low-frequency sub-band and the fused high-frequency sub-band as the input of the next full-connection layer.
The wavelet pooling layer of the invention avoids the defect of a plain convolutional neural network, where the simple down-sampling of the pooling layer loses information. It replaces the pooling result with the low-frequency subband of the wavelet transform, which loses less information, and feeds the high-frequency subband containing the detail information into the fully connected layer as well, so the fully connected feature vector is expanded over multiple channels and its distinguishability is enhanced.
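The same decomposition and fusion can be sketched with the PyWavelets package (an assumption; the patent does not name a library):

```python
import numpy as np
import pywt

fm = np.random.rand(16, 16).astype(np.float32)   # one 16x16 conv feature map
ll, (lh, hl, hh) = pywt.dwt2(fm, 'haar')         # one-level decomposition: 8x8 each
x_wh = np.maximum.reduce([np.zeros_like(hh), hh, hl, lh])  # xWH = Maxf(0, xHH, xHL, xLH)
print(ll.shape, x_wh.shape)                      # (8, 8) (8, 8)
```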
Example 4
The auxiliary-task-based deep convolution wavelet neural network expression recognition method is the same as in embodiments 1-3; the fully connected feature vector of step (10) is formed according to the following steps:
10.1 The low-frequency subband matrix is obtained according to the following formula:
xL = Maxf(0, W1·xLL1 + W2·xLL2 + W3·xLL3 + … + Wn·xLLn)
where xL is the global low-frequency subband matrix, xLLn is the one-level wavelet-decomposed low-frequency subband of each feature map, and Wn is the superposition weight of each feature map's low-frequency subband. The superposition weights Wn can be determined from empirical values, or another learning scheme can be designed.
10.2 The high-frequency subband matrix is obtained according to the following formula:
xH = Maxf(0, xWH1, xWH2, … xWHn)
where xH is the global high-frequency subband matrix and xWHn is the new high-frequency subband formed by fusing the three high-frequency subbands of each feature map's one-level wavelet decomposition;
10.3 The global low-frequency subband xL and the global high-frequency subband xH are each stretched row-by-row into a 1 × v vector and connected end to end to obtain the fully connected feature vector of size 1 × 2v, where v is the product of the length and width of the subband matrix. In this example the fully connected feature vector is 1 × 128-dimensional: the low-frequency and high-frequency subbands each stretch into a 1 × 64-dimensional vector and are spliced end to end.
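A sketch of steps 10.1-10.3, assuming numpy and given superposition weights w:

```python
import numpy as np

def fc_vector(low, high, w):
    """Fully connected feature vector of Example 4 (sketch).
    low, high: (n, 8, 8) stacks of per-map low-frequency and fused
    high-frequency subbands; w: (n,) superposition weights (assumed given)."""
    x_l = np.maximum(0, np.einsum('q,qij->ij', w, low))   # xL = Maxf(0, sum Wn*xLLn)
    x_h = np.maximum(0, high.max(axis=0))                 # xH = Maxf(0, xWH1..xWHn)
    return np.concatenate([x_l.ravel(), x_h.ravel()])     # 1 x 128 (2 * 64)
```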
Example 5
The auxiliary-task-based deep convolution wavelet neural network expression recognition method is the same as in embodiments 1-4. Step (15) introduces the auxiliary-task weighting proportion λ; referring to fig. 3(b), the auxiliary-task correction is added to the trained deep convolution wavelet neural network, and λ is learned in the network with the sensitive-region image set, according to the following steps:
15.1 λ is initialized to 0, and M facial expression images and the corresponding sensitive-region images are randomly selected as learning samples for the weight λ;
15.2 the learning samples are fed into the trained deep convolution wavelet neural network, and classification labels are obtained according to the following formula:
z3 = z1 + λ·z2
where z1 is the output label of the network for the facial expression image, z2 is the output label for the corresponding sensitive-region image, and z3 is the network's global label;
15.3 according to the magnitude of the error between the global label z3 and the true label, the value of λ is updated as follows:
λ = λ + Δλ
where Δλ = 0.05; the λ value giving the minimum label error of each learning sample is recorded. In this example the error between the global label and the true label is determined from the numerical differences over the expression-class dimensions.
15.4 The expected value of the λ values giving the minimum label error over the M learning samples is taken as the global auxiliary-task weighting proportion λ. In this example the expected value can be obtained directly by averaging, as in the sketch below.
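A sketch of this λ search, assuming per-sample score vectors are available; the upper bound of the search range is an assumption, since the patent fixes only the increment Δλ = 0.05:

```python
import numpy as np

def learn_lambda(z1, z2, y, step=0.05, lam_max=2.0):
    """Grid-search sketch for steps 15.1-15.4; z1, z2, y are (M, 7) label
    (score) matrices. lam_max is an assumed search bound."""
    lams = np.arange(0.0, lam_max + step, step)
    best = []
    for z1_i, z2_i, y_i in zip(z1, z2, y):
        errs = [np.abs(z1_i + lam * z2_i - y_i).sum() for lam in lams]
        best.append(lams[int(np.argmin(errs))])   # per-sample minimizer
    return float(np.mean(best))                   # expected value over M samples
```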
A more detailed example is given below to further illustrate the invention.
Example 6
The auxiliary task-based expression recognition method for the deep convolution wavelet neural network is the same as the embodiment 1-5, and the specific steps are as follows with reference to the attached figure 3:
step 1: establishment of facial expression image set
200 labeled original images are randomly selected from the JAFFE expression database of 213 images; the selected originals are 256 × 256, as shown in fig. 1. The 200 originals are first expanded to 400 images (2×) by left-right flipping, then to 4000 images (10×) by rotating each image left and right by 1°, 2°, 3°, 4° and 5°. Finally a 128 × 128 rectangular frame, slid up and down by 5 pixel positions about the image center, crops the expression images; the haar-feature + Adaboost method then identifies the face region and rescales it to the 96 × 96 experimental face image. This yields a facial expression image set of 40000 samples and a corresponding label file: the label of a single image is a 1 × 7-dimensional binary vector in which the dimension with value 1 indicates the expression class the image belongs to, and the other dimensions are 0. A sketch of this expansion follows.
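A sketch of the expansion with PIL (the exact crop offsets are assumptions; only the 128 × 128 box and the 5-pixel vertical slide are stated):

```python
from PIL import Image, ImageOps

def expand(img: Image.Image):
    """Augmentation sketch for step 1 (200 -> 40000 images overall)."""
    flipped = [img, ImageOps.mirror(img)]                          # x2 by flipping
    rotated = [im.rotate(a) for im in flipped
               for a in (-5, -4, -3, -2, -1, 1, 2, 3, 4, 5)]       # x10 by rotation
    crops = []
    cx, cy = img.size[0] // 2, img.size[1] // 2                    # 256x256 originals
    for im in rotated:
        for dy in range(-5, 5):                                    # vertical slide, x10
            crops.append(im.crop((cx - 64, cy - 64 + dy,
                                  cx + 64, cy + 64 + dy)))         # 128x128 block
    return crops   # each crop is then face-detected (haar + Adaboost)
                   # and rescaled to the 96x96 network input
```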
Step 2: establishment of expression sensitive area image set
The expression-sensitive image in the invention covers the regions of the face sensitive to expression, namely the eyebrow/eye region and the mouth region, as shown in fig. 4. The face region images obtained in step 1 are cropped: a 48 × 48 cropping frame yields the two eyebrow/eye image blocks and a 48 × 96 cropping frame yields the mouth image block; the three blocks are spliced into an expression-sensitive-region image (see the sketch below). This finally gives an expression-sensitive-region image set of the same 40000 samples, whose label file is shared with the images of step 1.
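A sketch of the cropping and splicing, assuming PIL and given crop boxes; the 96 × 96 composite layout is an assumption, the patent fixes only the block sizes:

```python
from PIL import Image

def sensitive_region(face, left_eye_box, right_eye_box, mouth_box):
    """Step 2 sketch: two 48x48 eyebrow/eye blocks plus one 48x96 mouth
    block spliced into one image (box coordinates are assumed)."""
    left  = face.crop(left_eye_box)     # 48 x 48
    right = face.crop(right_eye_box)    # 48 x 48
    mouth = face.crop(mouth_box)        # 96 wide x 48 high
    canvas = Image.new(face.mode, (96, 96))
    canvas.paste(left,  (0, 0))         # eye blocks side by side on top
    canvas.paste(right, (48, 0))
    canvas.paste(mouth, (0, 48))        # mouth block underneath
    return canvas
```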
Step 3: Network training
(1) Building a depth network consisting of three convolution layers, two pooling layers, a multi-scale transformation layer, a full connection layer and a softmax output layer;
(2) inputting an expression image into a depth network, wherein the size of the input image is 96 × 96;
(3) the first layer of the network is a convolution layer, the convolution layer performs convolution operation on each expression original image, the number of selected convolution kernels is 6, and the size of each convolution kernel is 7 x 7:
(3a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(3b) performing convolution operation on the human face expression image by each convolution kernel to obtain 6 feature graphs after convolution, wherein the feature graph size of each convolution kernel is 90 x 90;
(3c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(3d) the activation function of the network is a Sigmoid function;
(4) the second layer of the network is a pooling layer, which takes the 6 feature maps obtained from the previous convolutional layer as input and performs pooling operations:
the pooling layer adopts a pooling method that the maximum value is selected in non-overlapping 2 x 2 areas to obtain 6 characteristic maps of the pooling layer, and the size of the characteristic maps is 45 x 45;
(5) the third layer of the network is a convolutional layer, 6 feature maps obtained by the previous pooling layer are used as input, convolution operation is carried out, the number of convolution kernels selected by the convolutional layer is 12, and the sizes of the convolution kernels are 6 x 6:
(5a) adopting a random initialization method to configure the weight of a convolution kernel as a near-zero number between [-0.5, 0.5];
(5b) each convolution kernel convolves the 6 feature maps; the 6 convolution results, plus the bias, are passed through the activation function and averaged to obtain that kernel's feature map, of size 40 × 40;
(5c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(5d) the activation function of the network is a Sigmoid function.
(6) The fourth layer of the network is a pooling layer which takes the 12 feature maps obtained from the previous convolutional layer as input and performs pooling operations:
the pooling layer was created by selecting the maximum in non-overlapping 2 x 2 regions to yield 12 signatures of the pooling layer with a size of 20 x 20.
(7) The fifth layer of the network is a convolutional layer; the 12 feature maps from the previous pooling layer are taken as input for the convolution operation; the number of convolution kernels is 12 and the kernel size is 5 × 5:
(7a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(7b) each convolution kernel convolves the 12 feature maps; the 12 convolution results, plus the bias, are passed through the activation function and averaged to obtain that kernel's feature map, of size 16 × 16;
(7c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(7d) the activation function of the network is a Sigmoid function.
(8) The sixth layer of the network is a wavelet pooling layer, which takes 12 characteristic maps obtained from the previous convolutional layer as input and performs one-layer wavelet decomposition:
the adopted wavelet basis function is a 'haar' function, for each feature map, an 8 x 8 low-frequency sub-band and three 8 x 8 high-frequency sub-bands are obtained, the corresponding positions of the three high-frequency sub-bands are maximized, and the three high-frequency sub-bands are fused into a new high-frequency sub-band.
(9) The seventh layer of the network is a fully connected layer; the twelve 8 × 8 low-frequency subbands and twelve 8 × 8 high-frequency subbands from the previous wavelet transform layer are taken as input to form a 128-dimensional fully connected feature vector. In the invention the fully connected layer first takes the maximum at corresponding positions over the twelve 8 × 8 low-frequency subbands and stretches the result row-by-row into a 1 × 64-dimensional vector; the high-frequency subbands give another 1 × 64-dimensional vector by the same operation; the two vectors are connected end to end, low-frequency vector first, giving a 1 × 128-dimensional global vector.
(10) And (4) repeating the steps (2) to (9) by taking the randomly selected 50 expression images as a unit to obtain respective 128-dimensional feature vectors of the 50 images.
(11) The eighth layer of the network is a Softmax output layer, the obtained 50 128-dimensional feature vectors are used as input, a Softmax classifier with 7-class probability distribution output is trained, and classification labels are obtained;
(12) and performing error calculation on the classification label and the real label of the Softmax output layer, and updating the value of the convolution kernel and the value of the bias weight vector of each layer according to a BP back propagation algorithm. The weight updating learning step length of the deep convolution wavelet neural network is set to be 0.05.
(13) And (5) repeating the training steps (2) to (12) until the weight matrix is updated 200 times. The setting of the updating times of the weight in the network training can be determined according to the convergence speed of the network.
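Reusing the DCWNN sketch given after the step list in the Disclosure, the training loop of steps (10)-(13) can be sketched as follows (train_x and train_y are assumed to be already-loaded tensors):

```python
import torch
import torch.nn as nn

net = DCWNN(q1=6, q2=12, q3=12)                    # Example 6 kernel counts
opt = torch.optim.SGD(net.parameters(), lr=0.05)   # learning step length 0.05 (step 12)
loss_fn = nn.CrossEntropyLoss()                    # softmax output + label error (step 11)

# train_x: (N, 1, 96, 96) image tensor, train_y: (N,) class indices
for update in range(200):                          # 200 weight updates (step 13)
    idx = torch.randint(0, train_x.shape[0], (50,))     # random batch of 50 (step 10)
    loss = loss_fn(net(train_x[idx]), train_y[idx])
    opt.zero_grad(); loss.backward(); opt.step()   # BP update of kernels and biases
```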
Step 4: Learning the auxiliary task
The facial expression test data set is fed into the trained network to obtain a classification label z1, the expression-sensitive regions corresponding to the test data set are fed into the trained network to obtain a classification label z2, and the final classification label is obtained from the two as z3 = z1 + 0.65·z2; z3 is computed for the whole test data set.
Step 5: Statistics of the recognition results
The recognition accuracy is calculated from the z3 of step 4, as sketched below.
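A sketch of steps 4 and 5, continuing the PyTorch sketches above (test_x, sens_x and test_y are assumed loaded; softmax makes z1 and z2 probability-style labels):

```python
z1 = torch.softmax(net(test_x), dim=1)          # main-task labels (test images)
z2 = torch.softmax(net(sens_x), dim=1)          # auxiliary labels (sensitive regions)
z3 = z1 + 0.65 * z2                             # step 4: fused global label
acc = (z3.argmax(dim=1) == test_y).float().mean()   # step 5: recognition accuracy
print(f'recognition accuracy: {acc.item():.4f}')
```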
The invention avoids the defects of the general convolutional neural network, in which features learned by an upper convolutional layer can be lost through the simple down-sampling of the pooling layer, and the output of the fully connected layer contains only abstract information and lacks many shallow local features. By combining the multi-scale wavelet transform with the deep convolutional neural network architecture, the network both ensures that features learned by the convolutional layers are transmitted intact through the pooling stage and expands, in the fully connected layer, the local expression features obtained in shallow-layer learning, so the whole network structure describes the expression features better and the recognition result improves markedly.
The technical effects of the invention are verified and explained by the simulation results as follows:
example 7
The auxiliary-task-based deep convolution wavelet neural network expression recognition method is the same as in embodiments 1-6; its effect is further analysed with the recognition results in Table 1.
Simulation experiment conditions
The hardware test platform of the invention is: an Intel Core i3 CPU at 3.20 GHz with 4 GB of memory; the software platform is the Windows 7 Ultimate 64-bit operating system and Matlab R2013b. The input images of the network are all 96 × 96 in TIFF format.
Emulated content
The simulation content of the invention comprises: simulation experiments and recognition result statistics of the existing facial expression recognition technology; under the condition of no additional wavelet pooling layer and no auxiliary task learning, a six-layer deep convolutional neural network is simply used for carrying out simulation experiments of facial expression recognition and recognition result statistics; the auxiliary task-based deep convolution wavelet neural network expression recognition method is completely used for experimental simulation and recognition result statistics; and comparing and analyzing the simulation results of each experiment.
Analysis of simulation results
Table 1 compares the recognition effect of the method of the present invention with existing facial expression recognition techniques. Referring to the data in Table 1: Shan C and Jabid T divide the image into several sub-regions and multiply each sub-region by a weight according to its contribution to the expression, the weight representing the expression characterization ability of the region. Taskeed et al. initialize the weights with a χ² distribution and, using a new local face descriptor, the Local Directional Pattern (LDP), in an LDP + SVM architecture, obtain an average recognition rate of 85.4%. Shishir et al. combine Gabor features with Learning Vector Quantization (LVQ), performing Gabor filtering at 34 fiducial points of the image, and obtain a recognition rate of 87.51%. Nectarios et al. propose an algorithm that obtains feature vectors by convolution with Gabor combined with Log-Gabor filters, achieving recognition rates of 86.1% and 85.72%. Lu Yadan, Feng Shiyong et al. use an FP + deep self-coding network algorithm and obtain a recognition rate of 90.47%. In addition, when the invention uses only a six-layer deep convolutional neural network to learn the expression features and train a softmax classifier, without the added wavelet pooling layer and auxiliary task learning, it obtains a recognition rate of 90.56%.
When the auxiliary task-based deep convolution wavelet neural network expression recognition method is used integrally, the obtained recognition accuracy is 92.91%.
TABLE 1 Comparison of the recognition effects of the present invention and existing facial expression recognition methods

Method | Recognition rate
LDP + SVM (Taskeed et al.) | 85.4%
Gabor + LVQ (Shishir et al.) | 87.51%
Gabor / Log-Gabor (Nectarios et al.) | 86.1% / 85.72%
FP + deep self-coding network (Lu Yadan et al.) | 90.47%
Six-layer DCNN only (no wavelet pooling, no auxiliary task) | 90.56%
The present invention (full method) | 92.91%
As can be seen from table 1, the method of the present invention can give good consideration to local and global information of the expression features of the facial expression image, and enhances the influence of the expression sensitive area on facial expression recognition through the auxiliary task, thereby improving the recognition rate of the facial expression.
In short, the auxiliary-task-based deep convolution wavelet neural network expression recognition method of the invention solves the problems that, in existing expression recognition techniques, feature selection operators cannot learn expression features efficiently and too few image expression classification features are extracted. The implementation steps are: build a deep convolution wavelet neural network; establish a facial expression image set and an expression-sensitive-region image set; input the facial expression images to the network; train the deep convolution wavelet neural network; back-propagate the network error; update the network parameter set, i.e. each convolution kernel and bias vector; input the expression-sensitive-region images to the trained network; learn the weighting proportion of the auxiliary task; obtain the network's global classification labels from that proportion; and compute the recognition accuracy from the global labels. The method captures both the abstract and the detail information of expression images, strengthens the influence of the expression-sensitive regions in expression feature learning, markedly improves the accuracy of expression recognition, and can be applied to expression recognition of facial expression images.

Claims (5)

1. An auxiliary-task-based deep convolution wavelet neural network expression recognition method, characterized by comprising the following steps:
(1) building a deep convolution wavelet network consisting of three convolutional layers, two pooling layers, a multi-scale transform layer, a fully connected layer and a softmax output layer; the bias weight matrices of the convolutional layers are initialized to 0 matrices, and the Sigmoid function is selected as the activation function of the network;
(2) establishing a facial expression image set and an expression sensitive area image set, wherein the expression sensitive area image set is obtained by cutting eyebrow parts and mouth parts of the facial expression image set, a part of images in the facial expression image data set are used as a training image set of a network, and the rest of images are used as a testing image set;
(3) inputting a training image into a deep convolution wavelet network, wherein the size of the input image is 96 × 96;
(4) the first layer of the deep convolution wavelet network is a convolution layer which performs convolution operation on each input facial expression training image and selectsSelecting the number of convolution kernels as Q1Convolution kernel size 7 × 7:
(4a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(4b) each convolution kernel performs convolution operation on the human face expression image to obtain Q1The feature map size of each convolution kernel is 90 x 90;
(5) the second layer of the network is a pooling layer of Q's obtained from the previous layer of convolutional layers1Taking the characteristic graph as input, and performing pooling operation:
the pooling layer is prepared by selecting maximum value in non-overlapping 2 x 2 region to obtain Q of the pooling layer1The size of the pooled feature map is 45 × 45;
(6) the third layer of the network is a convolution layer, and Q obtained by the previous layer of the pooling layer1Taking the characteristic graph as input, performing convolution operation, wherein the number of selected convolution kernels of the convolution layer is Q2Convolution kernel size 6 × 6:
(6a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(6b) each convolution kernel is on the Q1Performing convolution operation on the feature map, and then performing convolution on the Q1The convolution result of the characteristic graphs and the bias matrix are subjected to average evaluation after the activation function filtering to obtain the characteristic graphs of the convolution kernels, and the characteristic graph size of each convolution kernel is 40 x 40;
(7) the fourth layer of the network is a pooling layer that pools Q from the previous convolutional layer2Taking the characteristic graph as input, and performing pooling operation:
the pooling layer is prepared by selecting maximum value in non-overlapping 2 x 2 region to obtain Q of the pooling layer2The size of the pooled feature map is 20 x 20;
(8) the fifth layer of the network is a convolution layer, which takes the Q2 feature maps produced by the preceding pooling layer as input and performs the convolution operation; the number of convolution kernels of this layer is Q3 and the kernel size is 5 × 5:
(8a) initializing the convolution kernel weights to near-zero values in [-0.5, 0.5] by a random initialization method;
(8b) each convolution kernel convolves the Q2 feature maps; the convolution results over the Q2 feature maps, plus the bias matrix, are filtered by the activation function and then averaged to obtain the feature map of that convolution kernel, each of size 16 × 16;
(9) the sixth layer of the network is a wavelet pooling layer, which takes the Q3 feature maps produced by the preceding convolution layer as input and performs a one-level wavelet decomposition:
the adopted wavelet basis function is the 'haar' function; each feature map yields one 8 × 8 low-frequency subband and three 8 × 8 high-frequency subbands, and the three high-frequency subbands are fused into a new high-frequency subband by taking the maximum at each corresponding position;
(10) the seventh layer of the network is a fully connected layer, which takes the Q3 8 × 8 low-frequency subbands and the Q3 8 × 8 high-frequency subbands produced by the sixth-layer wavelet pooling as input to form a 128-dimensional fully connected feature vector;
(11) repeating steps (3) to (10) with n randomly selected facial expression images as a unit, obtaining the 128-dimensional feature vector of each of the n images;
(12) the eighth layer of the network is a Softmax output layer, which takes the n 128-dimensional feature vectors as input, trains a probability-distribution Softmax classifier with 7 output classes, and obtains the classification labels;
(13) computing the error between the classification labels of the Softmax output layer and the true labels, and updating the weight matrices once according to the BP back-propagation algorithm;
(14) repeating training steps (3) to (13) until the weight matrices have been updated m times, obtaining the trained deep convolution wavelet neural network;
(15) feeding the facial expression test image set into the trained deep convolution wavelet neural network to obtain the classification label z1 at the output layer, and feeding the expression sensitive area image set corresponding to the test set into the trained network to obtain the classification label z2 at the output layer; the final classification label is obtained from the two labels as z3 = z1 + λ × z2, where λ denotes the weighting proportion of the auxiliary task;
(16) outputting the facial expression recognition accuracy according to the classification label z3 of the test set, completing the auxiliary task-based deep convolution wavelet neural network expression recognition.
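Claim 1 above fixes the layer-by-layer feature map sizes (96 → 90 → 45 → 40 → 20 → 16 → 8). As an informal sanity check only, the following minimal NumPy sketch reproduces that shape flow with a single kernel per layer (the claim leaves Q1, Q2 and Q3 free); the helper names and random weights are illustrative assumptions, not the patented implementation.

    import numpy as np

    def conv2d_valid(img, kernel):
        # 'Valid' 2-D convolution: a k x k kernel shrinks an N x N map to N - k + 1.
        k = kernel.shape[0]
        n = img.shape[0] - k + 1
        out = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
        return out

    def maxpool2x2(fmap):
        # Non-overlapping 2 x 2 max pooling, as in steps (5) and (7).
        h, w = fmap.shape
        return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    def sigmoid(x):
        # Activation function selected in step (1).
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    x = rng.random((96, 96))                                        # step (3): 96 x 96 input
    f1 = sigmoid(conv2d_valid(x, rng.uniform(-0.5, 0.5, (7, 7))))   # step (4): 90 x 90
    p1 = maxpool2x2(f1)                                             # step (5): 45 x 45
    f2 = sigmoid(conv2d_valid(p1, rng.uniform(-0.5, 0.5, (6, 6))))  # step (6): 40 x 40
    p2 = maxpool2x2(f2)                                             # step (7): 20 x 20
    f3 = sigmoid(conv2d_valid(p2, rng.uniform(-0.5, 0.5, (5, 5))))  # step (8): 16 x 16
    print(f1.shape, p1.shape, f2.shape, p2.shape, f3.shape)         # shape sanity check

Step (9) then decomposes each 16 × 16 map into four 8 × 8 subbands (see the sketch after claim 3), and with v = 64 (an 8 × 8 subband flattened row-wise) the concatenation of claim 4 gives the 1 × 2v = 128-dimensional vector of step (10).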
2. The auxiliary task-based deep convolution wavelet neural network expression recognition method according to claim 1, wherein the facial expression image set and the expression sensitive area image set of step (2) are established according to the following steps:
2.1 the facial expression image set is obtained as follows:
randomly selecting an appropriate number of labeled original images from an expression image library; augmenting them by flipping, rotating and sliding-window selection of image blocks; detecting the face region of the augmented images by combining haar features with the Adaboost algorithm; and scaling to facial expression images of size 96 × 96, finally obtaining a facial expression image set on the order of ten thousand samples;
2.2 the expression sensitive area image set is obtained as follows:
the expression sensitive areas are the regions of the face that are sensitive to expression, namely the eyebrow-and-eye region and the mouth region; the obtained facial expression images are cropped with cutting frames into two (left and right) eyebrow-and-eye image blocks and one mouth image block, and the three image blocks are spliced into an expression sensitive area image, finally obtaining an expression sensitive area image set of the same ten-thousand-order sample size;
2.3 the label file of the facial expression image set is produced from the original labels of the expression image library; the label of a single image is a 1 × k-dimensional binary vector, where k indicates that the image expressions are divided into k classes; the dimension whose value is 1 indicates the expression class to which the image belongs, and all other dimensions are 0; the expression image data set and the sensitive area image data set can share the same label file.
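A hedged sketch of claim 2's data preparation follows. The crop coordinates are illustrative assumptions (the claim only specifies two eyebrow-and-eye blocks and one mouth block to be cropped and spliced), and one_hot_label follows step 2.3.

    import numpy as np

    def sensitive_region_image(face):
        # face: a 96 x 96 facial expression image (step 2.1).
        # Assumed cutting frames; the patent does not publish exact coordinates.
        left_eye = face[20:52, 4:48]              # 32 x 44 left eyebrow-and-eye block
        right_eye = face[20:52, 48:92]            # 32 x 44 right eyebrow-and-eye block
        mouth = face[60:92, 4:92]                 # 32 x 88 mouth block
        eyes = np.hstack([left_eye, right_eye])   # 32 x 88 spliced eye strip
        return np.vstack([eyes, mouth])           # 64 x 88 sensitive-area image

    def one_hot_label(class_index, k=7):
        # Step 2.3: a 1 x k binary vector with a 1 in the class dimension.
        label = np.zeros(k, dtype=np.int8)
        label[class_index] = 1
        return label

Both image sets can share the same label file because the sensitive-area image of a face inherits that face's expression class, exactly as step 2.3 states.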
3. The auxiliary task-based deep convolution wavelet neural network expression recognition method according to claim 1, wherein the wavelet pooling layer of step (9) obtains the low-frequency subband and the high-frequency subband according to the following steps:
9.1 performing a one-level downsampling wavelet decomposition on the feature maps produced by the preceding convolution layer, with the Haar function as the selected wavelet basis; each feature map yields one low-frequency subband and three high-frequency subbands;
9.2 fusing the three high-frequency subbands into a new high-frequency subband according to the following formula:
xWH = Maxf(0, xHH, xHL, xLH)
where xHH, xHL and xLH denote the three high-frequency subbands obtained by the one-level wavelet decomposition, xWH denotes the fused high-frequency subband, and the function Maxf(A, B) is defined as taking the larger value at each corresponding position of matrices A and B;
9.3 taking the obtained low-frequency subband and the fused high-frequency subband as the input of the subsequent fully connected layer.
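A minimal sketch of the wavelet pooling of claim 3, using plain NumPy with the orthonormal haar filters (coefficients ±1/2) instead of a wavelet library; haar_dwt2 and fuse_high are illustrative helper names, not the patented code.

    import numpy as np

    def haar_dwt2(fmap):
        # One-level 2-D haar decomposition of an even-sized map (step 9.1).
        a = fmap[0::2, 0::2]   # top-left entry of each 2 x 2 block
        b = fmap[0::2, 1::2]   # top-right
        c = fmap[1::2, 0::2]   # bottom-left
        d = fmap[1::2, 1::2]   # bottom-right
        ll = (a + b + c + d) / 2.0   # low-frequency subband
        hl = (a - b + c - d) / 2.0   # horizontal high-frequency subband
        lh = (a + b - c - d) / 2.0   # vertical high-frequency subband
        hh = (a - b - c + d) / 2.0   # diagonal high-frequency subband
        return ll, (hl, lh, hh)

    def fuse_high(hl, lh, hh):
        # Step 9.2: xWH = Maxf(0, xHH, xHL, xLH), an element-wise maximum.
        return np.maximum(0, np.maximum(hh, np.maximum(hl, lh)))

    fmap = np.random.rand(16, 16)        # one 16 x 16 feature map from step (8)
    ll, (hl, lh, hh) = haar_dwt2(fmap)   # four 8 x 8 subbands
    xwh = fuse_high(hl, lh, hh)          # fused 8 x 8 high-frequency subband (step 9.3)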
4. The auxiliary task-based deep convolution wavelet neural network expression recognition method according to claim 1, wherein the fully connected layer feature vector of step (10) is obtained according to the following steps:
10.1 solving the low-frequency subband matrix according to the following formula:
xL = Maxf(0, W1·xLL1 + W2·xLL2 + W3·xLL3 + … + Wn·xLLn)
where xL denotes the global low-frequency subband matrix, xLLn denotes the low-frequency subband of the one-level wavelet decomposition of each feature map, and Wn denotes the superposition weight of each feature map's low-frequency subband;
10.2 solving the high-frequency subband matrix according to the following formula:
xH = Maxf(0, xWH1, xWH2, …, xWHn)
where xH denotes the global high-frequency subband matrix, and xWHn denotes the new high-frequency subband formed by fusing the three high-frequency subbands of the one-level wavelet decomposition of each feature map;
10.3 stretching the global low-frequency subband xL and the global high-frequency subband xH row-wise into 1 × v vectors and concatenating them end to end to obtain the fully connected layer feature vector of size 1 × 2v.
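The two formulas of claim 4 reduce to a weighted sum with position-wise clipping for the low-frequency part and an element-wise maximum for the high-frequency part. A hedged NumPy sketch (the equal weights Wn = 1/n are an assumption; the claim leaves the superposition weights unspecified):

    import numpy as np

    def fc_feature_vector(lows, highs, weights=None):
        # lows / highs: lists of n 8 x 8 subband matrices, one pair per feature map.
        n = len(lows)
        if weights is None:
            weights = [1.0 / n] * n   # assumed superposition weights Wn
        # 10.1: xL = Maxf(0, W1*xLL1 + ... + Wn*xLLn)
        x_l = np.maximum(0, sum(w * x for w, x in zip(weights, lows)))
        # 10.2: xH = Maxf(0, xWH1, ..., xWHn), element-wise over all feature maps
        x_h = np.maximum(0, np.maximum.reduce(highs))
        # 10.3: stretch each row-wise to 1 x v and concatenate to 1 x 2v
        return np.concatenate([x_l.ravel(), x_h.ravel()])   # v = 64 gives 128 dims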
5. The auxiliary task-based deep convolution wavelet neural network expression recognition method according to claim 1, wherein the auxiliary task weighting proportion λ of step (15) is learned according to the following steps:
15.1 initializing λ to 0, and randomly selecting M facial expression images and their corresponding sensitive area images as the learning samples for the weight λ;
15.2 feeding the learning samples into the trained deep convolution wavelet neural network and obtaining the classification labels according to the following formula:
z3 = z1 + λ × z2
where z1 denotes the output label of the network for the facial expression image, z2 denotes the output label for the corresponding sensitive area image, and z3 denotes the network global label;
15.3 updating the value of λ according to the magnitude of the error between the global label z3 and the true label, and recording the λ value corresponding to the minimum label error of each learning sample;
15.4 computing the expected value of the λ values corresponding to the minimum label errors of the M learning samples; this expected value is used as the weighting proportion λ of the subsequent auxiliary task.
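A minimal sketch of claim 5's λ learning, assuming a grid search over candidate values and a 0/1 label-error measure (the claim only requires the per-sample λ with minimum label error, averaged over the M samples; the grid and the error measure are assumptions, since the update formula is not reproduced in this text).

    import numpy as np

    def learn_lambda(z1_probs, z2_probs, true_labels, grid=np.linspace(0.0, 2.0, 201)):
        # z1_probs, z2_probs: (M, 7) Softmax outputs for the face images and the
        # corresponding sensitive-area images; true_labels: (M,) class indices.
        best = []
        for z1, z2, y in zip(z1_probs, z2_probs, true_labels):
            # 0/1 error of the global label z3 = z1 + lambda * z2 for each candidate
            errors = [0.0 if np.argmax(z1 + lam * z2) == y else 1.0 for lam in grid]
            best.append(grid[int(np.argmin(errors))])  # per-sample minimizing lambda
        return float(np.mean(best))                    # expected value over M samples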
CN201710446076.0A 2017-06-14 2017-06-14 Auxiliary task-based deep convolution wavelet neural network expression recognition method Active CN107292256B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710446076.0A CN107292256B (en) 2017-06-14 2017-06-14 Auxiliary task-based deep convolution wavelet neural network expression recognition method

Publications (2)

Publication Number Publication Date
CN107292256A CN107292256A (en) 2017-10-24
CN107292256B true CN107292256B (en) 2019-12-24

Family

ID=60096459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710446076.0A Active CN107292256B (en) 2017-06-14 2017-06-14 Auxiliary task-based deep convolution wavelet neural network expression recognition method

Country Status (1)

Country Link
CN (1) CN107292256B (en)

Families Citing this family (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729872A (en) * 2017-11-02 2018-02-23 北方工业大学 Facial expression recognition method and device based on deep learning
JP6345332B1 (en) * 2017-11-21 2018-06-20 国立研究開発法人理化学研究所 Classification device, classification method, program, and information recording medium
CN107977677A (en) * 2017-11-27 2018-05-01 深圳市唯特视科技有限公司 A kind of multi-tag pixel classifications method in the reconstruction applied to extensive city
CN109840459A (en) * 2017-11-29 2019-06-04 深圳Tcl新技术有限公司 A kind of facial expression classification method, apparatus and storage medium
CN108122001B (en) * 2017-12-13 2022-03-11 北京小米移动软件有限公司 Image recognition method and device
CN108229341B (en) * 2017-12-15 2021-08-06 北京市商汤科技开发有限公司 Classification method and device, electronic equipment and computer storage medium
CN108090513A (en) * 2017-12-19 2018-05-29 天津科技大学 Multi-biological characteristic blending algorithm based on particle cluster algorithm and typical correlation fractal dimension
CN109949264A (en) * 2017-12-20 2019-06-28 深圳先进技术研究院 A kind of image quality evaluating method, equipment and storage equipment
CN108038466B (en) * 2017-12-26 2021-11-16 河海大学 Multi-channel human eye closure recognition method based on convolutional neural network
CN108171176B (en) * 2017-12-29 2020-04-24 中车工业研究院有限公司 Subway driver emotion identification method and device based on deep learning
CN108062416B (en) * 2018-01-04 2019-10-29 百度在线网络技术(北京)有限公司 Method and apparatus for generating label on map
CN108021910A (en) * 2018-01-04 2018-05-11 青岛农业大学 The analysis method of Pseudocarps based on spectrum recognition and deep learning
CN108304788B (en) * 2018-01-18 2022-06-14 陕西炬云信息科技有限公司 Face recognition method based on deep neural network
CN108363969B (en) * 2018-02-02 2022-08-26 南京邮电大学 Newborn pain assessment method based on mobile terminal
CN110298212B (en) * 2018-03-21 2023-04-07 腾讯科技(深圳)有限公司 Model training method, emotion recognition method, expression display method and related equipment
CN108520213B (en) * 2018-03-28 2021-10-19 五邑大学 Face beauty prediction method based on multi-scale depth
CN108805866B (en) * 2018-05-23 2022-03-25 兰州理工大学 Image fixation point detection method based on quaternion wavelet transform depth vision perception
CN109580629A (en) * 2018-08-24 2019-04-05 绍兴文理学院 Crankshaft thrust collar intelligent detecting method and system
CN109543526B (en) * 2018-10-19 2022-11-08 谢飞 True and false facial paralysis recognition system based on depth difference characteristics
CN109657554B (en) * 2018-11-21 2022-12-20 腾讯科技(深圳)有限公司 Image identification method and device based on micro expression and related equipment
CN111222624B (en) * 2018-11-26 2022-04-29 深圳云天励飞技术股份有限公司 Parallel computing method and device
CN109635709B (en) * 2018-12-06 2022-09-23 中山大学 Facial expression recognition method based on significant expression change area assisted learning
CN109615574B (en) * 2018-12-13 2022-09-23 济南大学 Traditional Chinese medicine identification method and system based on GPU and dual-scale image feature comparison
CN109919171A (en) * 2018-12-21 2019-06-21 广东电网有限责任公司 A kind of Infrared image recognition based on wavelet neural network
CN111488764B (en) * 2019-01-26 2024-04-30 天津大学青岛海洋技术研究院 Face recognition method for ToF image sensor
CN109815924B (en) * 2019-01-29 2021-05-04 成都旷视金智科技有限公司 Expression recognition method, device and system
CN109934173B (en) * 2019-03-14 2023-11-21 腾讯科技(深圳)有限公司 Expression recognition method and device and electronic equipment
CN110333088B (en) * 2019-04-19 2020-09-29 北京化工大学 Caking detection method, system, device and medium
CN110119702B (en) * 2019-04-30 2022-12-06 西安理工大学 Facial expression recognition method based on deep learning prior
CN110174948B (en) * 2019-05-27 2020-10-27 湖南师范大学 Intelligent language auxiliary learning system and method based on wavelet neural network
CN110210380B (en) * 2019-05-30 2023-07-25 盐城工学院 Analysis method for generating character based on expression recognition and psychological test
CN110414394B (en) * 2019-07-16 2022-12-13 公安部第一研究所 Facial occlusion face image reconstruction method and model for face occlusion detection
CN110399821B (en) * 2019-07-17 2023-05-30 上海师范大学 Customer satisfaction acquisition method based on facial expression recognition
CN110427892B (en) * 2019-08-06 2022-09-09 河海大学常州校区 CNN face expression feature point positioning method based on depth-layer autocorrelation fusion
CN111401116B (en) * 2019-08-13 2022-08-26 南京邮电大学 Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network
CN110717423B (en) * 2019-09-26 2023-03-17 安徽建筑大学 Training method and device for emotion recognition model of facial expression of old people
CN110889332A (en) * 2019-10-30 2020-03-17 中国科学院自动化研究所南京人工智能芯片创新研究院 Lie detection method based on micro expression in interview
CN111191704B (en) * 2019-12-24 2023-05-02 天津师范大学 Foundation cloud classification method based on task graph convolutional network
CN111144348A (en) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111178312B (en) * 2020-01-02 2023-03-24 西北工业大学 Face expression recognition method based on multi-task feature learning network
CN111291670B (en) * 2020-01-23 2023-04-07 天津大学 Small target facial expression recognition method based on attention mechanism and network integration
CN111401147B (en) * 2020-02-26 2024-06-04 中国平安人寿保险股份有限公司 Intelligent analysis method, device and storage medium based on video behavior data
CN111382795B (en) * 2020-03-09 2023-05-05 交叉信息核心技术研究院(西安)有限公司 Image classification processing method of neural network based on frequency domain wavelet base processing
CN111126364A (en) * 2020-03-30 2020-05-08 北京建筑大学 Expression recognition method based on packet convolutional neural network
CN111652171B (en) * 2020-06-09 2022-08-05 电子科技大学 Construction method of facial expression recognition model based on double branch network
CN112132058B (en) * 2020-09-25 2022-12-27 山东大学 Head posture estimation method, implementation system thereof and storage medium
CN112380995B (en) * 2020-11-16 2023-09-12 华南理工大学 Face recognition method and system based on deep feature learning in sparse representation domain
CN117036149A (en) * 2020-12-01 2023-11-10 华为技术有限公司 Image processing method and chip
CN112699938B (en) * 2020-12-30 2024-01-05 北京邮电大学 Classification method and device based on graph convolution network model
CN113095356B (en) * 2021-03-03 2023-10-31 北京邮电大学 Light-weight neural network system and image processing method and device
CN114445899A (en) * 2022-01-30 2022-05-06 中国农业银行股份有限公司 Expression recognition method, device, equipment and storage medium
CN114743251B (en) * 2022-05-23 2024-02-27 西北大学 Drama character facial expression recognition method based on shared integrated convolutional neural network
WO2024039332A1 (en) * 2022-08-15 2024-02-22 Aselsan Elektroni̇k Sanayi̇ Ve Ti̇caret Anoni̇m Şi̇rketi̇ Partial reconstruction method based on sub-band components of jpeg2000 compressed images

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872424A (en) * 2010-07-01 2010-10-27 重庆大学 Facial expression recognizing method based on Gabor transform optimal channel blur fusion
CN105139395A (en) * 2015-08-19 2015-12-09 西安电子科技大学 SAR image segmentation method based on wavelet pooling convolutional neural networks
CN106056088A (en) * 2016-06-03 2016-10-26 西安电子科技大学 Single-sample face recognition method based on self-adaptive virtual sample generation criterion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Recognition of facial expressions using Gabor wavelets and learning vector quantization; Shishir Bashyal; Engineering Applications of Artificial Intelligence; 2008-10-31; Vol. 21, No. 7; pp. 1056-1063 *

Similar Documents

Publication Publication Date Title
CN107292256B (en) Auxiliary task-based deep convolution wavelet neural network expression recognition method
CN106529447B (en) Method for identifying face of thumbnail
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
US10891511B1 (en) Human hairstyle generation method based on multi-feature retrieval and deformation
CN109522874B (en) Human body action recognition method and device, terminal equipment and storage medium
CN112288011B (en) Image matching method based on self-attention deep neural network
CN108062543A (en) A kind of face recognition method and device
CN107977661B (en) Region-of-interest detection method based on FCN and low-rank sparse decomposition
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN114092833B (en) Remote sensing image classification method and device, computer equipment and storage medium
CN110674685B (en) Human body analysis segmentation model and method based on edge information enhancement
CN111652273B (en) Deep learning-based RGB-D image classification method
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN110503613A (en) Based on the empty convolutional neural networks of cascade towards removing rain based on single image method
CN110826462A (en) Human body behavior identification method of non-local double-current convolutional neural network model
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN114419732A (en) HRNet human body posture identification method based on attention mechanism optimization
CN112132145A (en) Image classification method and system based on model extended convolutional neural network
CN114445715A (en) Crop disease identification method based on convolutional neural network
CN113554084A (en) Vehicle re-identification model compression method and system based on pruning and light-weight convolution
CN114782979A (en) Training method and device for pedestrian re-recognition model, storage medium and terminal
CN113221660B (en) Cross-age face recognition method based on feature fusion
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN117576402A (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant