CN107292256B - Auxiliary task-based deep convolution wavelet neural network expression recognition method - Google Patents
- Publication number: CN107292256B (application number CN201710446076.0A)
- Authority
- CN
- China
- Prior art keywords
- layer
- convolution
- expression
- network
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an auxiliary task-based deep convolutional wavelet neural network expression recognition method, which solves two problems of existing approaches: feature selection operators cannot learn expression features efficiently, and too few classification features carrying image expression information are extracted. The invention is realized as follows: build a deep convolutional wavelet neural network; establish a facial expression image set and a corresponding expression-sensitive-region image set; input facial expression images into the network; train the deep convolutional wavelet neural network; back-propagate the network error; update each convolution kernel and bias vector of the network; input expression-sensitive-region images into the trained network; learn the weighting proportion of the auxiliary task; obtain the network's global classification labels; and compute the recognition accuracy from the global labels. The method takes both the abstract and the detail information of expression images into account, strengthens the influence of expression-sensitive regions in expression feature learning, significantly improves expression recognition accuracy, and can be applied to expression recognition of facial expression images.
Description
Technical Field
The invention belongs to the technical field of image processing, mainly relates to computer vision recognition, and particularly relates to an auxiliary task-based deep convolutional wavelet neural network expression recognition method. The method can be applied to learning and classifying expression features in facial expression recognition.
Background
Facial expression recognition is a frontier technology in the fields of image processing and computer vision. Its purpose is to study coding models of facial expressions, to learn and extract characteristic expression patterns of the face, and to enable a computer to synthesize, track, and recognize facial expressions automatically.
Current research on facial expression recognition focuses mainly on two aspects: feature extraction and classification algorithms. In recent years researchers have applied deep learning to facial expression recognition, in particular the deep convolutional neural network, which is good at processing two-dimensional images. However, a deep convolutional neural network generally concentrates on the abstract mapping of images from low layers to high layers so as to obtain a high-level feature representation, and in doing so ignores the texture and detail information of the expression images. Moreover, the commonly used deep networks are single-task networks, which cannot effectively highlight the main contribution of expression-sensitive regions when learning expression features.
Existing expression recognition technology mainly adopts the approach of feature selection followed by classification, but the existing feature selection operators cannot learn expression features efficiently, so the subsequent classification cannot achieve ideal results. In addition, Lu Yadan et al. adopted a deep auto-encoder network as the classifier, which does not avoid the feature selection step, so the final classification effect is not greatly improved.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an auxiliary task-based deep convolutional wavelet neural network expression recognition method.
The auxiliary task-based deep convolutional wavelet neural network expression recognition method of the invention comprises the following steps:
(1) building a deep convolutional wavelet network consisting of three convolutional layers, two pooling layers, a multi-scale transformation layer, a fully connected layer, and a softmax output layer; initializing the bias weight matrices of the network's convolutional layers to zero matrices, and selecting the Sigmoid function as the network's activation function;
(2) establishing a facial expression image set and an expression-sensitive-region image set, where the expression-sensitive-region image set is obtained by cropping the eyebrow and mouth parts of the facial expression images; one part of the facial expression image set serves as the network's training image set and the remaining images serve as the test image set;
(3) inputting a training image into the deep convolutional wavelet network, the input image size being 96 × 96;
(4) the first layer of the deep convolutional wavelet network is a convolutional layer, which performs a convolution operation on each input facial expression training image; the number of convolution kernels is Q1 and the kernel size is 7 × 7:
(4a) using random initialization to set the convolution kernel weights to near-zero values in [-0.5, 0.5];
(4b) each convolution kernel convolves the facial expression image, yielding Q1 feature maps; the feature map of each convolution kernel is 90 × 90;
(5) the second layer of the network is a pooling layer, which takes the Q1 feature maps from the preceding convolutional layer as input and performs pooling:
the pooling operation selects the maximum value within each non-overlapping 2 × 2 region, yielding the pooling layer's Q1 pooled feature maps of size 45 × 45;
(6) the third layer of the network is a convolutional layer, which takes the Q1 feature maps from the preceding pooling layer as input and performs convolution; the number of convolution kernels is Q2 and the kernel size is 6 × 6:
(6a) using random initialization to set the convolution kernel weights to near-zero values in [-0.5, 0.5];
(6b) each convolution kernel convolves the Q1 feature maps; the Q1 convolution results plus the bias matrix are filtered by the activation function and then averaged to obtain that kernel's feature map; the feature map of each convolution kernel is 40 × 40;
(7) the fourth layer of the network is a pooling layer, which takes the Q2 feature maps from the preceding convolutional layer as input and performs pooling:
the pooling operation selects the maximum value within each non-overlapping 2 × 2 region, yielding the pooling layer's Q2 pooled feature maps of size 20 × 20;
(8) the fifth layer of the network is a convolutional layer, which takes the Q2 feature maps from the preceding pooling layer as input and performs convolution; the number of convolution kernels is Q3 and the kernel size is 5 × 5:
(8a) using random initialization to set the convolution kernel weights to near-zero values in [-0.5, 0.5];
(8b) each convolution kernel convolves the Q2 feature maps; the Q2 convolution results plus the bias matrix are filtered by the activation function and then averaged to obtain that kernel's feature map; the size of each feature map is 16 × 16;
(9) the sixth layer of the network is a wavelet pooling layer, which takes the Q3 feature maps from the preceding convolutional layer as input and performs a one-level wavelet decomposition:
the wavelet basis function is the 'haar' function; each feature map yields one 8 × 8 low-frequency subband and three 8 × 8 high-frequency subbands, and the three high-frequency subbands are fused into a new high-frequency subband by taking the maximum at corresponding positions;
(10) the seventh layer of the network is a fully connected layer, which takes the Q3 8 × 8 low-frequency subbands and Q3 8 × 8 fused high-frequency subbands obtained from the sixth-layer wavelet pooling as input and forms a 128-dimensional fully-connected-layer feature vector;
(11) repeating steps (3) to (10) on units of n randomly selected facial expression images to obtain the 128-dimensional feature vector of each of the n images;
(12) the eighth layer of the network is a Softmax output layer, which takes the n 128-dimensional feature vectors as input to train a 7-class probability-distribution Softmax classifier and obtain classification labels;
(13) computing the error between the classification labels of the Softmax output layer and the true labels, and updating the weight matrix once according to the BP back-propagation algorithm;
(14) repeating training steps (3) to (13) until the weight matrix has been updated m times, obtaining a trained deep convolutional wavelet neural network;
(15) feeding the facial expression test image set into the trained deep convolutional wavelet neural network to obtain classification label z1 at the output layer, feeding the expression-sensitive-region image set corresponding to the test set into the trained network to obtain classification label z2 at the output layer, and combining the two labels as z3 = z1 + λ × z2, where λ is the weighting proportion of the auxiliary task, to obtain the final classification label;
(16) outputting the facial expression recognition accuracy according to the test set's classification label z3, completing the auxiliary task-based deep convolutional wavelet neural network expression recognition.
The invention learns expression features with an auxiliary-task deep convolutional wavelet neural network and requires no feature selection, so it learns both the abstract and the local detail information of facial expressions well, strengthens the influence of expression-sensitive regions on the network's expression feature extraction, and markedly improves the accuracy of facial expression recognition.
Compared with the prior art, the invention has the following advantages:
firstly, the invention takes into account the special characterization capability of expression-sensitive regions when the deep convolutional neural network learns expression features. A main-task DCNN is first trained to obtain a shared feature weight matrix; local images of the eye-eyebrow and mouth regions of the expression-sensitive areas are then merged as an auxiliary-task estimation branch, whose classification result is obtained by mapping through the shared feature weight matrix; finally, the auxiliary-task classification result optimizes the main task's classification performance, improving the generalization ability of the deep convolutional network in expression recognition;
secondly, a simple down-sampling pooling operation in an ordinary convolutional neural network loses part of the features learned by the preceding convolutional layer, and the output of the fully connected layer contains only abstract information, losing many shallow local features. By combining multi-scale wavelet transforms with the deep convolutional network architecture, the invention avoids these defects: the features learned by the convolutional layers are transmitted intact through the pooling layers, and the local expression features obtained in shallow layers are expanded in the fully connected layer, so the whole network structure describes expression features better and markedly improves the recognition result.
Description of the figures
FIG. 1 is a portion of an image in a raw database as employed by the present invention;
FIG. 2 is a block flow diagram of the present invention;
FIG. 3 is a schematic diagram of the network structure of the present invention, wherein FIG. 3(a) is a structural diagram of the deep convolutional wavelet neural network of the present invention, and FIG. 3(b) is a structural diagram of the auxiliary-task deep convolutional wavelet neural network of the present invention;
fig. 4 is a portion of an expression sensitive area image of the present invention.
Detailed Description
The invention is described in detail below with reference to the attached drawing figures:
example 1
Facial expression recognition is an indispensable component of machine learning research and has very wide application value in today's society of ever-spreading human-computer interaction: automatically recognizing facial expressions in real time in human-computer interfaces such as mobile terminals and personal computers, and in some cases retrieving, tracking, and recognizing facial expressions in video. Breakthroughs in facial expression recognition methods also carry great reference significance for the fields of intelligent computing and brain-inspired research.
Existing expression recognition technology mainly adopts the approach of feature selection followed by classification, but the existing feature selection operators cannot learn expression features efficiently, so the subsequent classification cannot achieve ideal results. In addition, methods that use a deep network as the classifier do not avoid feature selection, so their classification effect improves only to a limited extent.
The invention develops research and exploration in view of this situation and provides an auxiliary task-based deep convolutional wavelet neural network expression recognition method. Referring to fig. 2, the invention realizes facial expression recognition through the following steps:
(1) Build a deep convolutional wavelet network consisting of three convolutional layers, two pooling layers, a multi-scale transformation layer, a fully connected layer, and a softmax output layer; initialize the bias weight matrices of the convolutional layers to zero matrices and select the Sigmoid function as the network's activation function. From input to output, the network built by the invention comprises in sequence: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a multi-scale transformation layer, a fully connected layer, and a softmax output layer; the multi-scale transformation layer is a wavelet pooling layer, and the whole forms the deep convolutional wavelet neural network.
(2) Establish a facial expression image set and an expression-sensitive-region image set, where the expression-sensitive-region image set is obtained by cropping the eyebrow and mouth parts of the facial expression images; one part of the facial expression image set serves as the network's training image set and the remaining images serve as the test image set. For example, the facial expression image data set in this example has 20000 samples, of which 15000 images are used as the training image set and the remaining 5000 images as the test image set; the expression-sensitive-region image set corresponds to the facial expression image data set in number.
(3) Input a training image into the deep convolutional wavelet network; the input image size is 96 × 96. In this embodiment the training images are input directly into the network without other image preprocessing, such as removing the influence of complex backgrounds or illumination, which simplifies the image recognition procedure.
(4) The first layer of the deep convolutional wavelet network is a convolutional layer, which performs a convolution operation on each input facial expression training image; the number of convolution kernels is Q1 and the kernel size is 7 × 7. The number of convolution kernels is chosen according to the computing environment and the software and hardware conditions; in this example Q1 is 4.
(4a) Use random initialization to set the convolution kernel weights to near-zero values in [-0.5, 0.5]. The initial kernel weights are near zero in order to accelerate the network's convergence.
(4b) Each convolution kernel convolves the facial expression image, yielding Q1 feature maps; the feature map of each convolution kernel is 90 × 90. The feature map size is determined by the convolution kernel size.
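The 90 × 90 size follows from 'valid' convolution: an N × N input and a k × k kernel give an output of side N − k + 1, so 96 − 7 + 1 = 90. A minimal numpy sketch of this step (the random kernel is a stand-in for a trained one; bias and activation are omitted):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: the output side shrinks by kernel size - 1."""
    n, k = image.shape[0], kernel.shape[0]
    out = np.empty((n - k + 1, n - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.random.rand(96, 96)                 # one 96 x 96 expression image
kernel = np.random.uniform(-0.5, 0.5, (7, 7))  # near-zero random 7 x 7 kernel
fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (90, 90)
```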
(4c) Initially set the bias weight matrix of the convolutional layer to a zero matrix; in this example the bias weight matrix is a one-dimensional vector whose dimension equals the number of convolution kernels Q1.
(4d) The activation function of the network is the Sigmoid function:
f(x) = 1 / (1 + e^(-x))
where f(x) is the activation value of the function, x is the input of the activation function (in this network, the convolution result plus the bias weight), and e is the base of the natural logarithm.
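A direct transcription of this activation, with numpy used only for vectorised evaluation:

```python
import numpy as np

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)): squashes the conv result plus bias into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                    # 0.5: a zero input sits at the midpoint
print(sigmoid(np.array([-5.0, 5.0])))  # values near 0 and near 1
```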
(5) The second layer of the network is a pooling layer, which takes the Q1 feature maps from the preceding first convolutional layer as input and performs pooling:
the pooling operation selects the maximum value within each non-overlapping 2 × 2 region, yielding the pooling layer's Q1 pooled feature maps of size 45 × 45.
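The non-overlapping 2 × 2 maximum selection can be sketched with a reshape trick:

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2 x 2 max pooling: keeps the largest value per block."""
    H, W = fmap.shape
    return fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))
# [[ 5.  7.]
#  [13. 15.]]
```

Applied to one 90 × 90 feature map from the first convolutional layer, this yields the 45 × 45 pooled map described above.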
(6) The third layer of the network is a convolutional layer, which takes the Q1 feature maps from the preceding pooling layer as input and performs convolution; the number of convolution kernels is Q2 and the kernel size is 6 × 6. In this example Q2 is 6.
(6a) Use random initialization to set the convolution kernel weights to near-zero values in [-0.5, 0.5];
(6b) Each convolution kernel convolves the Q1 feature maps; the Q1 convolution results plus the bias matrix are filtered by the activation function and then averaged to obtain that kernel's feature map; the feature map of each convolution kernel is 40 × 40;
(6c) the bias weight matrix for the convolutional layer is initially set to a 0 matrix. In this example, the bias weight matrix is a one-dimensional vector, the dimensions and the number Q of convolution kernels2The same is true.
(6d) The activation function of the network is a Sigmoid function.
(7) The fourth layer of the network is a pooling layer, which takes the Q2 feature maps from the preceding convolutional layer as input and performs pooling:
the pooling operation selects the maximum value within each non-overlapping 2 × 2 region, yielding the pooling layer's Q2 pooled feature maps of size 20 × 20;
(8) the fifth layer of the network is a convolution layer, and Q obtained by the previous layer of the stratification layer is added2Taking the characteristic graph as input, performing convolution operation, wherein the number of selected convolution kernels of the convolution layer is Q3The convolution kernel size is 5 x 5. Number of convolution kernel selections Q in this example3Taken as 12.
(8a) Use random initialization to set the convolution kernel weights to near-zero values in [-0.5, 0.5];
(8b) Each convolution kernel convolves the Q2 feature maps; the Q2 convolution results plus the bias matrix are filtered by the activation function and then averaged to obtain that kernel's feature map; the size of each feature map is 16 × 16;
(8c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(8d) the activation function of the network is a Sigmoid function.
(9) The sixth layer of the network is a wavelet pooling layer, which takes the Q3 feature maps from the preceding convolutional layer as input and performs a one-level wavelet decomposition:
the wavelet basis function is the 'haar' function; each feature map yields one 8 × 8 low-frequency subband and three 8 × 8 high-frequency subbands, and the three high-frequency subbands are fused into a new high-frequency subband by taking the maximum at corresponding positions.
(10) The seventh layer of the network is a fully connected layer, which takes the Q3 8 × 8 low-frequency subbands and Q3 8 × 8 fused high-frequency subbands obtained from the sixth-layer wavelet pooling as input and forms a 128-dimensional fully-connected-layer feature vector.
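The layer sizes quoted in steps (4) to (10) can be checked mechanically: each 'valid' convolution shrinks the side by kernel size minus one, and each 2 × 2 pooling step and the one-level Haar decomposition halve it. A small sketch of that bookkeeping:

```python
def trace_shapes(size=96, kernels=(7, 6, 5)):
    """Side length of the maps after each layer of the conv/pool/wavelet stack."""
    shapes = []
    for idx, k in enumerate(kernels):
        size = size - k + 1                    # 'valid' convolution
        shapes.append(("conv%d" % (idx + 1), size))
        if idx < 2:                            # only the first two convs are pooled
            size //= 2                         # non-overlapping 2 x 2 max pooling
            shapes.append(("pool%d" % (idx + 1), size))
    size //= 2                                 # one-level Haar decomposition
    shapes.append(("wavelet", size))
    return shapes

print(trace_shapes())
# [('conv1', 90), ('pool1', 45), ('conv2', 40), ('pool2', 20), ('conv3', 16), ('wavelet', 8)]
# 8 x 8 low band + 8 x 8 fused high band = 64 + 64 = 128 fully connected inputs
```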
(11) Repeat steps (3) to (10) on units of n randomly selected facial expression images to obtain the 128-dimensional feature vector of each of the n images.
(12) The eighth layer of the network is a Softmax output layer, which takes the n 128-dimensional feature vectors as input to train a 7-class probability-distribution Softmax classifier and obtain classification labels.
(13) Compute the error between the classification labels of the Softmax output layer and the true labels, and update the weight matrix once according to the BP back-propagation algorithm. In this example the updated weight matrix includes the values of the convolution kernels and the values of the bias weight vectors.
(14) Repeat training steps (3) to (13) until the weight matrix has been updated m times. Here m, the number of updates, is determined by the image scale and the network's convergence rate; this yields the trained deep convolutional wavelet neural network.
(15) Feed the facial expression test image set into the trained deep convolutional wavelet neural network to obtain classification label z1 at the output layer; feed the expression-sensitive-region image set corresponding to the test set into the trained network to obtain classification label z2 at the output layer; then combine the two labels as z3 = z1 + λ × z2, where λ is the weighting proportion of the auxiliary task, to obtain the final classification label.
(16) Output the facial expression recognition accuracy according to the test set's classification label z3, completing the auxiliary task-based deep convolutional wavelet neural network expression recognition.
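Steps (15) and (16) can be sketched as follows, treating z1 and z2 as per-class score vectors from the softmax layer; the value λ = 0.5 is purely illustrative, since the patent leaves the auxiliary-task weight to be learned:

```python
import numpy as np

def fuse_labels(z1, z2, lam=0.5):
    """z3 = z1 + lambda * z2; the final label is the class with the largest fused score."""
    z3 = z1 + lam * z2
    return int(np.argmax(z3))

z1 = np.array([0.40, 0.35, 0.25])  # main-task class scores (full face image)
z2 = np.array([0.10, 0.60, 0.30])  # auxiliary-task scores (sensitive-region image)
print(fuse_labels(z1, z2))  # 1: the auxiliary task flips the decision away from class 0
```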
The method takes into account the special characterization capability of expression-sensitive regions when the deep convolutional neural network learns expression features. A main-task DCNN is first trained to obtain a shared feature weight matrix; local images of the eye-eyebrow and mouth regions of the expression-sensitive areas are then merged as an auxiliary-task estimation branch, whose classification result is obtained by mapping through the shared feature weight matrix; finally, the auxiliary-task classification result optimizes the main task's classification performance, improving the generalization ability of the deep convolutional network in expression recognition.
Example 2
The auxiliary task-based deep convolutional wavelet neural network expression recognition method is the same as in Embodiment 1. The facial expression image set and expression-sensitive-region image set of step (2) are established as follows:
2.1 The facial expression image set is obtained as follows:
A suitable number of labeled original images are selected at random from the JAFFE expression image library. The JAFFE library adopted by the invention contains 213 images in total, as shown in fig. 1, covering seven classes of expressions: anger, sadness, happiness, neutral, disgust, surprise, and fear. The original image size is 256 × 256. Referring to fig. 1, images of partially different expressions of four persons are listed: the first row shows angry expressions, the second row disgusted expressions, the third row surprised expressions, the fourth row happy expressions, and the fifth row neutral expressions. The original images are expanded by flipping, rotating, and selecting image blocks with a sliding window: each image is first flipped, then rotated by several small angles, and finally expression images are selected by sliding the window up and down with the image center as the base point. The invention combines haar features with the Adaboost algorithm to detect the face region of the expanded images and scale the facial expression images, finally obtaining a facial expression image set of tens of thousands of samples.
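The expansion in step 2.1 can be sketched with numpy; the crop offsets here are hypothetical, and the small-angle rotation step is omitted (scipy.ndimage.rotate would cover it):

```python
import numpy as np

def expand(img, crop=96, offsets=(-8, 0, 8)):
    """Sketch of the data expansion: flip, then slide a crop window up/down
    around the image centre. Offsets are illustrative, not from the patent."""
    H, W = img.shape
    samples = []
    for base in (img, np.fliplr(img)):      # original + horizontal flip
        top0 = (H - crop) // 2
        left = (W - crop) // 2
        for dy in offsets:                  # vertical sliding window
            top = top0 + dy
            samples.append(base[top:top + crop, left:left + crop])
    return samples

face = np.random.rand(256, 256)             # one 256 x 256 original image
patches = expand(face)
print(len(patches), patches[0].shape)  # 6 (96, 96)
```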
2.2 The expression-sensitive-region image set is obtained as follows:
The expression-sensitive regions are those parts of the face sensitive to expression, namely the eye-eyebrow region and the mouth region. The facial expression image set obtained in step 2.1 is cropped with suitable cropping frames to obtain the left and right eyebrow-eye image blocks and the mouth image block; the three image blocks are spliced into an expression-sensitive-region image, finally yielding an expression-sensitive-region image set of the same tens of thousands of samples. Referring to fig. 4, the sensitive-region images of seven expressions of one person are listed.
2.3 A label file for the facial expression image set is produced from the original labels of the JAFFE expression image library. The label of a single image is a 1 × k binary vector, where k is the number of expression classes (k may be 2, 3, 4, 5, 6, etc.), chosen according to the needs of the actual expression classification problem. The dimension whose label value is 1 indicates the expression class the image belongs to, and all other dimensions are 0; for example, if the first dimension represents the happy class in a 5-class problem, the label vector of a happy image is [1, 0, 0, 0, 0]. In the invention the expression image data set and the sensitive-region image data set correspond to each other, so they can share the label file.
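The one-hot label layout described above, as a tiny sketch:

```python
def make_label(class_index, k=7):
    """1 x k binary label: 1 in the image's expression-class dimension, 0 elsewhere."""
    label = [0] * k
    label[class_index] = 1
    return label

print(make_label(0, k=5))  # [1, 0, 0, 0, 0] -- e.g. a happy image in a 5-class split
print(make_label(3, k=7))  # seven-class label with the fourth dimension set
```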
Example 3
The auxiliary task-based deep convolutional wavelet neural network expression recognition method is the same as in Embodiments 1-2. The wavelet pooling layer of step (9) obtains low-frequency and high-frequency subbands; referring to fig. 3(a), a conventional down-sampling pooling layer is modified into the wavelet pooling layer of fig. 3(a), which avoids the information loss caused by simple down-sampling, retains high-frequency information, and enhances the local information of the expression features. The steps are as follows:
9.1 Perform a one-level down-sampling wavelet decomposition on the feature maps obtained by the preceding convolutional layer, with the Haar function as the wavelet basis. Each feature map yields a low-frequency subband and three high-frequency subbands: horizontal, vertical, and diagonal. The number of wavelet decomposition levels can be determined by the size requirements of the network in practical applications.
9.2 fuse the three high frequency subbands into a new high frequency subband according to the following formula:
x_WH = Maxf(0, x_HH, x_HL, x_LH)

where x_HH, x_HL and x_LH denote the three high-frequency subbands obtained by the one-level wavelet decomposition, x_WH denotes the fused high-frequency subband, and the function Maxf(A, B) is defined as taking the larger value at each corresponding position of matrices A and B;
and 9.3, taking the obtained low-frequency sub-band and the fused high-frequency sub-band as the input of the next full-connection layer.
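Steps 9.1-9.3 can be sketched with a hand-rolled one-level 2-D Haar decomposition (the function names and the orthonormal 1/2 scaling are illustrative choices; a wavelet library such as PyWavelets would serve equally):

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar decomposition of a feature map (step 9.1).
    Returns the low-frequency subband and the three high-frequency
    subbands (horizontal, vertical, diagonal), each half the input size."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0          # low-frequency subband
    hl = (a - b + c - d) / 2.0          # vertical detail
    lh = (a + b - c - d) / 2.0          # horizontal detail
    hh = (a - b - c + d) / 2.0          # diagonal detail
    return ll, (lh, hl, hh)

def fuse_high(lh, hl, hh):
    """Step 9.2: x_WH = Maxf(0, x_HH, x_HL, x_LH) -- elementwise max
    of the three high-frequency subbands, clipped below at 0."""
    return np.maximum(0, np.maximum(lh, np.maximum(hl, hh)))

fmap = np.arange(16.0).reshape(4, 4)
ll, (lh, hl, hh) = haar_dwt2(fmap)
x_wh = fuse_high(lh, hl, hh)
print(ll.shape, x_wh.shape)  # each subband is half the input size: (2, 2) (2, 2)
```

Both `ll` and `x_wh` would then be passed to the fully connected layer, as in step 9.3.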
The wavelet pooling layer of the invention avoids the defect of the pooling layer in a general convolutional neural network, where information is lost through the simple downsampling operation: the pooling result is replaced by the low-frequency subband, whose wavelet-transform information loss is small, and the high-frequency subband containing the detail information is input into the fully connected layer together with it, so that the feature vector of the fully connected layer is expanded over multiple channels and its distinguishability is enhanced.
Example 4
The auxiliary task-based deep convolution wavelet neural network expression recognition method is the same as in embodiments 1-3. The fully connected layer feature vector in step (10) is obtained according to the following steps:
10.1, solve the low-frequency subband matrix according to the following formula:

x_L = Maxf(0, W_1·x_LL1 + W_2·x_LL2 + W_3·x_LL3 + … + W_n·x_LLn)

where x_L denotes the global low-frequency subband matrix, x_LLn denotes the one-level wavelet-decomposed low-frequency subband of each feature map, and W_n denotes the superposition weight of the low-frequency subband of each feature map. In the invention the superposition weight W_n can be determined from empirical values, or another learning scheme can be designed.
10.2 solving the high-frequency subband matrix according to the following formula:
x_H = Maxf(0, x_WH1, x_WH2, …, x_WHn)

where x_H denotes the global high-frequency subband matrix and x_WHn denotes the new high-frequency subband formed by fusing the three high-frequency subbands of the one-level wavelet decomposition of each feature map;

10.3, stretch the global low-frequency subband x_L and the global high-frequency subband x_H row-wise into 1 × v vectors and connect them end to end to obtain the fully connected layer feature vector of size 1 × 2v, where v is the product of the length and width of the subband matrix. In this example the fully connected layer feature vector, formed by stretching the low- and high-frequency subbands row-wise and splicing them end to end, is 1 × 128-dimensional; the row-wise stretched low-frequency and high-frequency subband vectors are each 1 × 64-dimensional.
Example 5
The auxiliary task-based deep convolution wavelet neural network expression recognition method is the same as in embodiments 1-4. For the auxiliary-task weighting proportion λ described in step (15), referring to fig. 3(b), auxiliary-task correction is added to the trained deep convolution wavelet neural network, and the weighting proportion λ is learned in the network using the sensitive-region image set, according to the following steps:
15.1, λ is initialized to 0, and M facial expression images and the corresponding sensitive-area images are randomly selected as learning samples for the weight λ;
15.2 for the trained deep convolution wavelet neural network, the learning sample is brought into the network to obtain the classification label according to the following formula:
z3 = z1 + λ·z2

where z1 denotes the output label of the network given the facial expression image, z2 denotes the output label of the network given the corresponding sensitive-area image, and z3 denotes the global label of the network;
15.3, according to the magnitude of the error between the global label z3 and the true label, the value of λ is updated as follows:

λ = λ + Δλ

where Δλ = 0.05, and the λ value corresponding to the minimum label error of each learning sample is counted. In this example the error between the global label and the true label is measured as the numerical difference over the expression-class dimensions.
15.4, the expected value of the λ values corresponding to the minimum label errors of the M learning samples is computed, and this expected value is used as the global auxiliary-task weighting proportion λ. In this example the expected value is obtained directly by averaging.
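Steps 15.1-15.4 amount to a per-sample grid search over λ followed by averaging; a sketch under the assumption that labels are probability vectors and the error is the absolute difference summed over the class dimensions:

```python
import numpy as np

def learn_lambda(z1_list, z2_list, true_labels, step=0.05, lam_max=1.0):
    """Steps 15.1-15.4: for each of the M learning samples, sweep lambda
    in increments of `step`, keep the lambda minimising the label error
    of z3 = z1 + lambda*z2 against the true label, then average the
    per-sample optima (the expected value of step 15.4)."""
    best = []
    for z1, z2, t in zip(z1_list, z2_list, true_labels):
        lams = np.arange(0.0, lam_max + step, step)
        errs = [np.abs((z1 + lam * z2) - t).sum() for lam in lams]
        best.append(lams[int(np.argmin(errs))])
    return float(np.mean(best))

# hypothetical two-class sample where the sensitive-region output corrects
# the face output; the numbers are synthetic, not from the patent
z1 = [np.array([0.7, 0.3])]
z2 = [np.array([0.6, -0.6])]
true = [np.array([1.0, 0.0])]
lam = learn_lambda(z1, z2, true)
print(lam)  # ~0.5 for this synthetic sample
```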
A more detailed example is given below to further illustrate the invention.
Example 6
The auxiliary task-based expression recognition method for the deep convolution wavelet neural network is the same as the embodiment 1-5, and the specific steps are as follows with reference to the attached figure 3:
step 1: establishment of facial expression image set
200 labelled original images are randomly selected from the JAFFE expression database, which contains 213 images; the size of each selected original image is 256 × 256, as shown in fig. 1. The 200 original images are first expanded to 2 times (400 images) by left-right flipping, then to 10 times (4000 images) by rotating the images left and right by 1, 2, 3, 4 and 5 degrees. Finally, a 128 × 128 rectangular frame is slid up and down by 5 pixel coordinates around the image centre to cut out image blocks, the face region is identified by a method combining haar features with the Adaboost algorithm and scaled to an experimental face image of size 96 × 96, and a facial expression image set of 40000 samples is obtained in total. The corresponding label file is then made: the label of a single image is a 1 × 7-dimensional binary vector, the dimension with value 1 indicates the expression category the image belongs to, and all other dimensions are 0.
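The flip-and-crop part of the expansion above can be sketched as follows (rotation by ±1-5 degrees is omitted since it needs an image library, e.g. scipy.ndimage.rotate; the crop stride shown is an illustrative assumption, not the patent's exact sliding scheme):

```python
import numpy as np

def augment(img, crop=128, shift=5):
    """Mirror the image left-right to double the set, then slide a
    crop x crop window up and down around the image centre by up to
    `shift` pixels to multiply the set again."""
    out = [img, np.fliplr(img)]                       # x2 by mirroring
    crops = []
    for im in out:
        cy, cx = im.shape[0] // 2, im.shape[1] // 2
        for dy in range(-shift, shift + 1, 2):        # slide the crop box vertically
            y0 = cy - crop // 2 + dy
            x0 = cx - crop // 2
            crops.append(im[y0:y0 + crop, x0:x0 + crop])
    return crops

samples = augment(np.random.rand(256, 256))
print(len(samples), samples[0].shape)  # 12 crops of (128, 128)
```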
Step 2: establishment of expression sensitive area image set
The expression-sensitive image in the invention refers to the regions of the face that are sensitive to expression, including the eyebrow-and-eye regions and the mouth region, as shown in fig. 4. The face region images obtained in step 1 are cut: two eyebrow-and-eye image blocks are obtained with a 48 × 48 cutting frame and one mouth image block with a 48 × 96 cutting frame, and the three image blocks are spliced to obtain the expression-sensitive-region image. Finally an expression-sensitive-region image set of 40000 samples is obtained, whose label file is shared with the image set of step 1.
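The cut-and-splice construction can be sketched as below; the crop coordinates are illustrative assumptions, not the patent's exact values (two 48 × 48 eye blocks placed side by side give a 48 × 96 strip, which stacked on the 48 × 96 mouth block yields a 96 × 96 image):

```python
import numpy as np

def sensitive_region(face):
    """Cut two 48x48 eyebrow/eye blocks and one 48x96 mouth block from a
    96x96 face image and splice them into a 96x96 sensitive-region image.
    The crop coordinates are hypothetical placements for illustration."""
    left_eye  = face[16:64,  0:48]          # 48 x 48
    right_eye = face[16:64, 48:96]          # 48 x 48
    mouth     = face[48:96,  0:96]          # 48 x 96
    eyes = np.hstack([left_eye, right_eye]) # 48 x 96 strip
    return np.vstack([eyes, mouth])         # 96 x 96 spliced image

region = sensitive_region(np.random.rand(96, 96))
print(region.shape)  # (96, 96)
```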
Step 3: network training
(1) Building a depth network consisting of three convolution layers, two pooling layers, a multi-scale transformation layer, a full connection layer and a softmax output layer;
(2) inputting an expression image into a depth network, wherein the size of the input image is 96 × 96;
(3) the first layer of the network is a convolution layer, the convolution layer performs convolution operation on each expression original image, the number of selected convolution kernels is 6, and the size of each convolution kernel is 7 x 7:
(3a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(3b) performing convolution operation on the human face expression image by each convolution kernel to obtain 6 feature graphs after convolution, wherein the feature graph size of each convolution kernel is 90 x 90;
(3c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(3d) the activation function of the network is a Sigmoid function;
(4) the second layer of the network is a pooling layer, which takes the 6 feature maps obtained from the previous convolutional layer as input and performs pooling operations:
the pooling layer adopts a pooling method that the maximum value is selected in non-overlapping 2 x 2 areas to obtain 6 characteristic maps of the pooling layer, and the size of the characteristic maps is 45 x 45;
(5) the third layer of the network is a convolutional layer, 6 feature maps obtained by the previous pooling layer are used as input, convolution operation is carried out, the number of convolution kernels selected by the convolutional layer is 12, and the sizes of the convolution kernels are 6 x 6:
(5a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [-0.5, 0.5];
(5b) each convolution kernel performs a convolution operation on the 6 feature maps; the convolution results over the 6 feature maps are averaged, a bias matrix is added, and the result is filtered by the activation function to obtain the feature map of that convolution kernel; the feature map size of each convolution kernel is 40 × 40;
(5c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(5d) the activation function of the network is a Sigmoid function.
(6) The fourth layer of the network is a pooling layer which takes the 12 feature maps obtained from the previous convolutional layer as input and performs pooling operations:
the pooling layer was created by selecting the maximum in non-overlapping 2 x 2 regions to yield 12 signatures of the pooling layer with a size of 20 x 20.
(7) The fifth layer of the network is a convolution layer, which takes the 12 feature maps obtained by the previous pooling layer as input and performs the convolution operation; the number of convolution kernels selected by the convolution layer is 12, and the convolution kernel size is 5 × 5:
(7a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(7b) each convolution kernel performs a convolution operation on the 12 feature maps; the convolution results over the 12 feature maps are averaged, a bias matrix is added, and the result is filtered by the activation function to obtain the feature map of that convolution kernel; the size of each feature map is 16 × 16;
(7c) initially setting a bias weight matrix of the convolutional layer as a 0 matrix;
(7d) the activation function of the network is a Sigmoid function.
(8) The sixth layer of the network is a wavelet pooling layer, which takes 12 characteristic maps obtained from the previous convolutional layer as input and performs one-layer wavelet decomposition:
the adopted wavelet basis function is a 'haar' function, for each feature map, an 8 x 8 low-frequency sub-band and three 8 x 8 high-frequency sub-bands are obtained, the corresponding positions of the three high-frequency sub-bands are maximized, and the three high-frequency sub-bands are fused into a new high-frequency sub-band.
(9) The seventh layer of the network is a fully connected layer, which takes the 12 8 × 8 low-frequency subbands and 12 8 × 8 high-frequency subbands obtained by the previous wavelet transform layer as input to form a 128-dimensional fully connected layer feature vector. In the invention, the fully connected layer first takes the maximum value at each corresponding position of the 12 8 × 8 low-frequency subbands and stretches the result row-wise into a 1 × 64-dimensional vector; the high-frequency subbands yield another 1 × 64-dimensional vector by the same operation; the two vectors are connected end to end, low-frequency vector first, to obtain a 1 × 128-dimensional global vector.
(10) Repeating steps (2) to (9) in units of 50 randomly selected expression images, the respective 128-dimensional feature vectors of the 50 images are obtained.
(11) The eighth layer of the network is a Softmax output layer, the obtained 50 128-dimensional feature vectors are used as input, a Softmax classifier with 7-class probability distribution output is trained, and classification labels are obtained;
(12) Error calculation is performed between the classification label of the Softmax output layer and the real label, and the convolution kernels and bias weight vectors of each layer are updated according to the BP back-propagation algorithm. The weight-update learning step of the deep convolution wavelet neural network is set to 0.05.
(13) Training steps (2) to (12) are repeated until the weight matrix has been updated 200 times. The number of weight updates in network training can be set according to the convergence speed of the network.
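The feature-map sizes quoted in steps (2)-(9) follow from valid-convolution arithmetic (output = input − kernel + 1), 2 × 2 non-overlapping pooling, and the halving effect of the one-level Haar decomposition; a quick check:

```python
def conv(size, kernel):      # valid convolution: output = input - kernel + 1
    return size - kernel + 1

def pool(size, window=2):    # non-overlapping pooling halves the size
    return size // window

s = 96                       # input image, step (2)
s = conv(s, 7); assert s == 90   # layer 1: 7x7 convolution, step (3)
s = pool(s);    assert s == 45   # layer 2: 2x2 max pooling, step (4)
s = conv(s, 6); assert s == 40   # layer 3: 6x6 convolution, step (5)
s = pool(s);    assert s == 20   # layer 4: 2x2 max pooling, step (6)
s = conv(s, 5); assert s == 16   # layer 5: 5x5 convolution, step (7)
s = pool(s);    assert s == 8    # layer 6: one-level Haar decomposition, step (8)
print(s)  # 8 -> 8x8 subbands, hence the 2 x 64 = 128-dim fully connected vector
```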
Step 4: learning the auxiliary task
The facial expression test data set is brought into the trained network to obtain a classification label z1, and the expression-sensitive regions corresponding to the test data set are brought into the trained network to obtain a classification label z2; the final classification label is obtained from the two labels as z3 = z1 + 0.65·z2, and z3 is computed for the whole test data set.
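The fusion and scoring of steps 4-5 can be sketched as follows, with z1 and z2 as per-sample class-probability rows (the sample values are synthetic, not from the patent):

```python
import numpy as np

def combine_and_score(z1, z2, y_true, lam=0.65):
    """Fuse the two softmax outputs as z3 = z1 + lam * z2 (step 4) and
    score the predicted class against the true label (step 5)."""
    z3 = z1 + lam * z2                 # shape (n_samples, n_classes)
    pred = np.argmax(z3, axis=1)
    return np.mean(pred == y_true)     # recognition accuracy

z1 = np.array([[0.6, 0.4], [0.2, 0.8]])   # face-image outputs
z2 = np.array([[0.9, 0.1], [0.7, 0.3]])   # sensitive-region outputs
acc = combine_and_score(z1, z2, np.array([0, 1]))
print(acc)  # 1.0 -- both synthetic samples classified correctly
```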
Step 5: statistics of recognition results
The recognition accuracy is calculated from z3 of step 4.
The invention avoids two defects of the general convolutional neural network: the features learned by the upper convolution layers can be lost through the simple downsampling operation of the pooling layer, and the output of the fully connected layer contains only abstract information while lacking many shallow local features. By combining the multi-scale wavelet transform with the deep convolutional neural network architecture, the network both ensures that the features learned by the convolution layers are passed through the pooling layer without loss, and expands in the fully connected layer the local expression features obtained in shallow-layer learning, so that the whole network structure describes the expression features better and the recognition result is markedly improved.
The technical effects of the invention are verified and explained by the simulation results as follows:
example 7
The auxiliary task-based deep convolution wavelet neural network expression recognition method is similar to that of the embodiment 1-6, and the effect of the method is further analyzed by combining the recognition results of the attached table 1.
Simulation experiment conditions
The hardware test platform of the invention is: an Intel Core i3 CPU with a main frequency of 3.20 GHz and 4 GB of memory; the software platform is: Windows 7 Ultimate 64-bit operating system and Matlab R2013b. The input images of the network of the invention are all of size 96 × 96, in TIFF format.
Simulation content
The simulation content of the invention comprises: simulation experiments and recognition result statistics of the existing facial expression recognition technology; under the condition of no additional wavelet pooling layer and no auxiliary task learning, a six-layer deep convolutional neural network is simply used for carrying out simulation experiments of facial expression recognition and recognition result statistics; the auxiliary task-based deep convolution wavelet neural network expression recognition method is completely used for experimental simulation and recognition result statistics; and comparing and analyzing the simulation results of each experiment.
Analysis of simulation results
Table 1 compares the recognition effect of the method of the invention with that of existing facial expression recognition techniques. Referring to the data in table 1, Shan C and Jabid T divide the image into several sub-regions and multiply each sub-region by a weight according to its contribution to the expression, the weight representing the expression-characterization ability of that region. Tasked et al. initialize the weights with a χ² distribution and, using a new local face descriptor based on the Local Directional Pattern (LDP) in an LDP + SVM architecture, obtain an average recognition rate of 85.4%. Shishir et al. combine Gabor features with Learning Vector Quantization (LVQ), performing Gabor filtering at 34 fiducial points of the image, and obtain a recognition rate of 87.51%. Nectarios et al. propose an algorithm based on Gabor combined with Log-Gabor filter convolution to obtain feature vectors, yielding recognition rates of 86.1% and 85.72%. In the results of Luya Dan, Von ShiYong et al., an algorithm combining FP with a deep auto-encoding network obtains a recognition rate of 90.47%. In addition, when the invention learns the expression features with only a six-layer deep convolutional neural network and trains a softmax classifier, without the additional wavelet pooling layer and auxiliary-task learning, a recognition rate of 90.56% is obtained.
When the auxiliary task-based deep convolution wavelet neural network expression recognition method is used integrally, the obtained recognition accuracy is 92.91%.
TABLE 1 comparison of recognition effects of the present invention and the existing facial expression recognition methods
As can be seen from table 1, the method of the present invention can give good consideration to local and global information of the expression features of the facial expression image, and enhances the influence of the expression sensitive area on facial expression recognition through the auxiliary task, thereby improving the recognition rate of the facial expression.
In short, the auxiliary task-based deep convolution wavelet neural network expression recognition method disclosed by the invention solves the problems of the conventional expression recognition technology, whose feature selection operators can neither learn expression features efficiently nor extract enough classification features from the expression information of the image. The implementation steps are: building a deep convolution wavelet neural network; establishing a facial expression image set and an expression-sensitive-area image set; inputting facial expression images to the network; training the deep convolution wavelet neural network; back-propagating the network error; updating the parameter set of the deep convolution wavelet neural network, i.e. updating each convolution kernel and offset vector of the network; inputting expression-sensitive-area images to the trained network; learning the weighting proportion of the auxiliary task; obtaining the global classification label of the network according to the weighting proportion; and counting the recognition accuracy according to the global label. The method takes both the abstract and the detail information of the expression image into account and enhances the influence of the expression-sensitive area in the learning of expression features, which markedly improves the accuracy of expression recognition; it can be applied to expression recognition of facial expression images.
Claims (5)
1. A depth convolution wavelet neural network expression recognition method based on auxiliary tasks is characterized by comprising the following steps:
(1) building a depth convolution wavelet network consisting of three convolution layers, two pooling layers, a multi-scale transformation layer, a full connection layer and a softmax output layer; initializing a bias weight matrix of the network convolution layer into a 0 matrix, wherein a Sigmoid function is selected as an activation function of the network;
(2) establishing a facial expression image set and an expression sensitive area image set, wherein the expression sensitive area image set is obtained by cutting eyebrow parts and mouth parts of the facial expression image set, a part of images in the facial expression image data set are used as a training image set of a network, and the rest of images are used as a testing image set;
(3) inputting a training image into a deep convolution wavelet network, wherein the size of the input image is 96 × 96;
(4) the first layer of the deep convolution wavelet network is a convolution layer, which performs a convolution operation on each input facial expression training image; the number of selected convolution kernels is Q1 and the convolution kernel size is 7 × 7:
(4a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(4b) each convolution kernel performs a convolution operation on the facial expression image to obtain Q1 feature maps after convolution; the feature map size of each convolution kernel is 90 × 90;
(5) the second layer of the network is a pooling layer, which takes the Q1 feature maps obtained from the previous convolution layer as input and performs the pooling operation:
the pooling layer selects the maximum value in non-overlapping 2 × 2 regions to obtain the Q1 feature maps of the pooling layer; the size of the pooled feature maps is 45 × 45;
(6) the third layer of the network is a convolution layer, which takes the Q1 feature maps obtained by the previous pooling layer as input and performs the convolution operation; the number of selected convolution kernels is Q2 and the convolution kernel size is 6 × 6:
(6a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(6b) each convolution kernel performs a convolution operation on the Q1 feature maps; the convolution results over the Q1 feature maps are averaged, a bias matrix is added, and the result is filtered by the activation function to obtain the feature map of that convolution kernel; the feature map size of each convolution kernel is 40 × 40;
(7) the fourth layer of the network is a pooling layer, which takes the Q2 feature maps obtained from the previous convolution layer as input and performs the pooling operation:
the pooling layer selects the maximum value in non-overlapping 2 × 2 regions to obtain the Q2 feature maps of the pooling layer; the size of the pooled feature maps is 20 × 20;
(8) the fifth layer of the network is a convolution layer, which takes the Q2 feature maps obtained by the previous pooling layer as input and performs the convolution operation; the number of selected convolution kernels is Q3 and the convolution kernel size is 5 × 5:
(8a) adopting a random initialization method to configure the weight of a convolution kernel as a near zero number between [ -0.5,0.5 ];
(8b) each convolution kernel performs a convolution operation on the Q2 feature maps; the convolution results over the Q2 feature maps are averaged, a bias matrix is added, and the result is filtered by the activation function to obtain the feature map of that convolution kernel; the size of each feature map is 16 × 16;
(9) the sixth layer of the network is a wavelet pooling layer, which takes the Q3 feature maps obtained from the previous convolution layer as input and performs a one-level wavelet decomposition:
the adopted wavelet basis function is a 'haar' function, for each feature map, an 8 x 8 low-frequency sub-band and three 8 x 8 high-frequency sub-bands are obtained, the corresponding positions of the three high-frequency sub-bands are maximized, and the three high-frequency sub-bands are fused into a new high-frequency sub-band;
(10) the seventh layer of the network is a fully connected layer, which takes the Q3 8 × 8 low-frequency subbands and Q3 8 × 8 high-frequency subbands obtained by the sixth-layer wavelet pooling as input to form a 128-dimensional fully connected layer feature vector;
(11) repeating the steps (3) to (10) by taking n randomly selected facial expression images as a unit to obtain respective 128-dimensional feature vectors of the n images;
(12) the eighth layer of the network is a Softmax output layer, n 128-dimensional feature vectors are obtained and used as input, a probability distribution Softmax classifier with 7 types of output is trained, and classification labels are obtained;
(13) error calculation is carried out on the classification label and the real label of the Softmax output layer, and a weight matrix is updated once according to a BP back propagation algorithm;
(14) repeating the training steps (3) to (13) until the weight matrix is updated m times to obtain a trained deep convolution wavelet neural network;
(15) the facial expression image test set is brought into the trained deep convolution wavelet neural network to obtain a classification label z1 at the output layer, and the expression-sensitive-region image set corresponding to the test data set is brought into the trained deep convolution wavelet neural network to obtain a classification label z2 at the output layer; the final classification label is obtained from the two labels as z3 = z1 + λ·z2, where λ denotes the weighting proportion of the auxiliary task;
(16) and outputting the facial expression recognition accuracy according to the classification label z3 of the test set, and completing the auxiliary task-based deep convolution wavelet neural network facial expression recognition.
2. The auxiliary task-based deep convolution wavelet neural network expression recognition method according to claim 1, wherein the step (2) of establishing the facial expression image set and the expression-sensitive-region image set is performed according to the following steps:
2.1 the facial expression image set is obtained as follows:
randomly selecting a proper number of original images with labels from an expression image library, expanding the original images in a mode of selecting image blocks through turning, rotating and sliding frames, identifying the face area of the expanded images by adopting a method of combining haar characteristics and Adaboost algorithm, and scaling the expanded images into face expression images with the size of 96 × 96, so as to finally obtain a face expression image set of a ten-thousand-level sample;
2.2 expression sensitive region image sets were obtained as follows:
the expression sensitive area refers to areas of several parts sensitive to expressions in the human face area, including the eye eyebrow area and the mouth area; cutting the obtained facial expression image set, obtaining two left and right eyebrow and eye part image blocks and a mouth part image block by adopting a cutting frame, splicing the three image blocks to obtain an expression sensitive area image, and finally obtaining an expression sensitive area image set of the same ten-thousand-order sample;
2.3, label files of the facial expression image set are manufactured according to original labels of the expression image library, the label of a single image is a 1 x k dimensional binary vector, k dimensions represent that the expression of the image is divided into k classes, the dimension with the label vector of 1 represents that the image belongs to the expression class represented by the dimension, the values of other dimensions are 0, and the label files of the expression image data set and the sensitive area image data set can be shared.
3. The auxiliary task-based deep convolution wavelet neural network expression recognition method as claimed in claim 1, wherein the wavelet pooling layer in step (9) obtains a low-frequency subband and a high-frequency subband according to the following steps:
9.1, performing one-layer downsampling wavelet decomposition on the feature map obtained by the previous layer of convolutional layer, wherein the selected wavelet basis function is a Haar function, and each feature map obtains one low-frequency sub-band and three high-frequency sub-bands through one-layer downsampling wavelet decomposition;
9.2 fuse the three high frequency subbands into a new high frequency subband according to the following formula:
x_WH = Maxf(0, x_HH, x_HL, x_LH)

where x_HH, x_HL and x_LH denote the three high-frequency subbands obtained by the one-level wavelet decomposition, x_WH denotes the fused high-frequency subband, and the function Maxf(A, B) is defined as taking the larger value at each corresponding position of matrices A and B;
and 9.3, taking the obtained low-frequency sub-band and the fused high-frequency sub-band as the input of the next full-connection layer.
4. The auxiliary task-based deep convolution wavelet neural network expression recognition method as claimed in claim 1, wherein the fully connected layer feature vector in step (10) is obtained by the following steps:
10.1, solve the low-frequency subband matrix according to the following formula:

x_L = Maxf(0, W_1·x_LL1 + W_2·x_LL2 + W_3·x_LL3 + … + W_n·x_LLn)

where x_L denotes the global low-frequency subband matrix, x_LLn denotes the one-level wavelet-decomposed low-frequency subband of each feature map, and W_n denotes the superposition weight of the low-frequency subband of each feature map;
10.2 solving the high-frequency subband matrix according to the following formula:
x_H = Maxf(0, x_WH1, x_WH2, …, x_WHn)

where x_H denotes the global high-frequency subband matrix and x_WHn denotes the new high-frequency subband formed by fusing the three high-frequency subbands of the one-level wavelet decomposition of each feature map;

10.3, stretch the global low-frequency subband x_L and the global high-frequency subband x_H row-wise into 1 × v vectors and connect them end to end to obtain the fully connected layer feature vector of size 1 × 2v.
5. The auxiliary task-based deep convolution wavelet neural network expression recognition method of claim 1, wherein the auxiliary-task weighting proportion λ of step (15) is learned according to the following steps:
15.1, λ is initialized to 0, and M facial expression images and the corresponding sensitive-area images are randomly selected as learning samples for the weight λ;
15.2 for the trained deep convolution wavelet neural network, the learning sample is brought into the network to obtain the classification label according to the following formula:
z3 = z1 + λ·z2

where z1 denotes the output label of the network given the facial expression image, z2 denotes the output label of the network given the corresponding sensitive-area image, and z3 denotes the global label of the network;
15.3, according to the magnitude of the error between the global label z3 and the true label, the value of λ is updated as follows:

λ = λ + Δλ

where Δλ denotes the update step, and the λ value corresponding to the minimum label error of each learning sample is counted;
and 15.4, calculating the expected value of the lambda value corresponding to the minimum label error of the M learning samples, wherein the expected value lambda is used as the weighted proportion lambda value of the subsequent auxiliary task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710446076.0A CN107292256B (en) | 2017-06-14 | 2017-06-14 | Auxiliary task-based deep convolution wavelet neural network expression recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710446076.0A CN107292256B (en) | 2017-06-14 | 2017-06-14 | Auxiliary task-based deep convolution wavelet neural network expression recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107292256A CN107292256A (en) | 2017-10-24 |
CN107292256B true CN107292256B (en) | 2019-12-24 |
Family
ID=60096459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710446076.0A Active CN107292256B (en) | 2017-06-14 | 2017-06-14 | Auxiliary task-based deep convolution wavelet neural network expression recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107292256B (en) |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729872A (en) * | 2017-11-02 | 2018-02-23 | 北方工业大学 | Facial expression recognition method and device based on deep learning |
JP6345332B1 (en) * | 2017-11-21 | 2018-06-20 | 国立研究開発法人理化学研究所 | Classification device, classification method, program, and information recording medium |
CN107977677A (en) * | 2017-11-27 | 2018-05-01 | 深圳市唯特视科技有限公司 | A multi-label pixel classification method applied to large-scale urban reconstruction
CN109840459A (en) * | 2017-11-29 | 2019-06-04 | 深圳Tcl新技术有限公司 | A facial expression classification method, apparatus and storage medium
CN108122001B (en) * | 2017-12-13 | 2022-03-11 | 北京小米移动软件有限公司 | Image recognition method and device |
CN108229341B (en) * | 2017-12-15 | 2021-08-06 | 北京市商汤科技开发有限公司 | Classification method and device, electronic equipment and computer storage medium |
CN108090513A (en) * | 2017-12-19 | 2018-05-29 | 天津科技大学 | Multi-biological characteristic blending algorithm based on particle cluster algorithm and typical correlation fractal dimension |
CN109949264A (en) * | 2017-12-20 | 2019-06-28 | 深圳先进技术研究院 | An image quality evaluation method, device and storage device
CN108038466B (en) * | 2017-12-26 | 2021-11-16 | 河海大学 | Multi-channel human eye closure recognition method based on convolutional neural network |
CN108171176B (en) * | 2017-12-29 | 2020-04-24 | 中车工业研究院有限公司 | Subway driver emotion identification method and device based on deep learning |
CN108062416B (en) * | 2018-01-04 | 2019-10-29 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating label on map |
CN108021910A (en) * | 2018-01-04 | 2018-05-11 | 青岛农业大学 | Analysis method for pseudocarps based on spectral recognition and deep learning
CN108304788B (en) * | 2018-01-18 | 2022-06-14 | 陕西炬云信息科技有限公司 | Face recognition method based on deep neural network |
CN108363969B (en) * | 2018-02-02 | 2022-08-26 | 南京邮电大学 | Newborn pain assessment method based on mobile terminal |
CN110298212B (en) * | 2018-03-21 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Model training method, emotion recognition method, expression display method and related equipment |
CN108520213B (en) * | 2018-03-28 | 2021-10-19 | 五邑大学 | Face beauty prediction method based on multi-scale depth |
CN108805866B (en) * | 2018-05-23 | 2022-03-25 | 兰州理工大学 | Image fixation point detection method based on quaternion wavelet transform depth vision perception |
CN109580629A (en) * | 2018-08-24 | 2019-04-05 | 绍兴文理学院 | Crankshaft thrust collar intelligent detecting method and system |
CN109543526B (en) * | 2018-10-19 | 2022-11-08 | 谢飞 | True and false facial paralysis recognition system based on depth difference characteristics |
CN109657554B (en) * | 2018-11-21 | 2022-12-20 | 腾讯科技(深圳)有限公司 | Image identification method and device based on micro expression and related equipment |
CN111222624B (en) * | 2018-11-26 | 2022-04-29 | 深圳云天励飞技术股份有限公司 | Parallel computing method and device |
CN109635709B (en) * | 2018-12-06 | 2022-09-23 | 中山大学 | Facial expression recognition method based on significant expression change area assisted learning |
CN109615574B (en) * | 2018-12-13 | 2022-09-23 | 济南大学 | Traditional Chinese medicine identification method and system based on GPU and dual-scale image feature comparison |
CN109919171A (en) * | 2018-12-21 | 2019-06-21 | 广东电网有限责任公司 | An infrared image recognition method based on wavelet neural network
CN111488764B (en) * | 2019-01-26 | 2024-04-30 | 天津大学青岛海洋技术研究院 | Face recognition method for ToF image sensor |
CN109815924B (en) * | 2019-01-29 | 2021-05-04 | 成都旷视金智科技有限公司 | Expression recognition method, device and system |
CN109934173B (en) * | 2019-03-14 | 2023-11-21 | 腾讯科技(深圳)有限公司 | Expression recognition method and device and electronic equipment |
CN110333088B (en) * | 2019-04-19 | 2020-09-29 | 北京化工大学 | Caking detection method, system, device and medium |
CN110119702B (en) * | 2019-04-30 | 2022-12-06 | 西安理工大学 | Facial expression recognition method based on deep learning prior |
CN110174948B (en) * | 2019-05-27 | 2020-10-27 | 湖南师范大学 | Intelligent language auxiliary learning system and method based on wavelet neural network |
CN110210380B (en) * | 2019-05-30 | 2023-07-25 | 盐城工学院 | Analysis method for generating character based on expression recognition and psychological test |
CN110414394B (en) * | 2019-07-16 | 2022-12-13 | 公安部第一研究所 | Facial occlusion face image reconstruction method and model for face occlusion detection |
CN110399821B (en) * | 2019-07-17 | 2023-05-30 | 上海师范大学 | Customer satisfaction acquisition method based on facial expression recognition |
CN110427892B (en) * | 2019-08-06 | 2022-09-09 | 河海大学常州校区 | CNN face expression feature point positioning method based on depth-layer autocorrelation fusion |
CN111401116B (en) * | 2019-08-13 | 2022-08-26 | 南京邮电大学 | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network |
CN110717423B (en) * | 2019-09-26 | 2023-03-17 | 安徽建筑大学 | Training method and device for emotion recognition model of facial expression of old people |
CN110889332A (en) * | 2019-10-30 | 2020-03-17 | 中国科学院自动化研究所南京人工智能芯片创新研究院 | Lie detection method based on micro expression in interview |
CN111191704B (en) * | 2019-12-24 | 2023-05-02 | 天津师范大学 | Foundation cloud classification method based on task graph convolutional network |
CN111144348A (en) * | 2019-12-30 | 2020-05-12 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN111178312B (en) * | 2020-01-02 | 2023-03-24 | 西北工业大学 | Face expression recognition method based on multi-task feature learning network |
CN111291670B (en) * | 2020-01-23 | 2023-04-07 | 天津大学 | Small target facial expression recognition method based on attention mechanism and network integration |
CN111401147B (en) * | 2020-02-26 | 2024-06-04 | 中国平安人寿保险股份有限公司 | Intelligent analysis method, device and storage medium based on video behavior data |
CN111382795B (en) * | 2020-03-09 | 2023-05-05 | 交叉信息核心技术研究院(西安)有限公司 | Image classification processing method of neural network based on frequency domain wavelet base processing |
CN111126364A (en) * | 2020-03-30 | 2020-05-08 | 北京建筑大学 | Expression recognition method based on packet convolutional neural network |
CN111652171B (en) * | 2020-06-09 | 2022-08-05 | 电子科技大学 | Construction method of facial expression recognition model based on double branch network |
CN112132058B (en) * | 2020-09-25 | 2022-12-27 | 山东大学 | Head posture estimation method, implementation system thereof and storage medium |
CN112380995B (en) * | 2020-11-16 | 2023-09-12 | 华南理工大学 | Face recognition method and system based on deep feature learning in sparse representation domain |
CN117036149A (en) * | 2020-12-01 | 2023-11-10 | 华为技术有限公司 | Image processing method and chip |
CN112699938B (en) * | 2020-12-30 | 2024-01-05 | 北京邮电大学 | Classification method and device based on graph convolution network model |
CN113095356B (en) * | 2021-03-03 | 2023-10-31 | 北京邮电大学 | Light-weight neural network system and image processing method and device |
CN114445899A (en) * | 2022-01-30 | 2022-05-06 | 中国农业银行股份有限公司 | Expression recognition method, device, equipment and storage medium |
CN114743251B (en) * | 2022-05-23 | 2024-02-27 | 西北大学 | Drama character facial expression recognition method based on shared integrated convolutional neural network |
WO2024039332A1 (en) * | 2022-08-15 | 2024-02-22 | Aselsan Elektroni̇k Sanayi̇ Ve Ti̇caret Anoni̇m Şi̇rketi̇ | Partial reconstruction method based on sub-band components of jpeg2000 compressed images |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101872424A (en) * | 2010-07-01 | 2010-10-27 | 重庆大学 | Facial expression recognition method based on Gabor transform optimal channel fuzzy fusion
CN105139395A (en) * | 2015-08-19 | 2015-12-09 | 西安电子科技大学 | SAR image segmentation method based on wavelet pooling convolutional neural networks
CN106056088A (en) * | 2016-06-03 | 2016-10-26 | 西安电子科技大学 | Single-sample face recognition method based on adaptive virtual sample generation criterion
Non-Patent Citations (1)
Title |
---|
Recognition of facial expressions using Gabor wavelets and learning vector quantization; Shishir Bashyal; Engineering Applications of Artificial Intelligence; 2008-10-31; Vol. 21, No. 7; pages 1056-1063 * |
Also Published As
Publication number | Publication date |
---|---|
CN107292256A (en) | 2017-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107292256B (en) | Auxiliary task-based deep convolution wavelet neural network expression recognition method | |
CN106529447B (en) | Method for identifying face of thumbnail | |
CN109685819B (en) | Three-dimensional medical image segmentation method based on feature enhancement | |
US10891511B1 (en) | Human hairstyle generation method based on multi-feature retrieval and deformation | |
CN109522874B (en) | Human body action recognition method and device, terminal equipment and storage medium | |
CN112288011B (en) | Image matching method based on self-attention deep neural network | |
CN108062543A (en) | A face recognition method and device | |
CN107977661B (en) | Region-of-interest detection method based on FCN and low-rank sparse decomposition | |
CN112464865A (en) | Facial expression recognition method based on pixel and geometric mixed features | |
CN114092833B (en) | Remote sensing image classification method and device, computer equipment and storage medium | |
CN110674685B (en) | Human body analysis segmentation model and method based on edge information enhancement | |
CN111652273B (en) | Deep learning-based RGB-D image classification method | |
CN107784288A (en) | An iterative localization face detection method based on deep neural network | |
CN110503613A (en) | Single-image rain removal method based on cascaded dilated convolutional neural networks | |
CN110826462A (en) | Human body behavior identification method of non-local double-current convolutional neural network model | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN114419732A (en) | HRNet human body posture identification method based on attention mechanism optimization | |
CN112132145A (en) | Image classification method and system based on model extended convolutional neural network | |
CN114445715A (en) | Crop disease identification method based on convolutional neural network | |
CN113554084A (en) | Vehicle re-identification model compression method and system based on pruning and light-weight convolution | |
CN114782979A (en) | Training method and device for pedestrian re-recognition model, storage medium and terminal | |
CN113221660B (en) | Cross-age face recognition method based on feature fusion | |
CN114170657A (en) | Facial emotion recognition method integrating attention mechanism and high-order feature representation | |
CN114049491A (en) | Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium | |
CN117576402A (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||