CN107506722A

CN107506722A - One kind is based on depth sparse convolution neutral net face emotion identification method

Info

Publication number: CN107506722A
Application number: CN201710714001.6A
Authority: CN
Inventors: 吴敏; 苏婉娟; 陈略峰; 周梦甜; 刘振焘; 曹卫华
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2017-08-18
Filing date: 2017-08-18
Publication date: 2017-12-22

Abstract

The invention provides one kind to be based on depth sparse convolution neutral net face emotion identification method, first emotion image preprocessing, afterwards affective feature extraction, last affective characteristics identification classification.The present invention is a kind of to be optimized based on depth sparse convolution neutral net face emotion identification method from Nesterov accelerating gradients descent algorithm to the weights of depth sparse convolution neutral net, it is optimal network structure, to improve the generalization of face emotion recognition algorithm, because NAGD has anticipation, prevent to the property of can appreciate that algorithm is progressively too fast or too slow, the responding ability of algorithm can be strengthened simultaneously, and more preferable local optimum can be obtained.

Description

One kind is based on depth sparse convolution neutral net face emotion identification method

Technical field

The present invention relates to one kind to be based on depth sparse convolution neutral net face emotion identification method, belongs to pattern-recognition neck Domain.

Background technology

In recent years, as the development of all kinds of technologies, social intelligence degree are also improving constantly, people increasingly thirst for The man-machine interaction of natural harmony can be experienced.However, emotion is across a letter that can not more more between people and machine all the time Ditch.Thus, the bottleneck for breaking through current affection computation is the key of artificial emotion field development.Expression is the weight that the mankind show emotion One of channel is wanted, face emotion recognition all has in fields such as man-machine interaction, fatigue driving detection, telenursing and pain Assessments Certain application value, application prospect are quite varied.Therefore, realize that more accurately Expression Recognition can promote social intelligenceization to send out Exhibition.

Face emotion recognition can be divided mainly into affective feature extraction and affective characteristics identification classification.Face emotion recognition is still located In laboratory stage, in interactive process, also without image of Buddha people it is equally natural and tripping identify the expression of other side.It is existing Some face emotion recognition algorithms are difficult to extract affective characteristics exactly, and the complexity of algorithm is higher, identify the time used It is longer, the requirement of real-time in interactive process can not be met.Therefore, the spy that there were significant differences between each expression is extracted Sign, the expression of different expression form is more accurately classified, and improves efficiency of algorithm, is the pass for realizing face emotion recognition Key.

Deep learning is a new field in machine learning research, and its motivation, which is to establish, simulates human brain is divided The neutral net of study is analysed, the mechanism that it imitates human brain explains data.Depth sparse convolution neutral net is by convolutional Neural Network, Dropout, Softmax return the neutral net of composition, are one of models of deep learning.The present invention passes through Dropout layers introduce the openness of randomization in depth convolutional network, improve the training effectiveness of network；And its is each The network structure of training optimization is different from, and the optimization of such weights will not be fixed against fixed relationship neuron collective effect, The simultaneous adaptation between neuron is weakened, this is similar to the generative propagation in natural selection, meanwhile, improve network Generalization ability.In deep learning, the selection of optimized algorithm is particularly significant, and often only payes attention to net in some conventional researchs The setting of network structure, traditional gradient descent algorithm are easily trapped into poor local optimum, cause the generalization of neutral net Can be poor.

The content of the invention

In order to solve the deficiencies in the prior art, the invention provides one kind to be based on depth sparse convolution neutral net face feelings Recognition methods is felt, from Nesterov accelerating gradients descent algorithm (Nesterov Accelerated Gradient Descent, NAGD) weights of depth sparse convolution neutral net are optimized, it is optimal network structure, to improve people The generalization of face emotion recognition algorithm.NAGD has anticipation, prevents to the property of can appreciate that algorithm is progressively too fast or too slow, The responding ability of algorithm can be strengthened simultaneously, and more preferable local optimum can be obtained.

The present invention is that technical scheme is used by solving its technical problem：Provide a kind of based on depth sparse convolution god Through network face emotion identification method, comprise the following steps：

(1) emotion image preprocessing：Emotion image pattern to be identified is carried out at rotation correction and face cutting first Reason, affective characteristics key area is extracted, by image normalization to uniform sizes, histogram equalization then is carried out to emotion image Change, obtain pretreated emotion image；

(2) affective feature extraction：First, principal component emotion spy is carried out to pretreated emotion image based on PCA methods Sign extraction, obtain the characteristic of different emotions；Then, albefaction is carried out to the characteristic extracted, obtains emotion to be identified The PCA characteristic patterns of image；

(3) affective characteristics identification classification：Structure one is returned by convolutional layer, sub-sampling layer, Dropout layers and Softmax The depth sparse convolution neutral net of layer composition, the label value of the PCA characteristic patterns of training set and corresponding emotion is inputted first deep Sparse convolution neutral net is spent, and it is excellent to the progress of depth sparse convolution neutral net using Nesterov accelerating gradients descent algorithm Change, the PCA characteristic patterns of emotion image to be identified are then inputted into depth sparse convolution neutral net, export recognition result, i.e. feelings Feel the label value corresponding to classification.

Step (1) the emotion image preprocessing, specifically includes procedure below：

Two and three characteristic points of nose in (1-1) demarcation emotion image, obtain the coordinate value of three characteristic points；

(1-2) rotates according to the coordinate value of left and right two to emotion image, makes two in the same horizontal line, if two Eye between at intervals of d, its midpoint is O；

(1-3) is cut according to face feature to face, and on the basis of O, horizontal direction or so respectively takes d, vertical direction 0.5d and 1.5d rectangular area is respectively cut up and down, and the rectangular area is face emotion subregion；

The change of scale of face emotion subregion is the unified pixel of size 128 × 128 by (1-4)；

(1-5) implements histogram equalization to face emotion subregion, obtains pretreated emotion image.

Step (2) described affective feature extraction, specifically includes following steps：

(2-1) carries out average Regularization using the method for the average brightness value μ for subtracting emotion image, makes in emotion image The characteristic mean of data specifically includes procedure below all near 0：

The pretreated emotion view data that size is 128 × 128 is stored in 128 × 128 matrix, i.e. { x ′⁽¹⁾,x′⁽²⁾,…,x′⁽ⁿ⁾, x '⁽ⁱ⁾∈Rⁿ, n=128, every pretreated emotion image is entered with formula (1) and formula (2) Row zero averaging：

x′⁽ⁱ⁾=x '⁽ⁱ⁾-μ (2)

The covariance matrix Σ of emotion image after (2-2) calculating zero averaging characteristic vector U, wherein Σ calculating are public Formula is：

Then emotion image pixel point value x ' characteristic vector U base { u₁,u₂,…,u_nExpression：

(2-3) selects x '_rotPreceding k principal component retain 99% variance, that is, choose the minimum for meeting k during formula (5) Value：

λ_jRepresent j-th of characteristic value corresponding in characteristic vector U；

(2-4) is by x '_rotIn addition to the k principal component to be retained, remaining all zero setting, ifIt is x '_rotApproximate representation, thenIt is expressed as：

(2-5) is rightZoom in and out, to remove the correlation between each feature, make it that all there is unit variance：

(2-6) enters line translation using ZCA albefactions to characteristic vector U, the covariance matrix of emotion image is become as unit square Battle array I：

x′_ZCAwhite=Ux '_PCAwhite (8)

Then x '_ZCAwhiteThe PCA characteristic patterns of emotion image as to be identified.

Label value 1~7 described in step (3) respectively with it is angry, detest, fear, happily, this neutral, sad and surprised 7 class Emotion corresponds.

Step (3) the affective characteristics identification classification, specifically includes procedure below：

(3-1) creates a depth being made up of successively convolutional layer, sub-sampling layer, Dropout layers and Softmax recurrence layer Sparse convolution neutral net, training set data, which is inputted into depth sparse convolution neutral net, the training set data, includes instruction Practice the PCA characteristic patterns of collection and the label value of corresponding emotion, i.e. { (x₁,y₁),...,(x_m,y_m), and y_m∈ { 1,2 ..., k }, its Middle x_iFor the PCA characteristic patterns of training set, y_iFor x_iCorresponding affective tag value, i ∈ { 1,2 ..., m }, using NAGD algorithms pair Depth sparse convolution nerve net is iterated training, and the repetitive exercise includes procedure below：

(3-1-1) is shuffled at random to training set data, by the packet in training set, the quantity of data in every group Unanimously, and by each group sequentially input in depth sparse convolution neutral net；

(3-1-2) every group of training set data passes through convolutional layer respectively first, and it is 29 that the convolutional layer, which sets 100 dimensions, × 29 convolution kernel, convolution kernel moving step length are 1；Depth sparse convolution neutral net excavates the PCA of training set by convolution kernel Local association information in characteristic pattern, the implementation process of convolutional layer are：

a_i,k=f (x_i*rot90(W_k,2)+b_k) (9)

Wherein, a_i,kIt is i-th PCA feature diagram data by k-th of convolution kernel in convolutional layer to the training set of input x_iCarry out the obtained convolution characteristic pattern of process of convolution, * is valid convolution algorithms, W_kRepresent the weights of k-th of convolution, b_kFor kth Deviation corresponding to individual convolution kernel, f () are Sigmoid type activation primitives：

Convolution characteristic pattern caused by (3-1-3) convolutional layer is input into sub-sampling layer, and sub-sampling layer is put down using average pond Equal pond dimension is set to 4, and moving step length 4, the then pond characteristic pattern size that convolution characteristic pattern obtains after sub-sampling layer becomes For original a quarter, characteristic pattern quantity is constant, and average pondization uses below equation：

Wherein, c_jFor jth Zhang Chihua characteristic patterns caused by sub-sampling layer, p is average pond dimension；

(3-1-4) mitigates network over-fitting using Dropout layers, allows all data by Dropout layers at random, I.e. pond characteristic pattern caused by step (3-1-3) does not work, and idle data retain, and its calculating process is：

DropoutTrain (x)=RandomZero (p) × x (12)

Wherein DropoutTrain (x) represents the data matrix obtained after Dropout layers the training stage, The value that RandomZero (p) represents to allow with the Probability p set in the data matrix x for inputting this layer is set to 0；

(3-1-5) returns layer to the data matrix row Classification and Identification that is obtained after Dropout layers using Softmax：

(3-1-5-1), which is used, assumes function h_θ(x) calculate the data matrix obtained after Dropout layers and appear in each Expression classification j probable value p (y=j | x), h_θ(x) output is a k dimensional vector, and each vector element value corresponds to this k respectively The probable value of classification, and vector element and for 1, h_θ(x) form is：

Wherein θ₁,θ₂,...,θ_k∈Rⁿ⁺¹It is the parameter of model, is obtained in training starting stage random assignment；x⁽ⁱ⁾Represent warp The i-th pond feature diagram data crossed after Dropout layers in obtained data matrix；

(3-1-5-2) Softmax is returned layer and classifying quality is evaluated using cost function J (θ)：

Wherein 1 { y⁽ⁱ⁾=j } it is indicative function, its value rule is 1 { value is genuine expression formula }=1, such as 1 { 1+1=3 } =0,1 { 1+1=2 }=1, y⁽ⁱ⁾Represent affective tag value；

To above formula derivation, gradient formula is obtained：

Wherein weight attenuation term in λ expressions (15)The factor, be preset value；

(3-1-6) utilizes reverse conduction algorithm, calculates cost function J (W, b in the residual sum Softmax recurrence of each layer；x, Y) gradient of each parameter θ of network, specifically includes procedure below in：

(3-1-6-1) if l layers are to be connected to l+1 layers entirely, the residual computations of l layers use below equation：

δ^(l)=((W^(l))^Tδ^(l+1))·f‘(z^(l)) (16)

Parameter W gradient calculation formula is：

Parameter b gradient calculation formula is：

Wherein, δ^(l+1)It is the residual error of l+1 layers in network, J (W, b；X, y) it is cost function, (W, b) is weights and threshold value Parameter, (x, y) are training data and label respectively；

(3-1-6-2) if l layers are convolutional layers, l+1 layers are sub-sampling layers, then residual error is propagated by following formula：

Wherein k is the numbering of convolution kernel,Represent x_i*rot90(W_k,2)+b_k,It is Sigmoid type activation primitives Partial derivative, its form are：

The gradient for the θ that (3-1-7) basis is calculated, NAGD utilize momentum term γ v_t-1Carry out undated parameter θ, pass through calculating θ-γv_t-1To obtain the approximation of parameter θ Future Positions, NAGD more new formula is：

θ=θ-v_t (22)

Wherein,It is by (x in training set⁽ⁱ⁾,y⁽ⁱ⁾) gradient of parameter θ is calculated, α is study Rate, v_tIt is current velocity vector, v_t-1It is the velocity vector in last round of iteration, α is initially set to 0.1, v_tIt is initially set to 0, it is identical with parameter vector θ dimensions, γ ∈ (0,1], γ is arranged to 0.5 in the training starting stage, trains iteration knot for the first time Increased to 0.95 after beam；

(3-1-8) return to step (3-1-1), the iterations until reaching setting, complete depth sparse convolution nerve net The training optimization of network；

The PCA characteristic patterns of emotion image to be identified are inputted depth sparse convolution neutral net by (3-2), and it is identified Classification：

The PCA characteristic patterns of (3-2-1) emotion image to be identified first pass around convolutional layer and sub-sampling layer, by x '_ZCAwhiteReplace For the input x in formula (9)_i, obtain the PCA features to the emotion image to be identified of input by k-th of convolution kernel of convolutional layer Figure carries out the convolution characteristic pattern a ' that process of convolution obtains_i,k；

Again by a '_i,kSubstitute into formula (11) and substitute a therein_i,k, the pond characteristic pattern c ' of emotion image to be identified is obtained, i.e., High-rise affective characteristics；

When the pond characteristic pattern c ' of (3-2-2) emotion image to be identified continues through Dropout layers, then c ' is averaged Processing：

DropoutTest (c ')=(1-p) × c ' (23)

DropoutTest (c ') represents that the pond characteristic pattern c ' of emotion image to be identified is obtained after continuing through Dropout layers The data matrix arrived；

(3-2-3) returns the hypothesis function h of layer using Softmax_θ(x) calculate c ' and appear in each expression classification j's Probable value, and the classification j corresponding to obtained most probable value is exported, i.e. output category result.

The present invention is based on beneficial effect possessed by its technical scheme：

The present invention introduces the openness of randomization by Dropout layers in depth sparse convolution neutral net, and its is every The network structure of secondary training optimization is different from, and the optimization of such weights will not be fixed against fixed relationship neuron and make jointly With weakening the simultaneous adaptation between neuron, improve the generalization ability and training effectiveness of network.Using NAGD to depth The weights of degree convolutional neural networks optimize, and are optimal network structure；Compared with traditional gradient descent algorithm, NAGD With anticipation, prevent that algorithm is progressively too fast or too slow predictablely, while the responding ability of algorithm can be strengthened, and can Obtain more preferable local optimum.

Brief description of the drawings

Fig. 1 overall procedure block diagrams of the present invention.

Fig. 2 emotion image preprocessing schematic diagrames.

Fig. 3 extracts the face affective characteristics image after feature based on PCA.

Fig. 4 depth sparse convolution neutral net schematic diagrames.

Fig. 5 JAFFE and CK+ database portion partial image samples.

P is to recognition effect and the influence line chart of training time in Fig. 6 Dropout layers.

Fig. 7 symmetry transformation image comparison figures.

Fig. 8 tests confusion matrix.

Man-machine interactive system topology diagrams of the Fig. 9 based on face emotion recognition.

Figure 10 GUI system Debugging interfaces.

Embodiment

The invention will be further described with reference to the accompanying drawings and examples.

The invention provides one kind to be based on depth sparse convolution neutral net face emotion identification method, its overall procedure frame Figure is as shown in Figure 1.Image preprocessing is carried out to emotion image pattern first, i.e., face direction is corrected and cut, and it is right It implements histogram equalization；Afterwards based on PCA extraction bottom affective characteristicses；Finally using constructed depth sparse convolution god Learn high-rise affective characteristics through Web Mining and identified classification, optimization is trained to network weight using NAGD, is made whole Individual network structure is optimal, and face emotion recognition performance is improved with this.

Three parts can be divided mainly into based on depth sparse convolution neutral net face emotion identification method, i.e. emotion image is pre- Processing, affective feature extraction and affective characteristics identification classification, implementation process are as follows：

(1) emotion image preprocessing：As shown in Fig. 2 emotion image pattern to be identified is carried out first rotation correction and Face cutting is handled, and extracts affective characteristics key area, then image normalization to uniform sizes is carried out straight to emotion image Side's figure equalization, obtains pretreated emotion image；Specifically include procedure below：

(1-1) comes to demarcate two and three features of nose in emotion image manually using function [x, y]=ginput (3) Point, obtain the coordinate value of three characteristic points；

(2) affective feature extraction：First, principal component emotion spy is carried out to pretreated emotion image based on PCA methods Sign extraction, obtain the variant and characteristic that is easily processed between different emotions；Then, the characteristic extracted is carried out Albefaction, obtain the PCA characteristic patterns of emotion image to be identified.What is obtained extracts the face emotion image after affective characteristics based on PCA As shown in figure 3, specifically include following steps：

x′⁽ⁱ⁾=x '⁽ⁱ⁾-μ (2)

x′_ZCAwhite=Ux '_PCAwhite (8)

(3) affective characteristics identification classification：Structure one by convolutional layer, sub-sampling layer (pond layer), Dropout layers and Softmax return layer composition depth sparse convolution neutral net as shown in Figure 4, wherein convolutional layer, pond layer and Dropout layers excavate high-rise affective characteristics and it are learnt, and Softmax returns layer to be known to the affective characteristics learnt Do not classify, and the label value corresponding to output category result, i.e. emotional category.Described label value 1~7 respectively with it is angry, detest Dislike, fear, happily, this neutral, sad and surprised 7 class emotion correspond.

The label value of the PCA characteristic patterns of training set and corresponding emotion is inputted into depth sparse convolution neutral net first, and Depth sparse convolution neutral net is optimized using Nesterov accelerating gradients descent algorithm, network structure is reached most It is excellent, to improve the generalization of face emotion recognition algorithm, after network training terminates, network best initial weights are preserved, are obtained optimal The depth sparse convolution neutral net of change.Then in test phase, by test set, i.e., the PCA characteristic patterns of emotion image to be identified Depth sparse convolution neutral net is inputted, exports recognition result, i.e. label value corresponding to emotional category.Specifically include following mistake Journey：

a_i,k=f (x_i*rot90(W_k,2)+b_k) (9)

DropoutTrain (x)=RandomZero (p) × x (12)

(3-1-5) returns layer using Softmax and carries out Classification and Identification to input data：

To above formula derivation, gradient formula is obtained：

(3-1-6) utilizes reverse conduction algorithm, calculates cost function J (W, b in the residual sum Softmax recurrence of each layer； X, y) in each parameter θ of network gradient, specifically include procedure below：

δ^(l)=((W^(l))^Tδ^(l+1))·f‘(z^(l)) (16)

Parameter W gradient calculation formula is：

Parameter b gradient calculation formula is：

θ=θ-v_t (22)

(3-1-8) return to step (3-1-1), it is known that reach the iterations of setting, complete depth sparse convolution nerve net The training optimization of network；

DropoutTest (c ')=(1-p) × c ' (23)

Tested using the above method, the database of face emotion used is JAFFE and CK+ databases, database portion Divide sample image as shown in figure 5, wherein the first behavior JAFFE database samples, the second behavior CK+ database samples.JAFFE numbers 213 width gray level images being made up of according to storehouse 7 kinds of basic facial expressions of 10 women, image size are 256 × 256, everyone Every kind of facial expression image has 2 to 4.CK+ databases are made up of the other adult of 210 different race and sexs of 18 to 50 one full year of life, altogether Comprising 326 groups of facial expression image sequences for having label, every image size is 640 × 490, altogether including 7 kinds of expressions, i.e., anger, Detest, be frightened, happy, sad, surprised and despise.It is neutral expression to take the expression under its tranquility, and in addition to despising Seven kinds of basic facial expressions needed for six kinds of expression peak image frame compositions, totally 399.

The use of the 80% of JAFFE Facial expression databases is training sample, 20% is test sample.Change Dropout layers Middle p value size, curve map is obtained as shown in fig. 6, as seen from the figure, with the increase of p value, the training time is gradually shortened, and identifies Rate shows a rising trend.This shows that when being trained to depth sparse convolution neutral net it is appropriate to be selected in Dropout layers P value be advantageous to improve network Generalization Capability and shorten needed for training time.P is considered to training time and discrimination Influence, the present invention from p=0.5 is optimal value, and it can effectively reduce the time needed for network training, and training effectiveness is significantly Improve, and network performance is also improved, and can obtain good recognition effect.

One of current common problem of deep learning algorithm is that it needs substantial amounts of data to carry out in the training stage Study.However, available data volume is insufficient for the data needed for deep learning algorithm in some existing public databases. Thus, to increase training samples number, it is unlikely to some samples repeated occur again, all original samples is symmetrically become Change, double database sample size, symmetry transformation image comparison figure is as shown in Figure 7.For the effective of checking increase sample Property, set control variable experiment as follows：The use of the 80% of JAFFE Facial expression databases is training sample, 20% is test specimens This.Each parameter constant of algorithm is kept, the depth sparse convolution neutral net proposed is trained with two groups of training sets, wherein Two groups of training sets are made up of the image after original image and increase symmetry transformation respectively, and two groups of experiments are the same using test set 's.Because Dropout layers have a significant impact for recognition effect, for influence of the prominent increase sample to recognition effect, this experiment P is set to 0, masks Dropout layers, network is optimized using NAGD, experimental result is as shown in table 1：

The experimental result of table 1 contrasts

Table 2 gives traditional stochastic gradient descent algorithm (Momentum based Stochastic based on momentum Gradient Descent, MSGD) and NAGD algorithms obtained emotion is trained to depth sparse convolution neutral net and is known Other effect.Experimental data base uses JAFFE databases, training sample increase symmetry transformation image, and ε takes 1, p to take 0.5.Experiment knot Fruit shows, network is trained using NAGD than more being stablized to the experimental result that network is trained to obtain using MSGD, Recognition effect is more preferable.

NAGD the and MSGD experimental results of table 2

To verify the validity of algorithm proposed by the invention, tested respectively in JAFFE and CK+ databases.Use The 80% of JAFFE Facial expression databases is training sample, and 20% is test sample.CK+ databases are with respect to JAFFE databases The scope of the age, sex and the race that are covered are wider, and for preferably study, to all kinds of affective characteristicses of all kinds of people, CK+ storehouses make With the training set than JAFFE greater proportion, that is, the image for choosing database 90% is training set, and residue 10% is test set.Instruction Practice the image concentrated and increased per pictures after a symmetry transformation, ε takes 1, p to take 0.5, and obtained experimental result is as shown in table 3.

The recognition result that table 3 obtains on JAFFE and CK+ databases

As shown in Table 3, the algorithm proposed achieves good recognition effect on JAFFE and CK+ databases, The discrimination obtained on JAFFE is 97.62%, and the discrimination obtained on CK+ is 95.12%.According to the training obtained in table Time and recognition time, it is with must be averaged training time of every image of amount of images in itself divided by training set/test set 0.6757 second, the recognition time of average every image was 0.1258 second.The discrimination of two all kinds of expressions of experiment and mistake are classified Situation as shown in the confusion matrix in Fig. 8, wherein AN., DI., FE., NE., SA. and SU. correspond to respectively it is angry, detest, be frightened, Happily, this neutral, sad and surprised seven kinds of basic facial expressions.

Text of the invention has built a set of man-machine interactive system based on face emotion recognition algorithm, and the man-machine interactive system is main It is made up of wheeled robot, affection computation work station, router and data transmission set etc., its topology diagram is as shown in Figure 9. The system gathers face emotion image frame data by the Kinect configured on wheeled robot first, transfers data to afterwards Affection computation work station, work station enters data into be identified to the face emotion recognition system trained, is finally worked Stand and recognition result is fed back into wheeled robot, wheeled robot is realized and carry out interacting for natural harmony with people.

Gui interface is built at the system debug interface of man-machine interactive system by MATLAB 2016a to be debugged, GUI Interface schematic diagram is as shown in Figure 10.In GUI system Debugging interface, image preview button is clicked on, system can call Kinect color Color camera, and by the image real-time display captured on gui interface left image window；Click on emotion recognition button, meeting Obtain the image currently captured and include it, in gui interface right image window, to obtain two manually afterwards and nose is sat Mark, to be corrected and cut out to face, and the facial image cut out is inputted into the depth convolutional neural networks trained Carry out face emotion recognition；Last recognition result will feed back to gui interface and show.

7 kinds of basic facial expression picture frames for collecting 2 groups of 3 people are training set, and input to depth convolutional neural networks and carry out Training, the picture frame for afterwards capturing Kinect are inputted to the network trained and are identified.Table 4 gives this 7 kinds of 3 people The ONLINE RECOGNITION result of basic facial expression,

The application experiment result of table 4

As seen from table, three groups of empirical average discriminations are 76.190%, this demonstrate the present invention in actual applications before Scape.

Claims

1. one kind is based on depth sparse convolution neutral net face emotion identification method, it is characterised in that comprises the following steps：

(1) emotion image preprocessing：Carry out rotation correction and face cutting processing first to emotion image pattern to be identified, carry Affective characteristics key area is taken, by image normalization to uniform sizes, histogram equalization then is carried out to emotion image, obtained Pretreated emotion image；

(2) affective feature extraction：First, principal component affective characteristics is carried out to pretreated emotion image based on PCA methods to carry Take, obtain the characteristic of different emotions；Then, albefaction is carried out to the characteristic extracted, obtains emotion image to be identified PCA characteristic patterns；

(3) affective characteristics identification classification：Structure one returns layer group by convolutional layer, sub-sampling layer, Dropout layers and Softmax Into depth sparse convolution neutral net, it is dilute that the PCA characteristic patterns of training set and the label value of corresponding emotion are inputted into depth first Convolutional neural networks are dredged, and depth sparse convolution neutral net is optimized using Nesterov accelerating gradients descent algorithm, Then the PCA characteristic patterns of emotion image to be identified are inputted into depth sparse convolution neutral net, exports recognition result, i.e. emotion class Not corresponding label value.

2. according to claim 1 be based on depth sparse convolution neutral net face emotion identification method, it is characterised in that： Step (1) the emotion image preprocessing, specifically includes procedure below：

(1-2) rotates according to the coordinate value of left and right two to emotion image, makes two in the same horizontal line, if two it Between at intervals of d, its midpoint is O；

(1-3) is cut to face according to face feature, and on the basis of O, horizontal direction or so respectively takes d, above and below vertical direction Each rectangular area for cutting 0.5d and 1.5d, the rectangular area is face emotion subregion；

3. according to claim 1 be based on depth sparse convolution neutral net face emotion identification method, it is characterised in that： Step (2) described affective feature extraction, specifically includes following steps：

(2-1) carries out average Regularization using the method for the average brightness value μ for subtracting emotion image, makes data in emotion image Characteristic mean all near 0, specifically include procedure below：

The pretreated emotion view data that size is 128 × 128 is stored in 128 × 128 matrix, i.e. { x '⁽¹⁾,x ′⁽²⁾,…,x′⁽ⁿ⁾, x '⁽ⁱ⁾∈Rⁿ, n=128, it is equal that zero is carried out to every pretreated emotion image with formula (1) and formula (2) Value：

<mrow> <mi>&mu;</mi> <mo>=</mo> <mfrac> <mn>1</mn> <mi>n</mi> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>n</mi> </munderover> <msup> <mi>x</mi> <mrow> <mo>&prime;</mo> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow>

x′⁽ⁱ⁾=x '⁽ⁱ⁾-μ (2)

The covariance matrix Σ of emotion image after (2-2) calculating zero averaging characteristic vector U, wherein Σ calculation formula For：

<mrow> <mi>&Sigma;</mi> <mo>=</mo> <mfrac> <mn>1</mn> <mi>n</mi> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>n</mi> </munderover> <mrow> <mo>(</mo> <msup> <mi>x</mi> <mrow> <mo>&prime;</mo> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> <mo>)</mo> </mrow> <msup> <mrow> <mo>(</mo> <msup> <mi>x</mi> <mrow> <mo>&prime;</mo> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>3</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> </mrow> <mo>&prime;</mo> </msubsup> <mo>=</mo> <msup> <mi>U</mi> <mi>T</mi> </msup> <msup> <mi>x</mi> <mo>&prime;</mo> </msup> <mo>=</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <mrow> <msubsup> <mi>u</mi> <mn>1</mn> <mi>T</mi> </msubsup> <msup> <mi>x</mi> <mo>&prime;</mo> </msup> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <msubsup> <mi>u</mi> <mn>2</mn> <mi>T</mi> </msubsup> <msup> <mi>x</mi> <mo>&prime;</mo> </msup> </mrow> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mrow> <msubsup> <mi>u</mi> <mi>n</mi> <mi>T</mi> </msubsup> <msup> <mi>x</mi> <mo>&prime;</mo> </msup> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow>

(2-3) selects x '_rotPreceding k principal component retain 99% variance, that is, choose the minimum value for meeting k during formula (5)：

<mrow> <mfrac> <mrow> <msubsup> <mi>&Sigma;</mi> <mrow> <mi>j</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>k</mi> </msubsup> <msub> <mi>&lambda;</mi> <mi>j</mi> </msub> </mrow> <mrow> <msubsup> <mi>&Sigma;</mi> <mrow> <mi>j</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>n</mi> </msubsup> <msub> <mi>&lambda;</mi> <mi>j</mi> </msub> </mrow> </mfrac> <mo>&GreaterEqual;</mo> <mn>0.99</mn> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>5</mn> <mo>)</mo> </mrow> </mrow>

(2-4) is by x '_rotIn addition to the k principal component to be retained, remaining all zero setting, ifIt is x '_rotApproximate representation, thenTable It is shown as：

<mrow> <msup> <mover> <mi>x</mi> <mo>~</mo> </mover> <mo>&prime;</mo> </msup> <mo>=</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mo>,</mo> <mn>1</mn> </mrow> <mo>&prime;</mo> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mo>,</mo> <mi>k</mi> </mrow> <mo>&prime;</mo> </msubsup> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mn>0</mn> </mtd> </mtr> </mtable> </mfenced> <mo>&ap;</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mo>,</mo> <mn>1</mn> </mrow> <mo>&prime;</mo> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mo>,</mo> <mi>k</mi> </mrow> <mo>&prime;</mo> </msubsup> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mo>,</mo> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> <mo>&prime;</mo> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mo>,</mo> <mi>n</mi> </mrow> <mo>&prime;</mo> </msubsup> </mtd> </mtr> </mtable> </mfenced> <mo>=</mo> <msubsup> <mi>x</mi> <mrow> <mi>r</mi> <mi>o</mi> <mi>t</mi> </mrow> <mo>&prime;</mo> </msubsup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow>

<mrow> <msubsup> <mi>x</mi> <mrow> <mi>P</mi> <mi>C</mi> <mi>A</mi> <mi>w</mi> <mi>h</mi> <mi>i</mi> <mi>t</mi> <mi>e</mi> <mo>,</mo> <mi>i</mi> </mrow> <mo>&prime;</mo> </msubsup> <mo>=</mo> <mfrac> <msubsup> <mover> <mi>x</mi> <mo>~</mo> </mover> <mi>i</mi> <mo>&prime;</mo> </msubsup> <msqrt> <mrow> <msub> <mi>&lambda;</mi> <mi>i</mi> </msub> <mo>+</mo> <mi>&epsiv;</mi> </mrow> </msqrt> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow>

(2-6) enters line translation using ZCA albefactions to characteristic vector U, the covariance matrix of emotion image is changed into unit matrix I：

x′_ZCAwhite=Ux '_PCAwhite (8)

4. according to claim 1 be based on depth sparse convolution neutral net face emotion identification method, it is characterised in that： Label value 1~7 described in step (3) respectively with it is angry, detest, fear, happily, this neutral, sad and surprised 7 class emotion one by one It is corresponding.

5. according to claim 3 be based on depth sparse convolution neutral net face emotion identification method, it is characterised in that： Step (3) the affective characteristics identification classification, specifically includes procedure below：

It is sparse that (3-1) creates a depth being made up of successively convolutional layer, sub-sampling layer, Dropout layers and Softmax recurrence layer Convolutional neural networks, training set data, which is inputted into depth sparse convolution neutral net, the training set data, includes training set PCA characteristic patterns and corresponding emotion label value, i.e. { (x₁,y₁),...,(x_m,y_m), and y_m∈ { 1,2 ..., k }, wherein x_i For the PCA characteristic patterns of training set, y_iFor x_iCorresponding affective tag value, i ∈ { 1,2 ..., m } are dilute to depth using NAGD algorithms Dredge convolutional Neural net and be iterated training, the repetitive exercise includes procedure below：

(3-1-1) is shuffled at random to training set data, by the packet in training set, the quantity one of data in every group Cause, and each group is sequentially input in depth sparse convolution neutral net；

(3-1-2) every group of training set data passes through convolutional layer respectively first, and it is 29 × 29 that the convolutional layer, which sets 100 dimensions, Convolution kernel, convolution kernel moving step length are 1；Depth sparse convolution neutral net excavates the PCA characteristic patterns of training set by convolution kernel In local association information, the implementation process of convolutional layer is：

a_i,k=f (x_i*rot90(W_k,2)+b_k) (9)

Wherein, a_i,kIt is i-th PCA feature diagram datas x by k-th of convolution kernel in convolutional layer to the training set of input_iCarry out The convolution characteristic pattern that process of convolution obtains, * are valid convolution algorithms, W_kRepresent the weights of k-th of convolution, b_kFor k-th of convolution Deviation corresponding to core, f () are Sigmoid type activation primitives：

Convolution characteristic pattern caused by (3-1-3) convolutional layer is input into sub-sampling layer, and sub-sampling layer is using average pond, average pond Change dimension and be set to 4, moving step length 4, the then pond characteristic pattern size that convolution characteristic pattern obtains after sub-sampling layer is changed into former The a quarter come, characteristic pattern quantity is constant, and average pondization uses below equation：

(3-1-4) mitigates network over-fitting using Dropout layers, allows all data by Dropout layers at random, that is, walks Suddenly pond characteristic pattern does not work caused by (3-1-3), and idle data retain, and its calculating process is：

DropoutTrain (x)=RandomZero (p) × x (12)

Wherein DropoutTrain (x) represents the data matrix obtained after Dropout layers the training stage, RandomZero (p) value for representing to allow with the Probability p set in the data matrix x for inputting this layer is set to 0；

(3-1-5-1), which is used, assumes function h_θ(x) calculate the data matrix obtained after Dropout layers and appear in each expression Classification j probable value p (y=j | x), h_θ(x) output is a k dimensional vector, and each vector element value corresponds to this k classification respectively Probable value, and vector element and for 1, h_θ(x) form is：

<mrow> <msub> <mi>h</mi> <mi>&theta;</mi> </msub> <mrow> <mo>(</mo> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>)</mo> </mrow> <mo>=</mo> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msup> <mi>y</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>=</mo> <mn>1</mn> <mo>|</mo> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>;</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msup> <mi>y</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>=</mo> <mn>2</mn> <mo>|</mo> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>;</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <msup> <mi>y</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>=</mo> <mi>k</mi> <mo>|</mo> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>;</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <munderover> <mo>&Sigma;</mo> <mrow> <mi>j</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>k</mi> </munderover> <msup> <mi>e</mi> <mrow> <msubsup> <mi>&theta;</mi> <mi>j</mi> <mi>T</mi> </msubsup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </mfrac> <mfenced open = "[" close = "]"> <mtable> <mtr> <mtd> <msup> <mi>e</mi> <mrow> <msubsup> <mi>&theta;</mi> <mn>1</mn> <mi>T</mi> </msubsup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> </mtd> </mtr> <mtr> <mtd> <msup> <mi>e</mi> <mrow> <msubsup> <mi>&theta;</mi> <mn>2</mn> <mi>T</mi> </msubsup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <mo>.</mo> </mtd> </mtr> <mtr> <mtd> <msup> <mi>e</mi> <mrow> <msubsup> <mi>&theta;</mi> <mi>k</mi> <mi>T</mi> </msubsup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </mrow> </msup> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>13</mn> <mo>)</mo> </mrow> </mrow>

Wherein θ₁,θ₂,...,θ_k∈Rⁿ⁺¹It is the parameter of model, is obtained in training starting stage random assignment；x⁽ⁱ⁾Represent by warp The i-th pond feature diagram data crossed after Dropout layers in obtained data matrix；

<mrow> <mi>J</mi> <mrow> <mo>(</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> <mo>=</mo> <mo>-</mo> <mfrac> <mn>1</mn> <mi>m</mi> </mfrac> <mo>&lsqb;</mo> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>m</mi> </munderover> <munderover> <mo>&Sigma;</mo> <mrow> <mi>j</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>k</mi> </munderover> <mn>1</mn> <mo>{</mo> <msup> <mi>y</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>=</mo> <mi>j</mi> <mo>}</mo> <mi>log</mi> <mfrac> <msup> <mi>e</mi> <mrow> <msubsup> <mi>&theta;</mi> <mi>j</mi> <mi>T</mi> </msubsup> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> </mrow> </msup> <mrow> <munderover> <mo>&Sigma;</mo> <mrow> <mi>l</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>k</mi> </munderover> <msup> <mi>e</mi> <mrow> <msubsup> <mi>&theta;</mi> <mi>l</mi> <mi>T</mi> </msubsup> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> </mrow> </msup> </mrow> </mfrac> <mo>&rsqb;</mo> <mo>+</mo> <mfrac> <mi>&lambda;</mi> <mn>2</mn> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>k</mi> </munderover> <munderover> <mo>&Sigma;</mo> <mrow> <mi>j</mi> <mo>=</mo> <mn>0</mn> </mrow> <mi>n</mi> </munderover> <msubsup> <mi>&theta;</mi> <mrow> <mi>i</mi> <mi>j</mi> </mrow> <mn>2</mn> </msubsup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>14</mn> <mo>)</mo> </mrow> </mrow>

Wherein 1 { y⁽ⁱ⁾=j } it is indicative function, its value rule is 1 { value is genuine expression formula }=1, such as 1 { 1+1=3 }=0,1 { 1+1=2 }=1, y⁽ⁱ⁾Represent affective tag value；

To above formula derivation, gradient formula is obtained：

<mrow> <msub> <mo>&dtri;</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> </msub> <mi>J</mi> <mrow> <mo>(</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> <mo>=</mo> <mo>-</mo> <mfrac> <mn>1</mn> <mi>m</mi> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>m</mi> </munderover> <mo>&lsqb;</mo> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mrow> <mo>(</mo> <mn>1</mn> <mo>{</mo> <msup> <mi>y</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>=</mo> <mi>j</mi> <mo>}</mo> <mo>-</mo> <mi>p</mi> <mo>(</mo> <msup> <mi>y</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>=</mo> <mi>j</mi> <mo>|</mo> <msup> <mi>x</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> </msup> <mo>;</mo> <mi>&theta;</mi> <mo>)</mo> </mrow> <mo>)</mo> <mo>&rsqb;</mo> <mo>+</mo> <msub> <mi>&lambda;&theta;</mi> <mi>j</mi> </msub> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>15</mn> <mo>)</mo> </mrow> </mrow>

(3-1-6) utilizes reverse conduction algorithm, calculates cost function J (W, b in the residual sum Softmax recurrence of each layer；X, y) in The gradient of each parameter θ of network, specifically includes procedure below：

δ^(l)=((W^(l))^Tδ^(l+1))·f‘(z^(l)) (16)

Parameter W gradient calculation formula is：

<mrow> <msub> <mo>&dtri;</mo> <msup> <mi>W</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msup> </msub> <mi>J</mi> <mrow> <mo>(</mo> <mi>W</mi> <mo>,</mo> <mi>b</mi> <mo>;</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>=</mo> <msup> <mi>&delta;</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> </msup> <msup> <mrow> <mo>(</mo> <msup> <mi>a</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msup> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>17</mn> <mo>)</mo> </mrow> </mrow>

Parameter b gradient calculation formula is：

<mrow> <msub> <mo>&dtri;</mo> <msup> <mi>b</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msup> </msub> <mi>J</mi> <mrow> <mo>(</mo> <mi>W</mi> <mo>,</mo> <mi>b</mi> <mo>;</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>=</mo> <msup> <mi>&delta;</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>18</mn> <mo>)</mo> </mrow> </mrow>

Wherein, δ^(l+1)It is the residual error of l+1 layers in network, J (W, b；X, y) it is cost function, (W, b) is weights and threshold value ginseng Number, (x, y) is training data and label respectively；

<mrow> <msubsup> <mi>&delta;</mi> <mi>k</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msubsup> <mo>=</mo> <mi>u</mi> <mi>p</mi> <mi>s</mi> <mi>a</mi> <mi>m</mi> <mi>l</mi> <mi>e</mi> <mrow> <mo>(</mo> <msup> <mrow> <mo>(</mo> <msubsup> <mi>W</mi> <mi>k</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msubsup> <mo>)</mo> </mrow> <mi>T</mi> </msup> <msubsup> <mi>&delta;</mi> <mi>k</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>&CenterDot;</mo> <msup> <mi>f</mi> <mo>&prime;</mo> </msup> <mrow> <mo>(</mo> <msubsup> <mi>z</mi> <mi>k</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>19</mn> <mo>)</mo> </mrow> </mrow>

Wherein k is the numbering of convolution kernel,Represent x_i*rot90(W_k,2)+b_k,It is Sigmoid type activation primitive local derviations Number, its form are：

<mrow> <msup> <mi>f</mi> <mo>&prime;</mo> </msup> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <msup> <mi>e</mi> <mrow> <mo>-</mo> <mi>x</mi> </mrow> </msup> <msup> <mrow> <mo>(</mo> <mn>1</mn> <mo>+</mo> <msup> <mi>e</mi> <mrow> <mo>-</mo> <mi>x</mi> </mrow> </msup> <mo>)</mo> </mrow> <mn>2</mn> </msup> </mfrac> <mo>=</mo> <mi>f</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <mn>1</mn> <mo>-</mo> <mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>20</mn> <mo>)</mo> </mrow> </mrow>

The gradient for the θ that (3-1-7) basis is calculated, NAGD utilize momentum term γ v_t-1Carry out undated parameter θ, by calculating θ-γ v_t-1To obtain the approximation of parameter θ Future Positions, NAGD more new formula is：

v_t=γ v_t-1+α▽_θJ(θ-γv_t-1；x⁽ⁱ⁾,y⁽ⁱ⁾) (21)

θ=θ-v_t (22)

Wherein, ▽_θJ(θ；x⁽ⁱ⁾,y⁽ⁱ⁾) it is by (x in training set⁽ⁱ⁾,y⁽ⁱ⁾) it is calculated the gradient of parameter θ, α is learning rate, v_t It is current velocity vector, v_t-1It is the velocity vector in last round of iteration, α is initially set to 0.1, v_t0 is initially set to, with Parameter vector θ dimensions are identical, and γ ∈ (0,1], γ is arranged to 0.5, after training iteration to terminate for the first time in the training starting stage Increased to 0.95；

(3-1-8) return to step (3-1-1), the iterations until reaching setting, complete depth sparse convolution neutral net Training optimization；

The PCA characteristic patterns of emotion image to be identified are inputted depth sparse convolution neutral net by (3-2), it are identified point Class：

The PCA characteristic patterns of (3-2-1) emotion image to be identified first pass around convolutional layer and sub-sampling layer, by x '_ZCAwhiteSubstitute public Input x in formula (9)_i, obtain entering the PCA characteristic patterns of the emotion image to be identified of input by k-th of convolution kernel of convolutional layer The convolution characteristic pattern a ' that row process of convolution obtains_i,k；

Again by a '_i,kSubstitute into formula (11) and substitute a therein_i,k, obtain the pond characteristic pattern c ' of emotion image to be identified, i.e., it is high-rise Affective characteristics；

When the pond characteristic pattern c ' of (3-2-2) emotion image to be identified continues through Dropout layers, then average place is carried out to c ' Reason：

DropoutTest (c ')=(1-p) × c ' (23)

DropoutTest (c ') represents that the pond characteristic pattern c ' of emotion image to be identified continues through what is obtained after Dropout layers Data matrix；

(3-2-3) returns the hypothesis function h of layer using Softmax_θ(x) probability that c ' appears in each expression classification j is calculated It is worth, and the classification j corresponding to obtained most probable value is exported, be i.e. output category result.