CN100492399C

CN100492399C - Method for human face pose estimation using a dimensionality reduction method

Info

Publication number: CN100492399C
Application number: CNB2007100380763A
Authority: CN
Inventors: 杨杰; 杜春华; 张田昊; 署光; 杨晓超
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2007-03-15
Filing date: 2007-03-15
Publication date: 2009-05-27
Anticipated expiration: 2027-03-15
Also published as: CN101021900A

Abstract

This invention discloses a method for estimating human face pose using a dimensionality reduction method. It comprises the following steps: (1) preprocessing face-image training samples of different poses; (2) applying principal component analysis (PCA) to the preprocessed data; (3) initializing a restricted Boltzmann machine (RBM) neural network; (4) pre-training the RBM neural network with the PCA-processed data; (5) adjusting the RBM neural network parameters; and (6) recognizing the pose of new face images. Compared with the prior art, the method further reduces the error rate and greatly reduces the data dimensionality.

Description

Method for human face pose estimation using a dimensionality reduction method

Technical field

The present invention relates to a method in the field of image recognition technology. Specifically, it is a human face pose estimation method that uses a dimensionality reduction approach combining principal component analysis (PCA) with a restricted Boltzmann machine (RBM) neural network.

Background art

Face recognition research is currently shifting from 2D to 3D. The core technology of 3D face recognition is estimating the 3D pose of a face from a 2D face image. Pose estimation is essentially a classification problem: it decides which pose the face in a given image belongs to. Face images, however, are typical high-dimensional data, and general classification methods cannot be applied to them directly for pose estimation. The high-dimensional face data must therefore first be reduced in dimensionality, and pose estimation is then performed on the reduced data.

A search of the prior-art literature found "Reducing the Dimensionality of Data with Neural Networks" by G. E. Hinton and R. R. Salakhutdinov, published in Science (2006, vol. 313, no. 5786, p. 504). That article proposes a nonlinear dimensionality reduction method based on restricted Boltzmann machine neural networks, and experimental results show that the method can be used for pose estimation. The network is first pre-trained on a large number of face images in different poses to obtain initial weights. The weights of the whole network are then adjusted with the original data to bring the network to an optimal state.

The training time of this method, however, grows sharply as the number and dimensionality of the training samples increase. In practice, face images are also affected by external interference during acquisition and contain much redundant information and noise. This redundancy and noise increase computation time and also degrade later recognition accuracy, so they should be removed before the neural network performs dimensionality reduction on the data. In addition, the RBM neural network dimensionality reduction method has further drawbacks, such as slow speed and low precision.

Summary of the invention

The objective of the invention is to overcome the shortcomings of the prior art by providing a human face pose estimation method based on dimensionality reduction. The raw data are first processed with PCA (principal component analysis) to remove redundant information and noise. A neural network method then performs dimensionality reduction on the PCA-processed data. Applying this new dimensionality reduction method to face pose recognition supplies face pose information, which allows face recognition to be carried out across multiple viewing angles.

The invention is realized by the following technical solution. PCA is first applied to the training samples to remove redundant information and noise from the raw data, yielding PCA-processed data. These data are used to pre-train the RBM neural network, which gives the pre-trained network parameters. The parameters of the whole network are then adjusted by gradient descent. Finally, a face image whose pose is to be estimated is processed with PCA, and the resulting data are fed into the trained neural network for pose classification.

The invention specifically comprises the following steps:

(1) preprocessing the face-image training samples of different poses;

(2) applying PCA to the preprocessed data;

(3) initializing the RBM neural network;

(4) pre-training the RBM neural network with the PCA-processed data;

(5) adjusting the RBM neural network parameters;

(6) performing pose recognition on new face images.

Step (1): each face-image training sample is first scaled to an image h pixels high and w pixels wide. The scaled face image is then converted to grayscale, and the gray values of all pixels are normalized to [0, 1]. Finally, the image is flattened into a vector of length h × w, that is, a vector of dimension h × w.
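Step (1) can be sketched in Python with NumPy as follows. This is a minimal illustration that rests on assumptions the text does not state:

- nearest-neighbour resampling is used for the scaling;
- ITU-R BT.601 luma weights are used for the grayscale conversion;
- the input is 8-bit, so dividing by 255 performs the normalization to [0, 1].

The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(image, h, w):
    """Step (1) sketch: scale to h x w, convert to gray, normalize to [0, 1], flatten to h*w."""
    img = np.asarray(image, dtype=np.float64)
    # Nearest-neighbour scaling (assumed; the text does not name an interpolation method).
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    img = img[rows][:, cols]
    # Colour -> gray with ITU-R BT.601 luma weights (assumed); gray input passes through.
    if img.ndim == 3:
        img = img[..., :3] @ np.array([0.299, 0.587, 0.114])
    # Assumes 8-bit input, so dividing by 255 maps gray values into [0, 1].
    gray = img / 255.0
    return gray.reshape(h * w)
```

With h = w = 30, as in the embodiment, every image becomes a 900-element vector.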

Step (2): PCA is applied to the data preprocessed in step (1) to remove redundant information and noise while retaining 98% of the information. This reduces all data from h × w dimensions to s dimensions and also yields the mean vector $\bar{x}$ and the eigenvector matrix $P$. Let an original (h × w)-dimensional vector be $X$ and the reduced s-dimensional data be $b$. Then $X \approx \bar{x} + P b$, and the reduced data are $b = P^{T}(X - \bar{x})$.
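A minimal NumPy sketch of step (2) follows. It assumes that "retaining 98% of the information" means keeping the smallest number s of principal components whose cumulative variance reaches 98% of the total. The principal directions are computed with an SVD of the centered data, which is equivalent to eigen-decomposing the covariance matrix. The helper names `fit_pca`, `project` and `reconstruct` are hypothetical. Samples are stored as rows, so `project` computes $b = P^{T}(X - \bar{x})$ for every row at once.

```python
import numpy as np

def fit_pca(X, keep=0.98):
    """X: (n_samples, h*w). Returns the mean vector x_bar and basis P of shape (h*w, s)."""
    x_bar = X.mean(axis=0)
    # Rows of Vt are principal directions; S**2 is proportional to the variance along each.
    _, S, Vt = np.linalg.svd(X - x_bar, full_matrices=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)
    # Smallest s whose cumulative variance ratio reaches `keep` (assumed reading of "98%").
    s = min(int(np.searchsorted(ratio, keep)) + 1, len(S))
    P = Vt[:s].T
    return x_bar, P

def project(X, x_bar, P):
    """Row-wise b = P^T (X - x_bar)."""
    return (X - x_bar) @ P

def reconstruct(B, x_bar, P):
    """Row-wise X ~ x_bar + P b."""
    return x_bar + B @ P.T
```

In the embodiment this kind of truncation reduces 900-dimensional vectors to 342 dimensions. The actual s always depends on the data.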

Step (3): the neural network is set to have L layers, with N1, N2, ..., NL nodes respectively. The number of classes is C. The numbers of pre-training and parameter-adjustment iterations are Pt and Pc, respectively. The network structure is determined by the number of layers and the number of nodes in each layer. Random numbers in [0, 1] are generated as the initial connection weights between network nodes.
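Step (3) amounts to allocating one weight matrix per pair of adjacent layers. The sketch below draws the weights uniformly from [0, 1), as the text says. The zero-initialized biases, the `seed` parameter and the name `init_network` are assumptions for illustration. Here `layer_sizes` lists the PCA input size s followed by N1, ..., NL.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Step (3) sketch. layer_sizes = [s, N1, ..., NL].

    Returns one weight matrix per pair of adjacent layers, drawn uniformly from [0, 1),
    plus zero visible/hidden biases (the biases are an assumption, not from the text).
    """
    rng = np.random.default_rng(seed)
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    weights = [rng.uniform(0.0, 1.0, size=(n_in, n_out)) for n_in, n_out in pairs]
    vis_bias = [np.zeros(n_in) for n_in, _ in pairs]
    hid_bias = [np.zeros(n_out) for _, n_out in pairs]
    return weights, vis_bias, hid_bias

# Embodiment values: s = 342 and three layers of 300, 300 and 800 nodes (C = 9, Pt = 50, Pc = 100).
weights, vis_bias, hid_bias = init_network([342, 300, 300, 800])
```

With a 342-node input, weights this large may drive sigmoid units into saturation. Implementations therefore often rescale them, a choice the text leaves open.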

Step (4): the visible-layer nodes of the first RBM in the network are put in one-to-one correspondence with the s values of the vector obtained in step (2). The first RBM's visible layer therefore has s nodes. The weights between this RBM's visible-layer and hidden-layer nodes are trained for Pt iterations in total. The hidden-layer nodes of the first RBM then serve as the visible-layer nodes of the second RBM. The second RBM's visible-to-hidden weights are trained in the same way, also for Pt iterations. The procedure continues in this manner, with the hidden layer of each RBM serving as the visible layer of the next RBM to be trained. This completes the pre-training of the whole network and yields the parameters of every pre-trained RBM layer.
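The text does not say how each individual RBM is trained. The sketch below assumes one-step contrastive divergence (CD-1), the rule commonly used for RBM pre-training, with full-batch updates, so that one loop iteration corresponds to one of the Pt training passes. It makes further assumptions:

- the first RBM uses Gaussian (linear, unit-variance) visible units, because PCA coefficients are real-valued rather than confined to [0, 1];
- all other units are binary;
- the learning rate is 0.01;
- the uniform [0, 1) initial weights are shrunk by a factor `init_scale`.

`train_rbm` and `pretrain` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, Pt, lr=0.01, init_scale=0.01, gaussian_visible=False, seed=0):
    """Train one RBM on V (n_samples x n_visible) with Pt full-batch CD-1 updates."""
    rng = np.random.default_rng(seed)
    # Uniform [0, 1) weights as in step (3), shrunk by init_scale (assumed) to limit saturation.
    W = rng.uniform(0.0, 1.0, size=(V.shape[1], n_hidden)) * init_scale
    b = np.zeros(V.shape[1])   # visible biases
    c = np.zeros(n_hidden)     # hidden biases
    n = V.shape[0]
    for _ in range(Pt):
        h_prob = sigmoid(V @ W + c)                                   # positive phase
        h_state = (rng.random(h_prob.shape) < h_prob).astype(float)   # sampled binary hidden states
        v_mean = h_state @ W.T + b
        # Linear (Gaussian) visible units for real-valued PCA input, sigmoid units otherwise.
        v_recon = v_mean if gaussian_visible else sigmoid(v_mean)
        h_recon = sigmoid(v_recon @ W + c)                            # negative phase
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / n
        b += lr * (V - v_recon).mean(axis=0)
        c += lr * (h_prob - h_recon).mean(axis=0)
    return W, b, c

def pretrain(X, hidden_sizes, Pt):
    """Greedy layer-wise pre-training: each RBM's hidden activities feed the next RBM."""
    layers, data = [], X
    for i, n_hidden in enumerate(hidden_sizes):
        W, b, c = train_rbm(data, n_hidden, Pt, gaussian_visible=(i == 0), seed=i)
        layers.append((W, b, c))
        data = sigmoid(data @ W + c)
    return layers
```

For the embodiment one would call `pretrain(B, [300, 300, 800], Pt=50)` on the (n, 342) matrix of PCA coefficients. Because the first RBM assumes unit-variance inputs, standardizing B first would be advisable.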

Step (5): using minimum reconstruction error as the criterion, gradient descent with back-propagation adjusts the weights between the visible-layer and hidden-layer nodes of all RBMs. This step is performed Pc times in total.
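Step (5) speaks of back-propagating the reconstruction error through all RBM weights. One reading follows the deep autoencoder of Hinton and Salakhutdinov cited in the background. Under that reading, the stacked RBMs are unrolled into an encoder and a mirror-image decoder, and gradient descent is run on the squared reconstruction error. The sketch below assumes that reading, with:

- full-batch updates, one per Pc iteration;
- sigmoid hidden units;
- a linear output layer, because the reconstructed PCA coefficients are real-valued.

`unroll`, `fine_tune` and `reconstruction_error` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unroll(layers):
    """layers: list of (W, b_visible, c_hidden) from pre-training.
    Encoder uses (W, c); decoder uses (W.T, b) in reverse order. Weights are untied copies."""
    enc = [(W.copy(), c.copy()) for W, b, c in layers]
    dec = [(W.T.copy(), b.copy()) for W, b, c in reversed(layers)]
    return enc + dec

def forward(params, X):
    """Returns the input followed by every layer's activations; the last layer is linear."""
    acts = [X]
    for k, (W, bias) in enumerate(params):
        z = acts[-1] @ W + bias
        acts.append(z if k == len(params) - 1 else sigmoid(z))
    return acts

def reconstruction_error(params, X):
    """E = mean over samples of 0.5 * ||reconstruction - input||^2."""
    return 0.5 * np.mean(np.sum((forward(params, X)[-1] - X) ** 2, axis=1))

def fine_tune(params, X, Pc, lr=0.01):
    """Pc full-batch gradient-descent steps on reconstruction_error via back-propagation."""
    n = X.shape[0]
    for _ in range(Pc):
        acts = forward(params, X)
        delta = (acts[-1] - X) / n                     # dE/dz at the linear output layer
        for k in reversed(range(len(params))):
            W, bias = params[k]
            gW, gb = acts[k].T @ delta, delta.sum(axis=0)
            if k > 0:                                  # propagate through the sigmoid of layer k-1
                delta = (delta @ W.T) * acts[k] * (1.0 - acts[k])
            params[k] = (W - lr * gW, bias - lr * gb)
    return params
```

With the embodiment's settings this would run `fine_tune(unroll(layers), B, Pc=100)`.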

Step (6): a new face image whose pose is to be recognized is first scaled to an image of height h and width w and converted to grayscale. Its gray values are normalized to [0, 1], and the normalized image is flattened into a vector $X$ of length h × w. The vector's dimension is reduced to s using $b = P^{T}(X - \bar{x})$. Finally, this s-dimensional vector is fed into the trained network for pose recognition.
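The text does not specify how the trained network's output is turned into one of the C pose classes. The sketch below therefore uses a hypothetical read-out, purely for illustration: the nearest class centroid in the network's code space. A softmax output layer would be another option. The sketch assumes the new image has already been preprocessed exactly as in step (1), and that `encoder` holds the (W, c) pairs of the encoding half of the trained network. All function names are hypothetical.

```python
import numpy as np

POSE_ANGLES = [-90, -60, -45, -30, 0, 30, 45, 60, 90]   # the 9 pose classes of Fig. 1, in degrees

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x_vec, x_bar, P, encoder):
    """Project a preprocessed h*w vector with b = P^T (X - x_bar), then apply the encoder layers."""
    a = P.T @ (x_vec - x_bar)
    for W, c in encoder:
        a = sigmoid(a @ W + c)
    return a

def class_centroids(codes, labels, C):
    """Hypothetical read-out: mean code of the training samples of each pose class."""
    return np.stack([codes[labels == k].mean(axis=0) for k in range(C)])

def classify(code, centroids):
    """Index of the nearest class centroid in code space."""
    return int(np.argmin(np.linalg.norm(centroids - code, axis=1)))
```

`POSE_ANGLES[classify(...)]` then gives the estimated pose angle in degrees.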

The face pose estimation method proposed by the invention is highly accurate. Tested on a self-captured face database, its recognition error rate is 1.95%. This error rate is lower than that of a method that performs dimensionality reduction directly with an RBM neural network, without PCA preprocessing. PCA also greatly reduces the data dimensionality, so the number of nodes in each subsequent network layer decreases accordingly. This shortens training time and also greatly speeds up testing.

Description of drawings

Fig. 1 is a schematic diagram of the poses used in this embodiment.

Images 1 to 9 show pose angles of -90°, -60°, -45°, -30°, 0°, 30°, 45°, 60° and 90°, respectively.

Fig. 2 shows the pose recognition result of this embodiment.

The pose angle of the face image is 60°.

Detailed description of the embodiments

An embodiment of the invention is described in detail below with reference to the accompanying drawings. The embodiment is implemented on the basis of the technical solution of the invention and gives a detailed implementation and a concrete procedure. The scope of protection of the invention is not limited to the following embodiment.

Example

1. All images in the face database are scaled to 30 × 30 pixels. (The database contains face images of 2270 people in 9 different poses. As shown in Fig. 1, images a, b, c, d, e, f, g, h and i have poses of -90°, -60°, -45°, -30°, 0°, 30°, 45°, 60° and 90°, respectively. The images are divided into 9 classes by pose, with 2270 images per class.) Each scaled face image is then converted to grayscale, and its pixel gray values are normalized to [0, 1]. Finally, the grayscale image is flattened into a vector of length 900.

2. PCA is applied to all vectors from step 1, retaining 98% of the information, which reduces the vector dimension from 900 to 342. The mean vector $\bar{x}$ and the eigenvector matrix $P$ are obtained at the same time. Let an original 900-dimensional vector be $X$ and the reduced 342-dimensional data be $b$. Then $X \approx \bar{x} + P b$ and $b = P^{T}(X - \bar{x})$.

3. The neural network is set to have three layers, with 300, 300 and 800 nodes respectively. The number of classes is 9. The numbers of pre-training and parameter-adjustment iterations are 50 and 100, respectively. The network structure is determined by the number of layers and the number of nodes in each layer. Random numbers in [0, 1] are generated as the initial connection weights between network nodes.
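The embodiment's settings make the saving from PCA easy to quantify. The snippet below counts the connection weights between adjacent layers for the 342-300-300-800 network. For comparison, it also counts them for a hypothetical network with the same hidden layers fed the raw 900-pixel vectors. Biases and the 9-class output are excluded, because the text does not say how the classes attach to the network. The helper name `weight_counts` is illustrative.

```python
def weight_counts(layer_sizes):
    """Number of connection weights between each pair of adjacent layers (biases excluded)."""
    return [a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

with_pca = weight_counts([342, 300, 300, 800])      # embodiment: PCA output feeds the first RBM
without_pca = weight_counts([900, 300, 300, 800])   # hypothetical: raw 30 x 30 pixels instead

print(with_pca, sum(with_pca))         # [102600, 90000, 240000] 432600
print(without_pca, sum(without_pca))   # [270000, 90000, 240000] 600000
```

Under these assumptions, only the first weight matrix shrinks, from 270,000 to 102,600 weights. The total therefore falls from 600,000 to 432,600.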

4. The visible-layer nodes of the first RBM in the network are put in one-to-one correspondence with the 342 values of the vector from step 2, so the first RBM's visible layer has 342 nodes. The weights between its 342 visible nodes and 300 hidden nodes are trained 50 times. The hidden nodes of the first RBM then serve as the visible nodes of the second RBM. The weights between the second RBM's 300 visible nodes and 300 hidden nodes are likewise trained 50 times. The procedure continues in this manner, with the hidden layer of each RBM serving as the visible layer of the next RBM to be trained. This completes the pre-training of the whole network and yields the parameters of every pre-trained RBM layer.

5. Using minimum reconstruction error as the criterion, gradient descent with back-propagation adjusts the weights between the visible and hidden nodes of all RBMs. This step is performed 100 times in total.

6. A new face image whose pose is to be recognized is first scaled to 30 × 30 pixels and converted to grayscale. Its gray values are normalized to [0, 1], and the normalized image is flattened into a vector of length 900. The vector's dimension is reduced to 342 using $b = P^{T}(X - \bar{x})$. Finally, the 342-dimensional vector is fed into the trained network for pose recognition. As shown in Fig. 2, the image to be recognized is a face at 60°, and the method of the invention correctly identifies its pose.

As the above shows, the face pose recognition method proposed in this embodiment can be further applied to 3D face recognition.

Claims

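1. A human face pose estimation method using a dimensionality reduction method, specifically comprising the following steps:

(1) preprocessing face-image training samples of different poses;

(2) applying principal component analysis to the preprocessed data;

(3) initializing a restricted Boltzmann machine neural network;

(4) pre-training the restricted Boltzmann machine neural network with the data processed by principal component analysis;

(5) adjusting the restricted Boltzmann machine neural network parameters;

(6) performing pose recognition on a new face image.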

2. The human face pose estimation method using a dimensionality reduction method according to claim 1, characterized in that step (1) specifically comprises:

1) scaling each face-image training sample to an image h pixels high and w pixels wide;

2) converting the scaled face image to a grayscale image;

3) normalizing the gray values of all pixels to [0, 1];

4) flattening the image into a vector of length h × w, that is, a vector of dimension h × w.

3. The human face pose estimation method using a dimensionality reduction method according to claim 2, characterized in that step (2) specifically comprises:

1) performing principal component analysis on the preprocessed data to remove redundant information and noise while retaining 98% of the information;

2) reducing all data from h × w dimensions to s dimensions and obtaining the mean vector $\bar{x}$ and the eigenvector matrix $P$, related by $X \approx \bar{x} + P b$, with the reduced data given by $b = P^{T}(X - \bar{x})$, where $X$ is an original (h × w)-dimensional vector and $b$ is the reduced s-dimensional data.

4. The human face pose estimation method using a dimensionality reduction method according to claim 1, characterized in that step (3) specifically comprises:

1) setting the neural network to have L layers with N1, N2, ..., NL nodes respectively, C classes, and Pt pre-training and Pc parameter-adjustment iterations;

2) determining the network structure from the number of layers and the number of nodes in each layer, and generating random numbers in [0, 1] as the initial connection weights between network nodes.

5. The human face pose estimation method using a dimensionality reduction method according to claim 3, characterized in that step (4) specifically comprises:

1) putting the visible-layer nodes of the first restricted Boltzmann machine in the network in one-to-one correspondence with the s values of the vector from step (2), so that the first restricted Boltzmann machine's visible layer has s nodes, and training the weights between its visible-layer and hidden-layer nodes for Pt iterations in total;

2) using the hidden-layer nodes of the first restricted Boltzmann machine as the visible-layer nodes of the second restricted Boltzmann machine, and likewise training the weights between its visible-layer and hidden-layer nodes for Pt iterations;

3) continuing in this manner, with the hidden-layer nodes of each restricted Boltzmann machine serving as the visible-layer nodes of the next restricted Boltzmann machine to be trained, thereby completing the pre-training of the whole network and obtaining the parameters of every pre-trained restricted Boltzmann machine layer.

6. The human face pose estimation method using a dimensionality reduction method according to claim 1, characterized in that step (5) consists of: using minimum reconstruction error as the criterion, applying gradient descent with back-propagation to adjust the weights between the visible-layer and hidden-layer nodes of all restricted Boltzmann machines, this step being performed Pc times in total.

7. The human face pose estimation method using a dimensionality reduction method according to claim 1, characterized in that step (6) consists of: scaling a new face image whose pose is to be recognized to an image of height h and width w; converting the scaled image to grayscale; normalizing its gray values to [0, 1]; flattening the normalized image into a vector $X$ of length h × w; reducing the vector's dimension to s using $b = P^{T}(X - \bar{x})$; and finally feeding this s-dimensional vector into the trained network for pose recognition.