AU2014350727A1 - Face positioning method and device

Face positioning method and device

Info

Publication number
AU2014350727A1
Authority
AU
Australia
Prior art keywords
human face
sub
image
windows
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU2014350727A
Other versions
AU2014350727B2 (en)
Inventor
Chuanyun DENG
Tianlin LIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Smart Security & Surveillance Service Robot Co Ltd
Original Assignee
SMART CITIES SYSTEM SERVICES PRC CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SMART CITIES SYSTEM SERVICES PRC CO Ltd filed Critical SMART CITIES SYSTEM SERVICES PRC CO Ltd
Publication of AU2014350727A1 publication Critical patent/AU2014350727A1/en
Application granted granted Critical
Publication of AU2014350727B2 publication Critical patent/AU2014350727B2/en
Assigned to Shenzhen Smart Security & Surveillance Service Robot Co., Ltd. reassignment Shenzhen Smart Security & Surveillance Service Robot Co., Ltd. Request for Assignment Assignors: SMART CITIES SYSTEM SERVICES (PRC) CO., LTD
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/165: Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V40/168: Feature extraction; Face representation
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/754: Organisation of the matching processes involving a deformation of the sample pattern or of the reference pattern; Elastic matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face positioning method, comprising: obtaining an original image of a user with a camera; roughly positioning the original image of the user, to obtain a roughly positioned face image; obtaining information of a face detection area according to the roughly positioned face image, the information of the face detection area comprising position information of each part of the face; according to the information of the face detection area, obtaining the precise shape of each part of the face by a local shape fitting method. Also disclosed is a face positioning device. The method and the device reduce computing complexity and improve fitting precision.

Description

FACE POSITIONING METHOD AND DEVICE
[0001] This application claims the benefit of Chinese Patent Application No. 201310560912.X, filed with the Chinese Patent Office on November 13, 2013 and entitled “Method and apparatus for positioning a human face”, which is hereby incorporated by reference in its entirety.
Field [0002] The present invention relates to the field of human-machine interactions and particularly to a method and apparatus for positioning a human face.
Background [0003] Detection and positioning of a human face of a user plays a significant role in the field of human-machine interactions.
[0004] In the prior art, a human face is typically positioned by separate modules instead of a unified framework, and a feature point of the human face is positioned by an Active Shape Model (ASM) or an improved model thereof, at lower fitting precision.
Summary [0005] The invention provides a method and apparatus for positioning a human face so as to improve the fitting precision.
[0006] The invention provides a method for positioning a human face, the method including: obtaining an original image of a user using a camera; positioning roughly the original image of the user to obtain a roughly positioned image of a human face; obtaining information about a detected area of the human face from the roughly positioned image of the human face, wherein the information about the detected area of the human face includes positional information of respective parts of the human face; and performing local shape fitting on the information about the detected area of the human face to obtain precise shape information of the respective parts of the human face.
[0007] Preferably the obtaining the information about the detected area of the human face from the roughly positioned image of the human face includes: dividing the roughly positioned image of the human face into several sub-windows; calculating an image variance in each sub-window, comparing the image variance in each sub-window with a preset variance threshold, and if the variance is below the preset variance threshold, then determining that the sub-window includes a target area, and accepting the sub-window; otherwise, rejecting the sub-window; inputting the sub-windows with the image variances below the preset variance threshold to an online learning classifier, and obtaining sub-windows output from the online learning classifier; and performing a Non-Maximal Suppression (NMS) process on the sub-windows output from the online learning classifier, and obtaining the information about the detected area of the human face.
[0008] Preferably the inputting the sub-windows with the image variances below the preset variance threshold to the online learning classifier, and obtaining the sub-windows output from the online learning classifier includes: calculating a posterior probability of a random forest classifier of each of the sub-windows with the image variances below the preset variance threshold, and if the posterior probability is above a preset probability threshold, then accepting the sub-window; otherwise, rejecting the sub-window; and calculating a matching coefficient between each of the sub-windows with the posterior probabilities above the preset probability threshold and a target template in a sample library of a Normalized Cross Correlation (NCC) classifier, and if the matching coefficient is above a preset coefficient threshold, then accepting the sub-window; otherwise, rejecting the sub-window.
[0009] Preferably the local shape fitting is performed as supervised sequence fitting including: the step a of extracting shape information of the respective parts of the human face according to the information about the detected area of the human face, and determining the extracted shape information as initial values of the shapes of the respective parts of the human face; the step b of extracting current feature descriptors according to mark points in the current shapes of the respective parts of the human face, wherein a current feature descriptor vector consists of several current feature descriptors; the step c of searching an update matrix library for a corresponding update matrix using the current feature descriptor vector as an index, updating the current shape information of the respective parts of the human face using the corresponding update matrix, obtaining updated current shape information of the respective parts of the human face, and replacing the current shape information of the respective parts of the human face in the step b with the updated current shape information of the respective parts of the human face; the step d of determining whether the preset largest number of iterations is exceeded, or whether a norm error between the two latest shape error vectors is below a preset vector norm error threshold, and if so, then proceeding to the step e; otherwise, then returning to the step b; and the step e of obtaining precise shape information of the respective parts of the human face.
[0010] Preferably the method further includes: performing structured learning on the precise shape information of the respective parts of the human face to obtain object functions of the respective parts of the human face; and optimizing the object functions of the respective parts of the human face to obtain optimized positions of the respective parts of the human face.
[0011] Preferably the method further includes: tracking moving positions of the respective parts of the human face in two consecutive frames according to the optimized positions of the respective parts of the human face; and updating the online learning classifier according to the moving positions of the respective parts of the human face.
[0012] The invention further provides an apparatus for positioning a human face, the apparatus including: an image obtaining module configured to acquire an original image of a user using a camera, and to send the original image of the user to a roughly positioning module; the roughly positioning module connected with the image obtaining module, configured to position roughly the original image of the user to obtain a roughly positioned image of a human face, and to send the roughly positioned image of the human face to an area detecting module; the area detecting module connected with the roughly positioning module, configured to obtain information about a detected area of the human face from the roughly positioned image of the human face, wherein the information about the detected area of the human face includes positional information of respective parts of the human face; and a fitting module connected with the area detecting module, configured to perform local shape fitting on the information about the detected area of the human face to obtain precise shape information of the respective parts of the human face.
[0013] Preferably the area detecting module includes: a sliding window module configured to divide the roughly positioned image of the human face into several sub-windows; a variance filtering module connected with the sliding window module, configured to calculate an image variance in each sub-window; to compare the image variance in each sub-window with a preset variance threshold, and if the variance is below the preset variance threshold, to determine that the sub-window includes a target area, and to accept the sub-window; otherwise, to reject the sub-window; an online learning module connected with the variance filtering module, configured to input the sub-windows with the image variances below the preset variance threshold to an online learning classifier, and to obtain sub-windows output from the online learning classifier; and an NMS module connected with the online learning module, configured to perform an NMS process on the sub-windows output from the online learning classifier, and to obtain the information about the detected area of the human face.
[0014] Preferably the apparatus further includes: an optimizing module connected with the fitting module, configured to perform structured learning on the precise shape information of the respective parts of the human face to obtain object functions of the respective parts of the human face, and to optimize the object functions of the respective parts of the human face to obtain optimized positions of the respective parts of the human face.
[0015] Preferably the apparatus further includes: an online updating module connected with the optimizing module, configured to track moving positions of the respective parts of the human face in two consecutive frames according to the optimized positions of the respective parts of the human face, and to update the online learning classifier according to the moving positions of the respective parts of the human face.
[0016] In the embodiment above of the invention, the image of the human face is positioned roughly in the image of the user acquired by the camera to obtain the information about the detected area of the human face, and further local shape fitting is performed on the information about the detected area of the human face to obtain the precise shapes of the respective parts of the human face, thus improving the precision of fitting.
Brief Description of the Drawings [0017] In order to make technical solutions according to embodiments of the invention or in the prior art more apparent, drawings to which reference is made in the description of the embodiments or the prior art will be described below briefly, and apparently the drawings described below are merely illustrative of some of the embodiments of the invention, and those ordinarily skilled in the art can further obtain other drawings from these drawings without any inventive effort. In the drawings: [0018] Fig.1 is a schematic flow chart of a method for positioning a human face according to an embodiment of the invention; [0019] Fig.2 is a schematic flow chart of a method for positioning a human face according to another embodiment of the invention; [0020] Fig.3 is a schematic flow chart of a method for positioning a human face according to a further embodiment of the invention; [0021] Fig.4 is a schematic structural diagram of an apparatus for positioning a human face according to an embodiment of the invention; [0022] Fig.5 is a schematic structural diagram of an apparatus for positioning a human face according to another embodiment of the invention; and [0023] Fig.6 is a schematic structural diagram of a matrix library updating sub-module according to an embodiment of the invention.
Detailed Description of the Embodiments [0024] In order to make technical problems to be addressed by the invention, and technical solutions and advantageous effects of the invention more apparent, the invention will be described below in further details with reference to the drawings and the embodiments thereof.
[0025] Referring to Fig.1 illustrating a schematic flow chart of a method for positioning a human face according to an embodiment of the invention, the method includes: [0026] The step S101 is to acquire an original image of a user using a camera.
[0027] Particularly after the original image of the user is acquired, the original image of the user is preprocessed by denoising, illumination-equalizing, etc.
[0028] The step S102 is to position roughly the original image of the user to obtain a roughly positioned image of a human face.
[0029] Particularly the human face is detected and positioned roughly in the original image of the user in the Haar and AdaBoost algorithms, and then a skin color filter is applied based upon a skin color distribution feature of the human face to remove a falsely detected area, the detected area of the human face is cut out, and the roughly positioned image of the human face is obtained.
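For illustration only, this rough positioning stage can be sketched with standard OpenCV primitives; the cascade file, the YCrCb skin-color thresholds, and the minimum skin proportion below are assumed values, not figures taken from this disclosure:

```python
import cv2

# Haar + AdaBoost face detector shipped with OpenCV (illustrative choice).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def rough_position_face(image_bgr):
    """Detect a face with Haar/AdaBoost, drop detections whose skin
    proportion is too low, and return the cropped face image (or None)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        patch = image_bgr[y:y + h, x:x + w]
        # Skin-color filter in YCrCb space; the thresholds are assumed values.
        ycrcb = cv2.cvtColor(patch, cv2.COLOR_BGR2YCrCb)
        skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
        if skin.mean() / 255.0 > 0.4:   # assumed minimum skin proportion
            return patch                # roughly positioned face image
    return None
```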
[0030] The step S103 is to obtain information about a detected area of the human face from the roughly positioned image of the human face, where the information about the detected area of the human face includes positional information of respective parts of the human face.
[0031] The positional information of the respective parts of the human face includes left eye positional information, right eye positional information, nose positional information, and mouth positional information.
[0032] Particularly this step S103 includes: [0033] In a first sub-step, the roughly positioned image of the human face is divided into several sub-windows (i.e., at least two sub-windows), and an image variance in each sub-window is calculated; [0034] In a second sub-step, for any one of the several sub-windows, the image variance in the sub-window is compared with a preset variance threshold, and if the variance is below the preset variance threshold, then it is determined that the sub-window includes a target area, and the sub-window is accepted; otherwise, the sub-window is rejected; [0035] In a third sub-step, the sub-windows accepted in the second sub-step (i.e., the sub-windows with the image variances below the preset variance threshold) are passed through (i.e., input to) an online learning classifier, and sub-windows passing through the online learning classifier (i.e., the sub-windows output from the online learning classifier) are obtained; and [0036] In a fourth sub-step, a Non-Maximal Suppression (NMS) process is performed on the sub-windows output in the third sub-step, and the information about the detected area of the human face is obtained.
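The first two sub-steps can be sketched with integral images so that each sub-window's variance costs constant time; the window size, stride, and threshold value below are assumptions, while the accept-below-threshold rule follows the text above:

```python
import numpy as np

def variance_filter(gray, win=24, stride=4, var_threshold=200.0):
    """Slide win x win sub-windows over the image and keep those whose
    image variance is below var_threshold (per the rule stated above).
    Variances come from integral images of the pixels and their squares."""
    g = gray.astype(np.float64)
    ii  = np.pad(g.cumsum(0).cumsum(1), ((1, 0), (1, 0)))        # sum table
    ii2 = np.pad((g * g).cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # sum of squares
    accepted = []
    n = float(win * win)
    for y in range(0, gray.shape[0] - win + 1, stride):
        for x in range(0, gray.shape[1] - win + 1, stride):
            s  = ii[y + win, x + win] - ii[y, x + win] - ii[y + win, x] + ii[y, x]
            s2 = ii2[y + win, x + win] - ii2[y, x + win] - ii2[y + win, x] + ii2[y, x]
            var = s2 / n - (s / n) ** 2
            if var < var_threshold:
                accepted.append((x, y, win, win))
    return accepted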
[0037] The step S104 is to perform local shape fitting on the information about the detected area of the human face to obtain precise shape information of the respective parts of the human face.
[0038] Particularly local shape fitting is performed in the Supervised Sequence Method (SSM). The shape information of the respective parts of the human face includes left eye shape information, right eye shape information, nose shape information, and mouth shape information.
[0039] In the embodiment above of the invention, the image of the human face is positioned roughly in the image of the user acquired by the camera to obtain the information about the detected area of the human face, and further local shape fitting is performed on the information about the detected area of the human face to obtain the precise shape information of the respective parts of the human face, thus improving the precision of fitting.
[0040] A method for positioning a human face according to an embodiment of the invention will be described below in further details referring to a schematic flow chart of a method for positioning a human face according to another embodiment of the invention as illustrated in Fig.2.
[0041] The step S201 is to acquire an original image of a user using a camera.
[0042] Particularly after the original image of the user is acquired, the original image of the user is preprocessed by denoising, illumination-equalizing, etc.
[0043] The step S202 is to position roughly the original image of the user to obtain a roughly positioned image of a human face.
[0044] Particularly the human face is detected and positioned roughly in the original image of the user in the Haar and AdaBoost algorithms, and then a skin color filter is applied based upon a skin color distribution feature of the human face to remove a falsely detected area, the detected area of the human face is cut out, and the roughly positioned image of the human face is obtained.
[0045] The step S203 is to obtain information about a detected area of the human face from the roughly positioned image of the human face, where the information about the detected area of the human face includes positional information of respective parts of the human face.
[0046] The positional information of the respective parts of the human face includes left eye positional information, right eye positional information, nose positional information, and mouth positional information.
[0047] Particularly this step S203 includes: [0048] In a first sub-step, the roughly positioned image of the human face is divided into several sub-windows (i.e., at least two sub-windows), and an image variance in each sub-window is calculated; [0049] In a second sub-step, for any one of the several sub-windows, the image variance in the sub-window is compared with a preset variance threshold, and if the variance is below the preset variance threshold, then it is determined that the sub-window includes a target area, and the sub-window is accepted; otherwise, the sub-window is rejected; [0050] In a third sub-step, the sub-windows accepted in the second sub-step (i.e., the sub-windows with the image variances below the preset variance threshold) are passed through (i.e., input to) an online learning classifier, and sub-windows passing through the online learning classifier are obtained; [0051] The online learning classifier includes a random forest classifier and a Normalized Cross Correlation (NCC) classifier; and [0052] In a fourth sub-step, an NMS process is performed on the sub-windows output in the third sub-step (i.e., sub-windows passing the online learning classifier), and the information about the detected area of the human face is obtained.
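The NMS sub-step can be sketched as standard overlap-based suppression; using the classifier's output score and a 0.5 overlap threshold is an assumption, as the disclosure does not fix either:

```python
def non_maximal_suppression(windows, scores, iou_threshold=0.5):
    """Keep the highest-scoring sub-windows, discarding any window whose
    overlap with an already kept window exceeds iou_threshold.
    windows: list of (x, y, w, h); scores: classifier confidences."""
    def iou(a, b):
        ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
        bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
        iw = max(0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(windows[i], windows[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [windows[i] for i in kept]
```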
[0053] The step S204 is to perform local shape fitting on the information about the detected area of the human face to obtain precise shape information of the respective parts of the human face.
[0054] Particularly local shape fitting is performed in the SSM. The shape information of the respective parts of the human face includes left eye shape information, right eye shape information, nose shape information, and mouth shape information.
[0055] The step S205 is to perform structured learning on the precise shape information of the respective parts of the human face to obtain object functions of the respective parts of the human face.
[0056] Particularly structured learning is performed using a Structured Support Vector Machine (SSVM).
[0057] The step S206 is to optimize the object functions of the respective parts of the human face to obtain optimized positions of the respective parts of the human face.
[0058] Particularly the object functions of the respective parts of the human face are optimized in the Stochastic Gradient Descent (SGD) algorithm to obtain the optimized positions of the respective parts of the human face.
[0059] The step S207 is to track moving positions of the respective parts of the human face in two consecutive frames according to the optimized positions of the respective parts of the human face, and to update the online learning classifier according to the moving positions of the respective parts of the human face.
[0060] Particularly the moving positions of the respective parts of the human face in the two consecutive frames are tracked by a forward and backward optical flow tracking algorithm according to the optimized positions of the respective parts of the human face; positive and negative samples of the respective parts of the human face are obtained according to the currently tracked moving positions of the respective parts of the human face, and coverage proportions and posterior probabilities of the respective sub-windows; several samples with high confidences (for example, above a preset confidence threshold) are selected from the obtained positive and negative samples of the respective parts of the human face, and features thereof are calculated; and then prior probabilities of the random forest classifier are updated, and a sample library of the NCC classifier is updated by adding the obtained positive and negative samples of the respective parts of the human face to the sample library of the NCC classifier.
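The forward and backward optical flow tracking can be sketched with OpenCV's pyramidal Lucas-Kanade routine: points are tracked forward and then backward, and points whose round trip drifts by more than an assumed pixel tolerance are discarded:

```python
import cv2
import numpy as np

def track_forward_backward(prev_gray, next_gray, points, fb_tol=1.0):
    """Track points from prev_gray to next_gray, re-track them backward,
    and keep only points whose forward-backward error is below fb_tol."""
    pts = points.astype(np.float32).reshape(-1, 1, 2)
    fwd, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    bwd, st2, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, fwd, None)
    fb_err = np.linalg.norm(pts - bwd, axis=2).ravel()
    ok = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_err < fb_tol)
    return fwd.reshape(-1, 2)[ok], ok
```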
[0061] In the embodiment above of the invention, the image of the user is acquired using the camera, and the human face is positioned in the sliding window using the online learning classifier and the NMS algorithm; and since the human face can be positioned at a higher speed using parallel programming due to the feature of the sliding window itself, and no complex operations will be involved in the filter and the classifier, the complexity of calculation can be lowered while guaranteeing the robustness of the program; and the features of the respective parts of the human face can be fit, the positions of the respective parts of the human face can be optimized, and the respective parts of the human face can be tracked to thereby position the human face at higher precision and robustness.
[0062] A method for positioning a human face according to an embodiment of the invention will be described below in further details referring to a schematic flow chart of a method for positioning a human face according to a further embodiment of the invention as illustrated in Fig.3.
[0063] The step S301 is to acquire an original image of a user using a camera.
[0064] Particularly after the original image of the user is acquired, the original image of the user is preprocessed by denoising, illumination-equalizing, etc.
[0065] The step S302 is to position roughly the original image of the user to obtain a roughly positioned image of a human face.
[0066] Particularly the human face is detected and positioned roughly in the original image of the user in the Haar and AdaBoost algorithms, and then a skin color filter is applied based upon a skin color distribution feature of the human face to remove a falsely detected area, the detected area of the human face is cut out, and the roughly positioned image of the human face is obtained.
[0067] The step S303 is to divide the roughly positioned image of the human face into several sub-windows.
[0068] The step S304 is to calculate an image variance in each sub-window, to compare the image variance in the sub-window with a preset variance threshold, and if the variance is below the preset variance threshold, to determine that the sub-window includes a target area, and to accept the sub-window; otherwise, to reject the sub-window.
[0069] The step S305 is to calculate a posterior probability of a random forest classifier of each of the sub-windows accepted in the step S304, and if the posterior probability is above a preset probability threshold, to accept the sub-window; otherwise, to reject the sub-window.
[0070] Particularly the random forest classifier consists of thirteen decision trees, each of which has a feature obtained by comparing brightness values of every two of ten random image blocks in each sub-window, and the posterior probability of the random forest classifier is the average of posterior probabilities of the thirteen decision trees, and the distribution of a prior probability of the random forest classifier will be updated in real time after the human face is tracked, to thereby be adapted to a variation in shape and a variation in texture of a target; and for any decision tree, the posterior probability of the decision tree is obtained from the prior probability, and the feature of the decision tree.
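A minimal sketch of such a pixel-comparison forest: thirteen trees, each reducing a sub-window to a leaf code by pairwise brightness comparisons of random blocks, with the posterior averaged over the trees. The block size, the leaf indexing, and the sampling of pairs among the ten blocks are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N_TREES, N_PAIRS, WIN = 13, 10, 24

# Each tree: N_PAIRS random block pairs inside the WIN x WIN sub-window.
# Each comparison yields one bit, so a tree maps a window to a leaf code.
trees = [rng.integers(0, WIN - 4, size=(N_PAIRS, 2, 2)) for _ in range(N_TREES)]
# posteriors[t][leaf] = P(face | leaf); updated online from tracked samples.
posteriors = [np.full(2 ** N_PAIRS, 0.5) for _ in range(N_TREES)]

def block_mean(patch, y, x):
    return patch[y:y + 4, x:x + 4].mean()   # 4x4 blocks, assumed size

def forest_posterior(patch):
    """Average posterior over the thirteen trees for a WIN x WIN patch."""
    p = 0.0
    for tree, post in zip(trees, posteriors):
        leaf = 0
        for (y1, x1), (y2, x2) in tree:
            leaf = (leaf << 1) | int(block_mean(patch, y1, x1) >
                                     block_mean(patch, y2, x2))
        p += post[leaf]
    return p / N_TREES
```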
[0071] The step S306 is to calculate a matching coefficient between each of the sub-windows accepted in the step S305 and a target template in a sample library of an NCC classifier, and if the matching coefficient is above a preset coefficient threshold, to accept the sub-window; otherwise, to reject the sub-window.
[0072] Particularly the sample library of the NCC classifier will be updated in real time after the human face is tracked, to thereby describe accurately the tracked target.
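The matching coefficient of the step S306 can be sketched directly from the definition of normalized cross correlation; the decision threshold below is an assumed value:

```python
import numpy as np

def ncc(window, template):
    """Normalized cross-correlation between a sub-window and a target
    template of the same size; returns a value in [-1, 1]."""
    a = window.astype(np.float64).ravel();   a -= a.mean()
    b = template.astype(np.float64).ravel(); b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def ncc_accepts(window, templates, coef_threshold=0.6):
    """Accept the sub-window if its best match against the sample
    library's target templates exceeds the preset coefficient threshold."""
    return max(ncc(window, t) for t in templates) > coef_threshold
```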
[0073] The step S307 is to perform an NMS process on the sub-windows output in the step S306 to obtain information about a detected area of the human face.
[0074] Particularly the information about the detected area of the human face includes at least positional information of the left eye, the right eye, the nose, and the mouth in the human face.
[0075] The respective parts of the human face will be fit as described below taking the left eye in the detected area of the human face as an example in the following embodiment.
[0076] The step S308 is to extract the shape of the left eye according to the positional information of the left eye in the human face in the Principal Component Analysis (PCA) algorithm, where the extracted shape of the left eye is an initial value.
[0077] The step S309 is to extract feature descriptors according to mark points in the shape of the left eye, where a feature descriptor vector consists of several feature descriptors.
[0078] Particularly the feature descriptors can be extracted in the Scale Invariant Feature Transform (SIFT) algorithm or a variant algorithm thereof.
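For example, descriptors can be computed at the mark points (rather than at detected keypoints) with OpenCV's SIFT and concatenated into the feature descriptor vector; the fixed keypoint size is an assumption:

```python
import cv2

sift = cv2.SIFT_create()

def descriptor_vector(gray, mark_points, kp_size=16.0):
    """Compute a SIFT descriptor at each mark point of the current shape
    and concatenate them into one feature descriptor vector."""
    keypoints = [cv2.KeyPoint(float(x), float(y), kp_size)
                 for (x, y) in mark_points]
    keypoints, desc = sift.compute(gray, keypoints)
    return desc.ravel()   # length = 128 * number of mark points
```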
[0079] The step S310 is to calculate a difference vector between the shape of the left eye and a preset real shape.
[0080] The step S311 is to obtain an update matrix according to the feature descriptor vector in the step S309 and the difference vector in the step S310.
[0081] Particularly a norm-2 error function is composed of the feature descriptor vector in the step S309, the difference vector in the step S310, and the update matrix to be solved, and is optimized by the linear least squares method to solve for the update matrix.
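Solving the update matrix from this norm-2 error function is a linear least-squares problem. A sketch over a batch of training samples, where Phi stacks the feature descriptor vectors of the step S309 and D stacks the difference vectors of the step S310; the small ridge term is an assumed regularizer:

```python
import numpy as np

def solve_update_matrix(Phi, D, ridge=1e-3):
    """Minimize ||D - Phi @ R.T||^2 over the update matrix R.
    Phi: (n_samples, n_features) feature descriptor vectors.
    D:   (n_samples, 2 * n_mark_points) shape difference vectors."""
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    R = np.linalg.solve(A, Phi.T @ D).T      # regularized least squares
    return R   # applied as: new_shape = shape + R @ descriptor_vector
```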
[0082] The step S312 is to apply the update matrix obtained in the step S311 to the shape of the left eye in the step S309 (that is, to perform a vector product operation on the shape of the left eye in the step S309 and the update matrix obtained in the step S311) to obtain an updated shape of the left eye, to extract a feature descriptor vector of the updated shape of the left eye, and to store locally the feature descriptor vector of the updated shape of the left eye as an index corresponding to the update matrix obtained in the step S311; and to replace the shape of the left eye in the step S309 with the updated shape of the left eye.
[0083] The step S313 is to determine whether a preset largest number of iterations for updating the matrix library is exceeded, or whether the norm error between the two latest update matrixes is below a preset matrix norm error threshold, and if so, to proceed to the step S314; otherwise, to proceed to the step S309.
[0084] The step S314 is to obtain an updated matrix library consisting of the indexes and the update matrixes corresponding to the respective indexes.
[0085] The step S315 is to extract current feature descriptors according to the mark points in the current shape of the left eye, where a current feature descriptor vector consists of several current feature descriptors.
[0086] Particularly an initial value of the current shape of the left eye is the shape of the left eye in the step S308.
[0087] The step S316 is to search the update matrix library for a corresponding update matrix using the current feature descriptor vector as an index, to update the current shape of the left eye using the corresponding update matrix, to obtain an updated current shape of the left eye, and to replace the current shape of the left eye in the step S315 with the updated current shape of the left eye.
[0088] The step S317 is to determine whether the preset largest number of iterations is exceeded, or whether a norm error between the two latest shape error vectors is below a preset vector norm error threshold, and if so, to proceed to the step S318; otherwise, to go back to the step S315.
[0089] The step S318 is to obtain a precise shape of the left eye.
[0090] Particularly, a precise shape of the right eye, a precise shape of the nose, and a precise shape of the mouth can likewise be obtained as described in the embodiment above. The fitting process in the method above only involves the search and the matrix-vector product operation, and the processes of fitting the respective parts of the human face and of extracting the feature descriptor vector can be performed concurrently so as to enable the fitting process to be performed in real time; and moreover, since there are a variety of sample libraries of the NCC classifier, and the feature descriptor vector can be invariant in scale, robust to rotation, etc., the fitting process can be performed at high precision and in real time.
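The steps S315 to S318 amount to the following iteration. This sketch reuses descriptor_vector from the SIFT sketch above and assumes a hypothetical nearest_update_matrix lookup, since the disclosure leaves open how the descriptor index selects a matrix:

```python
import numpy as np

def nearest_update_matrix(matrix_library, phi):
    """Hypothetical index lookup: return the update matrix whose stored
    descriptor index is closest to phi (the metric is an assumption)."""
    key = min(matrix_library,
              key=lambda k: np.linalg.norm(np.asarray(k) - phi))
    return matrix_library[key]

def fit_part_shape(gray, init_shape, matrix_library,
                   max_iters=5, norm_tol=0.5):
    """Iteratively refine a part shape: extract descriptors at the current
    mark points, look up the matching update matrix, and apply it until
    the shape change is small or the iteration budget is exhausted."""
    shape = init_shape.copy()               # (n_points, 2), from step S308
    for _ in range(max_iters):
        phi = descriptor_vector(gray, shape)             # step S315
        R = nearest_update_matrix(matrix_library, phi)   # step S316 lookup
        delta = (R @ phi).reshape(-1, 2)
        shape = shape + delta
        if np.linalg.norm(delta) < norm_tol:             # step S317 test
            break
    return shape                             # precise shape (step S318)
```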
[0091] The step S319 is to extract feature information of the left eye according to the precise shape of the left eye, and to compose a feature vector of the left eye.
[0092] Particularly in the embodiment of the invention, the feature vector of the left eye is composed of the feature information of the left eye in the Histogram of Oriented Gradient (HOG) algorithm, and a linear dimension reduction is performed on the feature vector of the left eye.
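A sketch of composing the part feature vector with HOG followed by a linear projection; the patch size, the HOG geometry, and the use of a precomputed PCA basis for the dimension reduction are assumptions:

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)

def part_feature_vector(gray, shape, pca_basis=None):
    """Crop the part's bounding patch, compute its HOG feature vector,
    and optionally project it onto a learned linear (PCA) basis."""
    x, y, w, h = cv2.boundingRect(shape.astype(np.int32))
    patch = cv2.resize(gray[y:y + h, x:x + w], (32, 32))
    feat = hog.compute(patch).ravel()
    return feat if pca_basis is None else pca_basis @ feat
```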
[0093] The step S320 is to select some part as an anchor point, and to determine a distance feature vector between the left eye and the part.
[0094] Particularly taking the nose as an anchor point, the pixel differences between the left eye and the nose are calculated, and the square sum of the respective differences is determined as the distance feature vector between the left eye and the part (nose).
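Taken literally, the distance feature of the step S320 reduces to the sum of squared coordinate differences between the part and the anchor; the function below is an illustrative reading of that description:

```python
import numpy as np

def distance_feature(part_center, anchor_center):
    """Squared distance between a part and the anchor part (the nose),
    per the description above: the square sum of the differences."""
    diff = np.asarray(part_center, float) - np.asarray(anchor_center, float)
    return np.array([np.sum(diff ** 2)])   # distance feature vector
```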
[0095] The step S321 is to determine the feature vector of the left eye composed in the step S319 and the distance feature vector determined in the step S320 as a feature mapping function to obtain a target function of the left eye from the feature mapping function.
[0096] Particularly the target function is obtained from the feature mapping function in the SSVM algorithm.
[0097] The step S322 is to optimize the target function of the left eye to obtain an optimized position of the left eye part.
[0098] Particularly the target function of the left eye is optimized in the SGD algorithm to obtain the optimized position of the left eye part.
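The SGD optimization can be sketched generically, as the disclosure only names the optimizer; the learning-rate schedule below is an assumption:

```python
import numpy as np

def sgd_optimize(grad_fn, x0, lr=0.1, n_steps=100, decay=0.99):
    """Generic stochastic gradient descent: repeatedly step the part
    position x against a sampled gradient of its target function."""
    x = np.asarray(x0, dtype=np.float64).copy()
    for _ in range(n_steps):
        x -= lr * grad_fn(x)   # grad_fn may use a random mini-batch
        lr *= decay            # assumed geometric learning-rate decay
    return x
```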
[0099] Likewise, an optimized position of the right eye zone, an optimized position of the nose zone, and an optimized position of the mouth zone can be obtained as described in the method above. Moreover the feature points of the respective parts of the human face are fit and adjusted locally, and the positions of the four parts are adjusted globally per part of the human face, to thereby satisfy constraints on the relative positions of the respective parts of the human face (i.e., shape constraints thereof), where the positions are optimized per part in the SGD algorithm to thereby enable the algorithm to be performed effectively, robustly and in real time.
[0100] The step S323 is to track a moving position of the left eye in two consecutive frames by a forward and backward optical flow tracking algorithm according to the optimized position of the left eye zone.
[0101] The step S324 is to obtain positive and negative samples of the left eye in the human face according to the currently tracked moving position of the left eye, and coverage proportions and posterior probabilities of the respective sub-windows; [0102] The step S325 is to select several samples with high confidences (for example, above a preset confidence threshold) from the obtained positive and negative samples of the left eye, to calculate features of the positive and negative samples, and then to update the prior probabilities of the random forest classifier.
[0103] The step S326 is to update the sample library of the NCC classifier by adding the obtained positive and negative samples of the left eye to the sample library of the NCC classifier.
[0104] In the embodiment above of the invention, the image of the user is acquired using the camera, and the human face is positioned in the sliding window using the variance filter, the random forest classifier, the NCC classifier, and the NMS algorithm sequentially; and since the human face can be positioned at a higher speed using parallel programming due to the feature of the sliding window itself, and no complex operations will be involved in the filter and the classifiers, the complexity of calculation can be lowered while guaranteeing the robustness of the program; and the features of the respective parts of the human face can be fit, the positions of the respective parts of the human face can be optimized, and the respective parts of the human face can be tracked to thereby position the human face at higher precision and robustness.
[0105] Referring to Fig.4 illustrating a schematic structural diagram of an apparatus for positioning a human face according to an embodiment of the invention, the structure of the apparatus according to the embodiment of the invention will be described below in further details.
[0106] An image obtaining module 401 is configured to acquire an original image of a user using a camera, and to send the original image of the user to a roughly positioning module 402.
[0107] Particularly after the original image of the user is acquired, the original image of the user is preprocessed by denoising, illumination-equalizing, etc.
[0108] The roughly positioning module 402 connected with the image obtaining module 401 is configured to position roughly the original image of the user to obtain a roughly positioned image of a human face, and to send the roughly positioned image of the human face to an area detecting module 403.
[0109] Particularly the human face is detected and positioned roughly in the original image of the user in the Haar and AdaBoost algorithms, and then a skin color filter is applied based upon a skin color distribution feature of the human face to remove a falsely detected area, the detected area of the human face is cut out, and the roughly positioned image of the human face is obtained.
[0110] The area detecting module 403 connected with the roughly positioning module 402 is configured to obtain information about a detected area of the human face from the roughly positioned image of the human face, where the information about the detected area of the human face includes positional information of respective parts of the human face.
[0111] Particularly the area detecting module 403 includes: [0112] A sliding window module 4031 is configured to divide the roughly positioned image of the human face into several sub-windows; [0113] A variance filtering module 4032 connected with the sliding window module 4031 is configured to calculate an image variance in each sub-window; to compare the image variance in each sub-window with a preset variance threshold, and if the variance is below the preset variance threshold, to determine that the sub-window includes a target area, and to accept the sub-window; otherwise, to reject the sub-window; [0114] An online learning module 4033 connected with the variance filtering module 4032 is configured to input the sub-windows with the image variances below the preset variance threshold to an online learning classifier, and to obtain sub-windows output from the online learning classifier; and [0115] An NMS module 4034 connected with the online learning module 4033 is configured to perform an NMS process on the sub-windows output from the online learning classifier, and to obtain the information about the detected area of the human face.
[0116] Particularly the positional information of the respective parts of the human face includes left eye positional information, right eye positional information, nose positional information, and mouth positional information.
[0117] A fitting module 404 connected with the area detecting module 403 is configured to perform local shape fitting on the information about the detected area of the human face to obtain precise shape information of the respective parts of the human face.
[0118] Particularly local shape fitting is performed in the SSM. The shape information of the respective parts of the human face includes left eye shape information, right eye shape information, nose shape information, and mouth shape information.
[0119] The apparatus further includes: [0120] An optimizing module 405 connected with the fitting module 404 is configured to perform structured learning on the precise shape information of the respective parts of the human face to obtain object functions of the respective parts of the human face, and to optimize the object functions of the respective parts of the human face to obtain optimized positions of the respective parts of the human face.
[0121] Particularly the object functions of the respective parts of the human face are optimized in the SGD algorithm to obtain the optimized positions of the respective parts of the human face.
[0122] An online updating module 406 connected with the optimizing module 405 is configured to track moving positions of the respective parts of the human face in two consecutive frames according to the optimized positions of the respective parts of the human face, and to update the online learning classifier according to the moving positions of the respective parts of the human face.
[0123] Particularly the moving positions of the respective parts of the human face in the two consecutive frames are tracked by a forward and backward optical flow tracking algorithm according to the optimized positions of the respective parts of the human face; positive and negative samples of the respective parts of the human face are obtained according to the currently tracked moving positions of the respective parts of the human face, and coverage proportions and posterior probabilities of the respective sub-windows; several samples with high confidences (for example, above a preset confidence threshold) are selected from the obtained positive and negative samples of the respective parts of the human face, and features thereof are calculated; and then prior probabilities of the random forest classifier are updated, and a sample library of the NCC classifier is updated by adding the obtained positive and negative samples of the respective parts of the human face to the sample library of the NCC classifier.
[0124] In the embodiment above of the invention, the human face can be positioned roughly in the image of the user acquired by the camera to obtain the information about the detected area of the human face, and further the precise shapes of the respective parts of the human face can be obtained through local shape fitting according to the information about the detected area of the human face, thereby improving the precision of fitting.
[0125] Referring to Fig.5 illustrating a schematic structural diagram of an apparatus for positioning a human face according to another embodiment of the invention, the structure of the apparatus according to the embodiment of the invention will be described below in further details.
[0126] An image obtaining module 501 is configured to acquire an original image of a user using a camera, and to send the original image of the user to a roughly positioning module 502.
[0127] Particularly after the original image of the user is acquired, the original image of the user is preprocessed by denoising, illumination-equalizing, etc.
[0128] The roughly positioning module 502 connected with the image obtaining module 501 is configured to position roughly the original image of the user to obtain a roughly positioned image of a human face, and to send the roughly positioned image of the human face to a sliding window module 503.
[0129] Particularly the human face is detected and positioned roughly in the original image of the user in the Haar and AdaBoost algorithms, and then a skin color filter is applied based upon a skin color distribution feature of the human face to remove a falsely detected area, the detected area of the human face is cut out, and the roughly positioned image of the human face is obtained.
[0130] The sliding window module 503 connected with the roughly positioning module 502 is configured to divide the roughly positioned image of the human face into several sub-windows.
[0131] A variance filtering module 504 connected with the sliding window module 503 is configured to calculate an image variance in each sub-window, to compare the image variance in the sub-window with a preset variance threshold, and if the variance is below the preset variance threshold, to determine that the sub-window includes a target area, to accept the sub-window, and to send the accepted sub-window to a random forest classifier 505; otherwise, to reject the sub-window.
[0132] The random forest classifier 505 connected with the variance filtering module 504 is configured to calculate a posterior probability of the random forest classifier of each of the sub-windows accepted in the variance filtering module 504, and if the posterior probability is above a preset probability threshold, to accept the sub-window, and to send the accepted sub-window to an NCC classifier 506; otherwise, to reject the sub-window.
[0133] Particularly the random forest classifier consists of thirteen decision trees, each of which has a feature obtained by comparing brightness values of every two of ten random image blocks in each sub-window, and the posterior probability of the random forest classifier is the average of posterior probabilities of the thirteen decision trees. The distribution of a prior probability of the random forest classifier will be updated in real time after the human face is tracked, to thereby be adapted to a variation in shape and a variation in texture of a target; and for any decision tree, the posterior probability of the decision tree is obtained from the prior probability, and the feature of the decision tree.
[0134] The NCC classifier 506 connected with the random forest classifier 505 is configured to calculate a matching coefficient between each of the sub-windows accepted in the random forest classifier 505 and a target template in a sample library of the NCC classifier, and if the matching coefficient is above a preset coefficient threshold, to accept the sub-window; otherwise, to reject the sub-window.
[0135] Particularly the sample library of the NCC classifier will be updated in real time after the human face is tracked, to thereby describe accurately the tracked target.
[0136] An NMS module 507 connected with the NCC classifier 506 is configured to perform an NMS process on the sub-windows accepted in the NCC classifier 506 to obtain information about a detected area of the human face.
[0137] Particularly the information about the detected area of the human face includes at least positional information of the left eye, the right eye, the nose, and the mouth in the human face.
[0138] The apparatus further includes: [0139] A human face parts features fitting module 508 connected with the NMS module 507, further including: [0140] A first extracting sub-module 5081 is configured to extract the shapes of the respective parts of the human face according to the information about the detected area of the human face; [0141] A first feature descriptor vector sub-module 5082 is configured to extract current feature descriptors according to mark points in the current shapes of the respective parts of the human face, where a current feature descriptor vector consists of several current feature descriptors; [0142] A first updating sub-module 5083 is configured to search an update matrix library for a corresponding update matrix using the current feature descriptor vector as an index, to update the current shapes of the respective parts of the human face using the corresponding update matrix, and to obtain updated current shapes of the respective parts of the human face; [0143] A first determining sub-module 5084 is configured to determine whether the preset largest number of iterations is exceeded, or whether a norm error between the two latest shape error vectors is below a preset vector norm error threshold, and if so, to send the updated current shapes of the respective parts of the human face to a first result sub-module 5085; otherwise, to return the updated current shapes of the respective parts of the human face to the first feature descriptor vector sub-module 5082; and [0144] The first result sub-module 5085 is configured to obtain precise shapes of the respective parts of the human face.
[0145] As illustrated in Fig.6, the human face parts features fitting module 508 further includes an update matrix library sub-module 5086 including: [0146] A second extracting sub-module 50861 is configured to extract the shapes of the respective parts of the human face according to the information about the detected area of the human face; [0147] A second feature descriptor vector sub-module 50862 is configured to extract feature descriptors according to mark points in the shapes of the respective parts of the human face, where a feature descriptor vector consists of several feature descriptors; [0148] A calculating sub-module 50863 is configured to calculate difference vectors between the shapes of the respective parts of the human face and preset real shapes; [0149] An update matrix sub-module 50864 is configured to obtain update matrixes according to the feature descriptor vectors in the second feature descriptor vector sub-module 50862 and the difference vectors in the calculating sub-module 50863; [0150] A second updating sub-module 50865 is configured to apply the update matrixes obtained by the update matrix sub-module 50864 to the shapes of the respective parts of the human face in the second feature descriptor vector sub-module 50862 to obtain updated shapes of the respective parts of the human face, to extract feature descriptor vectors of the updated shapes of the respective parts of the human face, and to store locally the feature descriptor vectors of the updated shapes of the respective parts of the human face as indexes and the update matrixes obtained by the update matrix sub-module 50864, between which there is a correspondence; [0151] A second determining sub-module 50866 is configured to determine whether a preset largest number of iterations for updating the matrix library is exceeded, or whether the norm error between the two latest update matrixes is below a preset matrix norm error threshold, and if so, to send the locally stored update matrixes and the indexes to a second result sub-module 50867; otherwise, to return the updated shapes of the respective parts of the human face to the second feature descriptor vector sub-module 50862 as the shapes of the respective parts of the human face; and [0152] The second result sub-module 50867 is configured to obtain an updated matrix library consisting of the indexes and the update matrixes, between which there is a one-to-one correspondence.
[0153] The apparatus further includes: [0154] A human face parts positions optimizing module 509 connected with the human face parts features fitting module 508 is configured to extract feature information of the respective parts of the human face according to the precise shapes of the respective parts of the human face, to compose feature vectors of the respective parts of the human face, to select some part as an anchor point, to determine a distance feature vector between other parts of the human face and the part, to determine the feature vectors of the respective parts of the human face and the distance feature vectors as feature mapping functions, to obtain target functions of the respective parts of the human face, and to optimize the target functions of the respective parts of the human face to obtain optimized positions of the respective parts of the human face; and [0155] A human face parts tracking module 510 connected with the human face parts positions optimizing module 509 is configured to track moving positions of the respective parts of the human face in two consecutive frames by a forward and backward optical flow tracking algorithm according to the optimized positions of the respective parts of the human face; to obtain positive and negative samples of the respective parts of the human face according to the currently tracked moving positions of the respective parts of the human face, and coverage proportions and posterior probabilities of the respective sub-windows; and to select several samples with high confidences (for example, above a preset confidence threshold) from the obtained positive and negative samples of the respective parts of the human face, to calculate features thereof, and then to update the prior probabilities of the random forest classifier, and to update the sample library of the NCC classifier by adding the obtained positive and negative samples of the respective parts of the human face to the sample library of the NCC classifier.
[0156] In the embodiment above of the invention, the image of the user is acquired using the camera, and the human face is positioned in the sliding window using the variance filter, the random forest classifier, the NCC classifier, and the NMS algorithm sequentially; and since the human face can be positioned at a higher speed using parallel programming due to the feature of the sliding window itself, and no complex operations will be involved in the filter and the classifiers, the complexity of calculation can be lowered while guaranteeing the robustness of the program; and the features of the respective parts of the human face can be fit, the positions of the respective parts of the human face can be optimized, and the respective parts of the human face can be tracked to thereby position the human face at higher precision and robustness.
[0157] Those skilled in the art shall appreciate that the embodiments of the invention can be embodied as a method, a system or a computer program product. Therefore the invention can be embodied in the form of an all-hardware embodiment, an all-software embodiment or an embodiment of software and hardware in combination. Furthermore the invention can be embodied in the form of a computer program product embodied in one or more computer useable storage mediums (including but not limited to a disk memory, a CD-ROM, an optical memory, etc.) in which computer useable program codes are contained.
[0158] The invention has been described in a flow chart and/or a block diagram of the method, the device (system) and the computer program product according to the embodiments of the invention. It shall be appreciated that respective flows and/or blocks in the flow chart and/or the block diagram and combinations of the flows and/or the blocks in the flow chart and/or the block diagram can be embodied in computer program instructions. These computer program instructions can be loaded onto a general-purpose computer, a specific-purpose computer, an embedded processor or a processor of another programmable data processing device to produce a machine so that the instructions executed on the computer or the processor of the other programmable data processing device create means for performing the functions specified in the flow(s) of the flow chart and/or the block(s) of the block diagram.
[0159] These computer program instructions can also be stored into a computer readable memory capable of directing the computer or the other programmable data processing device to operate in a specific manner so that the instructions stored in the computer readable memory create an article of manufacture including instruction means which perform the functions specified in the flow(s) of the flow chart and/or the block(s) of the block diagram.
[0160] These computer program instructions can also be loaded onto the computer or the other programmable data processing device so that a series of operational steps are performed on the computer or the other programmable data processing device to create a computer implemented process so that the instructions executed on the computer or the other programmable device provide steps for performing the functions specified in the flow(s) of the flow chart and/or the block(s) of the block diagram.
[0161] Although the preferred embodiments of the invention have been described, those skilled in the art benefiting from the underlying inventive concept can make additional modifications and variations to these embodiments. Therefore the appended claims are intended to be construed as encompassing the preferred embodiments and all the modifications and variations coming into the scope of the invention.
[0162] Evidently those skilled in the art can make various modifications and variations to the invention without departing from the spirit and scope of the invention. Thus the invention is also intended to encompass these modifications and variations so long as they fall within the scope of the claims appended to the invention and their equivalents.

Claims (10)

1. A method for positioning a human face, the method comprising:
obtaining an original image of a user using a camera;
roughly positioning the original image of the user to obtain a roughly positioned image of a human face;
obtaining information about a detected area of the human face from the roughly positioned image of the human face, wherein the information about the detected area of the human face comprises positional information of each part of the human face; and
performing local shape fitting on the information about the detected area of the human face to obtain precise shape information of each part of the human face.
2. The method according to claim 1, wherein the obtaining the information about the detected area of the human face from the roughly positioned image of the human face comprises:
dividing the roughly positioned image of the human face into several sub-windows;
calculating an image variance in each of the sub-windows, comparing the image variance in each of the sub-windows with a preset variance threshold, determining that each sub-window with an image variance below the preset variance threshold comprises a target area, accepting each sub-window with an image variance below the preset variance threshold, and rejecting each sub-window with an image variance greater than or equal to the preset variance threshold;
inputting the sub-windows with image variances below the preset variance threshold to an online learning classifier, and obtaining sub-windows output from the online learning classifier; and
performing a Non-Maximal Suppression (NMS) process on the sub-windows output from the online learning classifier, and obtaining the information about the detected area of the human face.
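By way of illustration only, and not forming part of the claims: a common way to make the per-sub-window variance test cheap, which the claim does not prescribe, is to precompute an integral image and an integral image of squared pixels, so that the variance of any sub-window costs a constant number of lookups. A minimal numpy sketch, with an arbitrarily chosen threshold:

    import numpy as np

    def window_variance(ii, ii2, x, y, w, h):
        # O(1) variance of the w-by-h sub-window at (x, y), given an
        # integral image ii and an integral of squared pixels ii2,
        # each padded with a leading row and column of zeros.
        def box(t):
            return t[y + h, x + w] - t[y, x + w] - t[y + h, x] + t[y, x]
        n = w * h
        mean = box(ii) / n
        return box(ii2) / n - mean * mean

    img = np.random.rand(240, 320)
    ii  = np.pad(img,       ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    ii2 = np.pad(img * img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    # Per the claim, a sub-window is accepted as containing a target
    # area when its image variance is below the preset threshold.
    accept = window_variance(ii, ii2, 40, 30, 24, 24) < 0.05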
3. The method according to claim 2, wherein the inputting the sub-windows with the image variances below the preset variance threshold to the online learning classifier, and obtaining the sub-windows output from the online learning classifier comprises:
calculating a posterior probability of a random forest classifier for each of the sub-windows with an image variance below the preset variance threshold, comparing the posterior probability with a preset probability threshold, accepting each sub-window with a posterior probability above the preset probability threshold, and rejecting each sub-window with a posterior probability less than or equal to the preset probability threshold; and
calculating a matching coefficient between each sub-window with a posterior probability above the preset probability threshold and a target template in a sample library of a Normalized Cross Correlation (NCC) classifier, comparing the matching coefficient with a preset coefficient threshold, accepting each sub-window corresponding to a matching coefficient above the preset coefficient threshold, and rejecting each sub-window corresponding to a matching coefficient less than or equal to the preset coefficient threshold.
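By way of illustration only: under the standard definition of normalized cross-correlation (the claim itself does not spell out a formula), the matching coefficient is the mean product of the zero-mean, unit-variance versions of the sub-window and the template. A small sketch in which the sample library and the coefficient threshold are placeholders:

    import numpy as np

    def ncc(patch, template):
        # Normalized cross-correlation between two equally sized
        # grey-level arrays; 1.0 indicates a perfect match.
        a = patch.astype(np.float64).ravel()
        b = template.astype(np.float64).ravel()
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        return float(a @ b) / a.size

    def ncc_accept(patch, sample_library, coef_thr):
        # Accept the sub-window if its best match against any target
        # template exceeds the preset coefficient threshold.
        return max(ncc(patch, t) for t in sample_library) > coef_thr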
4. The method according to claim 1, wherein the local shape fitting is performed as supervised sequence fitting comprising:
step a of extracting shape information of each part of the human face according to the information about the detected area of the human face, and determining the extracted shape information as an initial value of the shape of each part of the human face;
step b of extracting current feature descriptors according to mark points in the current shape of each part of the human face, wherein a current feature descriptor vector consists of several current feature descriptors;
step c of searching an update matrix library for an update matrix using the current feature descriptor vector as an index, updating the shape information of each part of the human face using the update matrix, obtaining updated shape information of each part of the human face, and replacing the shape information of each part of the human face in step b with the updated shape information of each part of the human face;
step d of determining whether a preset maximum number of iterations is exceeded, or whether the norm of the error between the two latest shape vectors is below a preset vector norm error threshold, and if so, then proceeding to step e; otherwise, returning to step b; and
step e of obtaining the precise shape information of each part of the human face.
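Steps a to e describe a supervised, iterative refinement loop. The sketch below assumes, in the style of supervised descent methods, that the looked-up update matrix is applied additively to the current descriptor vector; the claim only says the shape is updated "using the update matrix", so that update rule, like the hypothetical callables descriptors_at and update_matrix_for, is an assumption made for illustration:

    import numpy as np

    def fit_shape(init_shape, descriptors_at, update_matrix_for,
                  max_iters=10, tol=1e-3):
        # init_shape: flat numpy vector of landmark coordinates.
        shape = init_shape.copy()            # step a: initial shape
        for _ in range(max_iters):           # step d: iteration cap
            phi = descriptors_at(shape)      # step b: descriptor vector
            R = update_matrix_for(phi)       # step c: indexed update matrix
            new_shape = shape + R @ phi      # assumed additive update
            done = np.linalg.norm(new_shape - shape) < tol  # step d
            shape = new_shape
            if done:
                break
        return shape                         # step e: precise shape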
5. The method according to claim 2 or 3, further comprising:
performing structured learning on the precise shape information of each part of the human face to obtain an objective function of each part of the human face; and
optimizing the objective function of each part of the human face to obtain an optimized position of each part of the human face.
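The claim leaves the form of the objective function open. Purely as an illustration, if the structured learning stage produced, for each part, a score map over candidate positions, then optimizing could be as simple as taking the best-scoring candidate:

    import numpy as np

    def optimize_part_position(score_map):
        # score_map: 2-D array of scores over candidate positions
        # (a hypothetical output of the structured learning stage).
        row, col = np.unravel_index(np.argmax(score_map), score_map.shape)
        return col, row  # (x, y) of the best-scoring candidate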
6. The method according to claim 5, further comprising:
tracking moving positions of each part of the human face in two consecutive frames according to the optimized position of each part of the human face; and
updating the online learning classifier according to the moving positions of each part of the human face.
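The patent does not commit to a particular tracking mechanism for following the parts across two consecutive frames. One plausible, widely used stand-in (an editorial substitution, not the claimed tracker) is pyramidal Lucas-Kanade optical flow as exposed by OpenCV:

    import cv2
    import numpy as np

    def track_parts(prev_gray, next_gray, landmarks):
        # landmarks: (N, 2) array of optimized part positions in the
        # previous frame; returns their positions in the next frame
        # and a per-point mask of reliably tracked points.
        pts = landmarks.reshape(-1, 1, 2).astype(np.float32)
        new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, next_gray, pts, None,
            winSize=(21, 21), maxLevel=3)
        return new_pts.reshape(-1, 2), status.ravel() == 1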
7. An apparatus for positioning a human face, the apparatus comprising:
an image obtaining module configured to acquire an original image of a user using a camera, and to output the original image of the user;
a roughly positioning module connected with the image obtaining module, configured to roughly position the original image of the user to obtain a roughly positioned image of a human face, and to output the roughly positioned image of the human face;
an area detecting module connected with the roughly positioning module, configured to obtain information about a detected area of the human face from the roughly positioned image of the human face, wherein the information about the detected area of the human face comprises positional information of each part of the human face; and
a fitting module connected with the area detecting module, configured to perform local shape fitting on the information about the detected area of the human face to obtain precise shape information of each part of the human face.
8. The apparatus according to claim 7, wherein the area detecting module comprises:
a sliding window module configured to divide the roughly positioned image of the human face into several sub-windows;
a variance filtering module connected with the sliding window module, configured to calculate an image variance in each of the sub-windows, to compare the image variance in each of the sub-windows with a preset variance threshold, to determine that each sub-window with an image variance below the preset variance threshold comprises a target area, to accept each sub-window with an image variance below the preset variance threshold, and to reject each sub-window with an image variance greater than or equal to the preset variance threshold;
an online learning module connected with the variance filtering module, configured to input the sub-windows with image variances below the preset variance threshold to an online learning classifier, and to obtain sub-windows output from the online learning classifier; and
a Non-Maximal Suppression (NMS) module connected with the online learning module, configured to perform an NMS process on the sub-windows output from the online learning classifier, and to obtain the information about the detected area of the human face.
9. The apparatus according to claim 8, further comprising:
an optimizing module connected with the fitting module, configured to perform structured learning on the precise shape information of each part of the human face to obtain an objective function of each part of the human face, and to optimize the objective function of each part of the human face to obtain an optimized position of each part of the human face.
10. The apparatus according to claim 9, further comprising:
an online updating module connected with the optimizing module, configured to track moving positions of each part of the human face in two consecutive frames according to the optimized position of each part of the human face, and to update the online learning classifier according to the moving positions of each part of the human face.
AU2014350727A 2013-11-13 2014-11-12 Face positioning method and device Ceased AU2014350727B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201310560912.X 2013-11-13
CN201310560912.XA CN103593654B (en) 2013-11-13 2013-11-13 A kind of method and apparatus of Face detection
PCT/CN2014/090943 WO2015070764A1 (en) 2013-11-13 2014-11-12 Face positioning method and device

Publications (2)

Publication Number Publication Date
AU2014350727A1 true AU2014350727A1 (en) 2016-06-09
AU2014350727B2 AU2014350727B2 (en) 2017-06-29

Family

ID=50083786

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2014350727A Ceased AU2014350727B2 (en) 2013-11-13 2014-11-12 Face positioning method and device

Country Status (3)

Country Link
CN (1) CN103593654B (en)
AU (1) AU2014350727B2 (en)
WO (1) WO2015070764A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593654B (en) * 2013-11-13 2015-11-04 智慧城市***服务(中国)有限公司 A kind of method and apparatus of Face detection
CN105303150B (en) * 2014-06-26 2019-06-25 腾讯科技(深圳)有限公司 Realize the method and system of image procossing
CN105868767B (en) * 2015-01-19 2020-02-18 阿里巴巴集团控股有限公司 Face feature point positioning method and device
CN105809123B (en) * 2016-03-04 2019-11-12 智慧眼科技股份有限公司 Method for detecting human face and device
CN107481190B (en) * 2017-07-04 2018-12-07 腾讯科技(深圳)有限公司 A kind of image processing method and device
CN107862308A (en) * 2017-12-12 2018-03-30 成都电科海立科技有限公司 A kind of face identification method based on vehicle-mounted face identification device
CN107977640A (en) * 2017-12-12 2018-05-01 成都电科海立科技有限公司 A kind of acquisition method based on vehicle-mounted recognition of face image collecting device
CN110008791B (en) * 2018-01-05 2021-04-27 武汉斗鱼网络科技有限公司 Face area determination method, electronic device and readable storage medium
CN108764034A (en) * 2018-04-18 2018-11-06 浙江零跑科技有限公司 A kind of driving behavior method for early warning of diverting attention based on driver's cabin near infrared camera
CN109086711B (en) * 2018-07-27 2021-11-16 华南理工大学 Face feature analysis method and device, computer equipment and storage medium
CN109613526A (en) * 2018-12-10 2019-04-12 航天南湖电子信息技术股份有限公司 A kind of point mark filter method based on support vector machines
CN113051961A (en) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 Depth map face detection model training method, system, equipment and storage medium
CN112132067B (en) * 2020-09-27 2024-04-09 深圳市梦网视讯有限公司 Face gradient analysis method, system and equipment based on compressed information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100389430C (en) * 2006-06-13 2008-05-21 北京中星微电子有限公司 AAM-based head pose real-time estimating method and system
CN101561710B (en) * 2009-05-19 2011-02-09 重庆大学 Man-machine interaction method based on estimation of human face posture
CN101593022B (en) * 2009-06-30 2011-04-27 华南理工大学 Method for quick-speed human-computer interaction based on finger tip tracking
CN101916370B (en) * 2010-08-31 2012-04-25 上海交通大学 Method for processing non-feature regional images in face detection
CN102622589A (en) * 2012-03-13 2012-08-01 辉路科技(北京)有限公司 Multispectral face detection method based on graphics processing unit (GPU)
CN103593654B (en) * 2013-11-13 2015-11-04 智慧城市***服务(中国)有限公司 A kind of method and apparatus of Face detection

Also Published As

Publication number Publication date
CN103593654A (en) 2014-02-19
WO2015070764A1 (en) 2015-05-21
AU2014350727B2 (en) 2017-06-29
CN103593654B (en) 2015-11-04

Legal Events

Date Code Title Description
PC1 Assignment before grant (sect. 113)

Owner name: SHENZHEN SMART SECURITY & SURVEILLANCE SERVICE ROBOT CO., LTD.

Free format text: FORMER APPLICANT(S): SMART CITIES SYSTEM SERVICES (PRC) CO., LTD

FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired