Summary of the invention
In view of this, the present invention aims to provide a face liveness detection method and system that adapt well to pose variation, offer high security, require no user cooperation, run fast, and give a good user experience.
Specifically, the face liveness detection method comprises the steps of:
S1, simultaneously acquiring two images of an identification object through two cameras;
S2, locating the face region in each of the two images with a face classifier to obtain a face image corresponding to each image;
S3, performing feature point location on the two face images;
S4, performing fast stereo matching of feature points on the face images after feature point location;
S5, computing the disparity of the two face images at each matched feature point to obtain the depth value of that matched feature point;
S6, judging, according to the depth values of a plurality of matched feature points, whether the face of the identification object is a live body.
Preferably, in an embodiment of the present invention, the two cameras are arranged side by side, or one above the other, to form a binocular stereo vision system.
Preferably, in an embodiment of the present invention, the face classifier is based on at least one of the following: skin color combined with geometric features, integral projection, template matching, and line connectivity.
Preferably, in an embodiment of the present invention, the feature points comprise all or some of the key points located in at least one of the following regions: facial contour, eyebrows, eyes, nose, and mouth.
Preferably, in an embodiment of the present invention, the feature points in step S3 are located with at least one of the following algorithms: deep-learning-based algorithms, active shape model algorithms, active appearance model algorithms, and cascaded shape regression algorithms.
Preferably, in an embodiment of the present invention, step S4 specifically comprises: taking a feature point in one face image as a reference feature point, and searching around the corresponding feature point in the other face image by the sum of absolute differences (SAD) method to obtain the matched feature point corresponding to the reference feature point.
Preferably, in an embodiment of the present invention, the reference feature point and its matched feature point have the same ordinate; or, when the ordinate of a matched feature point differs from that of the corresponding reference feature point, the ordinate of the matched feature point is forced equal to that of the corresponding reference feature point to obtain a match point, two pixels are taken on each side of the match point, the match point and these pixels are used as candidate points for the matched feature point, the accumulated deviation centered on each candidate is computed, and the candidate with the smallest accumulated deviation is chosen as the matched feature point.
Preferably, in an embodiment of the present invention, step S6 is performed in at least one of the following ways: a support vector machine (SVM) algorithm, a feature extraction algorithm based on principal component analysis (PCA), or a linear discriminant analysis (LDA) algorithm; or step S6 comprises:
S61, determining the matched feature point with the smallest depth value as the minimum matched feature point and obtaining its depth value;
S62, subtracting the depth value of the minimum matched feature point from the depth value of each matched feature point to obtain the relative depth value of each matched feature point;
S63, computing the sum, or the sum of squares, of the relative depth values of all matched feature points as a comparison value, judging the identification object to be a non-living body when the comparison value is less than a predetermined threshold, and a living body otherwise.
Preferably, in an embodiment of the present invention, the following is performed after step S63:
S64, for an identification object judged to be a living body in step S63, estimating the face size from the feature point location results, computing the face height-to-width ratio of the left and right face images, and judging the identification object to be a non-living body when the height-to-width ratios of both face images fall outside a preset interval.
Preferably, in an embodiment of the present invention, the method further comprises, before step S1: S0, calibrating the binocular stereo vision system formed by the two cameras.
Preferably, in an embodiment of the present invention, the method further comprises, before step S2: performing image preprocessing, including stereo rectification, on the two images.
Preferably, in an embodiment of the present invention, the stereo rectification comprises: re-projecting the image planes of the two cameras according to the calibration results of step S0, so that the two images lie exactly in the same plane and their rows are aligned in a frontal parallel configuration.
In another aspect, an embodiment of the present invention further provides a face liveness detection system, the system comprising:
two cameras, for simultaneously acquiring two images of an identification object;
a face classifier, for locating the face region in each of the two images acquired by the two cameras to obtain a face image corresponding to each image;
a feature point location unit, for performing feature point location on the two face images;
a stereo matching unit, for performing fast stereo matching of feature points on the face images after feature point location;
a computing unit, for computing the disparity of the two face images at each matched feature point to obtain the depth value of the matched feature point;
a processing unit, for judging, according to the depth values of a plurality of matched feature points, whether the face of the identification object is a live body.
In at least one scheme of the present invention, two cameras are used to imitate human binocular vision: the two cameras capture two two-dimensional images of the user (the identification object) from different viewpoints, and a series of operations finally converts them into three-dimensional information in the world coordinate system. The intrinsic and extrinsic parameters of the cameras themselves are determined by camera calibration; the captured two-dimensional images are then processed so that they lie in the same plane; stereo matching is performed on the pixels of the left and right images so that each point of the three-dimensional scene is marked in both two-dimensional images; and, following the disparity principle of the bionic binocular model placed in affine space, the depth value of the object is recovered by computation.
When a binocular stereo vision system photographs a user, the measured depth of facial features such as the nose and eyes is clearly smaller than that of the facial contour, and when the head pose changes, the depth difference between the facial features and the profile on one side becomes even more pronounced. A photograph or video, by contrast, cannot exhibit such discriminative depth differences: even if a large-sized photograph can show a noticeable depth difference when bent, it does so only at the cost of distorting the normal facial proportions. Whether the user is a live body can therefore be discriminated by computing the depth values of different feature points in the face image and applying this prior knowledge.
It can thus be seen that with the solution of the present invention, face liveness detection requires no special cooperation from the user, offers high security and good concealment, adapts well to pose variation, and has a wide range of application. Moreover, because feature point location is used in image matching, the positions of matched feature points can be found accurately with only a small-range search, which greatly simplifies the matching process, improves matching speed, and gives a good user experience.
Embodiment
It should be pointed out that the descriptions of specific structures and of the order of description in this section are merely explanations of specific embodiments and should not be regarded as limiting the scope of protection of the present invention in any way. In addition, the embodiments in this section, and the features within them, may be combined with one another provided they do not conflict.
Referring to Fig. 1 to Fig. 5, the face liveness detection method and system of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the face liveness detection method of the present invention may comprise:
Step 1: simultaneously capturing two images of the identification object (described below taking a user as an example) with two cameras, i.e. a left image and a right image;
Step 2: locating the face region in each of the two images with a face classifier to obtain a face image corresponding to each image;
Step 3: performing feature point location on the two face images;
Step 4: performing fast stereo matching of feature points on the face images after feature point location;
Step 5: computing the disparity of the two face images at each matched feature point to obtain the depth value of the matched feature point;
Step 6: judging, according to the depth values of a plurality of matched feature points, whether the face of the user is a live body.
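Under stated assumptions, Steps 1 to 6 above can be sketched end to end as a single function. This is a minimal, hypothetical illustration only: all of the helper callables (capture_pair, detect_face, locate_landmarks, match_points) and the parameter values are placeholders, not part of the original disclosure.

```python
# A minimal, hypothetical sketch of Steps 1-6 as one function; all helper
# callables and parameter values are illustrative placeholders.
def liveness_pipeline(capture_pair, detect_face, locate_landmarks,
                      match_points, focal_px, baseline_mm, threshold):
    left, right = capture_pair()                            # Step 1
    face_l, face_r = detect_face(left), detect_face(right)  # Step 2
    pts_l = locate_landmarks(face_l)                        # Step 3
    pts_r = locate_landmarks(face_r)
    pairs = match_points(pts_l, pts_r)                      # Step 4
    depths = [focal_px * baseline_mm / (xl - xr)            # Step 5: Z = f*T/d
              for (xl, _yl), (xr, _yr) in pairs if xl != xr]
    z_min = min(depths)                                     # Step 6: relative
    rel = [z - z_min for z in depths]                       # depth variation
    return sum(r * r for r in rel) >= threshold
```

A real face produces varied disparities (hence varied depths) across the landmarks, so the sum of squared relative depths exceeds the threshold; a flat photograph produces nearly uniform depths and fails it.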
In a specific implementation, the two cameras may be 5-megapixel webcams of the same model, fixed side by side so that the two cameras form a binocular stereo vision system; in practical applications, the horizontal spacing between the two cameras is set so that both cameras lie in the same plane and can acquire images of the user simultaneously. The distance between the cameras and the face is about 0.5 m under ordinary indoor illumination, the left and right images are 640 × 480 pixels, and functions such as face detection, feature point location, liveness detection, and recognition can be implemented on a PC.
As a preferred mode, the method of the embodiment of the present invention may further comprise, before Step 1, a step of calibrating the binocular stereo vision system formed by the two cameras, a specific implementation flow of which may be:
Step 11: calibrating the two cameras individually; the calibration may specifically comprise the intrinsic parameter matrices and distortion vectors of the left and right cameras.
Step 12: calibrating the binocular stereo vision system formed by the two cameras; the calibration may specifically comprise the rotation matrix and translation vector of the binocular stereo vision system.
Step 13: obtaining the stereo rectification parameters and the re-projection matrix.
In this mode, step 11 may perform camera calibration with the chessboard calibration method; the intrinsic parameter matrix may comprise the horizontal focal length, the vertical focal length, and the principal point location of the camera, and the distortion vector may consist of radial distortion coefficients and tangential distortion coefficients.
In this mode, from the calibration results of step 11, the lens distortion in the radial and tangential directions can be eliminated mathematically, so that an undistorted image is output. Let (x_p, y_p) be the position of the point without distortion and (x_d, y_d) its distorted position; with r^2 = x_p^2 + y_p^2, the standard radial-plus-tangential distortion model gives:

x_d = x_p(1 + k1 r^2 + k2 r^4) + 2 p1 x_p y_p + p2(r^2 + 2 x_p^2)
y_d = y_p(1 + k1 r^2 + k2 r^4) + p1(r^2 + 2 y_p^2) + 2 p2 x_p y_p

where k1, k2 are the radial distortion coefficients and p1, p2 the tangential distortion coefficients. Inverting this mapping yields the image free of lens distortion.
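The distortion model above can be sketched directly in code. This is an illustrative implementation of the conventional radial-plus-tangential model only; the coefficient values used in the example are made up.

```python
# A sketch of the radial-plus-tangential distortion model, mapping an
# undistorted normalized point (x_p, y_p) to its distorted position
# (x_d, y_d); coefficient values in any example call are illustrative.
def distort(x_p, y_p, k1, k2, p1, p2):
    r2 = x_p * x_p + y_p * y_p
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x_p * radial + 2.0 * p1 * x_p * y_p + p2 * (r2 + 2.0 * x_p * x_p)
    y_d = y_p * radial + p1 * (r2 + 2.0 * y_p * y_p) + 2.0 * p2 * x_p * y_p
    return x_d, y_d
```

Undistortion, as used in the preprocessing here, is the numerical inverse of this forward mapping, typically computed per pixel via a lookup table.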
In this mode, the rotation matrix and translation vector of step 12 may be used to describe the position of the right camera relative to the left camera, expressed mathematically as X_r = R·X_l + T, where X_l and X_r are the three-dimensional position vectors of an arbitrary point P in space in the left- and right-camera reference frames of the binocular stereo vision system, and R, T are the rotation matrix and translation vector of the binocular stereo vision system.
In this mode, step 13 may use the Bouguet algorithm to obtain the stereo rectification parameters of the binocular stereo vision system, use the inverse mapping method to obtain the rectification lookup tables of the left and right views, and obtain the re-projection matrix used to re-project two-dimensional image points into three-dimensional space.
In addition, before the face classifier is used in Step 2 to locate the face regions in the two images, the method of the embodiment of the present invention may further comprise: performing image preprocessing, including stereo rectification, on the two images. The stereo rectification operation may, according to the aforementioned calibration results and by looking up the left and right rectification tables, re-project the image planes of the two cameras so that the left and right images lie exactly in the same plane and their rows are aligned in a frontal parallel configuration, i.e. the same point lies in the same pixel row in one camera as in the other. The left and right images before and after stereo rectification are shown in Fig. 2a and Fig. 2b.
In this mode, before the face classifier locates the face regions in the two images in Step 2, the image preprocessing may also apply grayscale transformation and filtering to the rectified images, yielding frontal-parallel, high-quality grayscale images. The face location method may be a template matching method: integral images are used to compute the Haar-like wavelet feature values of the left and right grayscale images quickly, and an offline-trained Adaboost-Cascade classifier detects the face regions in the left and right images. Of course, location may also be realized by methods based on skin color combined with geometric features, on integral projection, on line connectivity, and the like.
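The integral-image trick that makes Haar-like feature evaluation fast can be sketched as follows; any rectangle sum costs only four table lookups. The two-rectangle feature shown is one illustrative example, not a feature from the trained classifier.

```python
import numpy as np

# Integral image with a zero-padded first row/column, so every rectangle
# sum is four lookups regardless of rectangle size.
def integral_image(img):
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] from the padded integral image ii."""
    return int(ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c])

def haar_two_rect(ii, r, c, h, w):
    """Left half minus right half: responds to vertical edges."""
    return rect_sum(ii, r, c, h, w // 2) - rect_sum(ii, r, c + w // 2, h, w // 2)
```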
In this mode, feature point location is performed in Step 3 on the face images obtained by location, the feature points being the position coordinates of the facial contour, eyebrows, eyes, nose, and mouth. In this embodiment, a cascaded shape regression algorithm is used for feature point location; of course, in other embodiments, approaches based on deep learning, active shape models, active appearance models, and the like may also be used. Cascaded shape regression, for example the two-level cascaded shape regression algorithm with indexed features, can directly learn a vectorial regression function that estimates the face shape from the image itself while minimizing the alignment error on the training shapes. Combining two-level cascaded regression, shape-indexed features, and a correlation-based feature selection approach, a fast and accurate non-parametric regression model can be trained; it is extremely fast and constitutes an efficient, high-precision algorithm for accurately locating facial feature points. Its specific steps are:
Step 31: loading the offline-trained two-level cascaded shape regressor based on indexed features;
Step 32: randomly selecting the shapes of L training samples from the loaded regressor as the initial shapes of the feature points to be located in the face region;
Step 33: computing an estimated user face shape with each training sample selected as an initial shape;
Step 34: taking the mean of the L estimated user face shapes as the face shape of the user.
Here, a shape is the vector formed by the position information of the facial feature points, expressed mathematically as S = {x_1, y_1, x_2, y_2, …, x_N, y_N}, where (x_i, y_i) is the pixel coordinate corresponding to the i-th feature point.
In this mode, step 33 may specifically comprise the following flow:
Step 331: computing F grayscale difference features;
Step 332: obtaining the update value of the current shape and updating the current shape;
Step 333: completing the prescribed number of regression updates to obtain the final estimated face shape of the current user.
Here, the grayscale difference features of step 331 may be obtained as follows: according to the index numbers and relative position information of the F feature point pairs in the regressor, the grayscale values of the corresponding indexed features on the user's face image are obtained, and the grayscale difference of each pair of indexed feature points is computed.
The update value of the current shape in step 332 may be obtained as follows: the F grayscale differences obtained in step 331 are compared with the corresponding thresholds to obtain a corresponding F-bit binary value, and the shape update corresponding to this binary value can be looked up in the trained regressor.
The specific algorithm for updating the current shape in step 332 may be:

S_i = S_(i-1) + δS

where S_(i-1) is the estimated shape immediately preceding the current estimated shape and δS is the shape update.
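The control flow of steps 32–34 and the update rule S_i = S_(i-1) + δS can be illustrated with a toy sketch. The stage "regressor" below is a stub that returns a fixed fraction of the residual toward a known target shape, purely to show the iteration and averaging; a real regressor would derive δS from the indexed grayscale-difference features described above.

```python
# Toy illustration of cascaded shape regression: L initial shapes, a fixed
# number of update stages applying S_i = S_(i-1) + delta_S, then averaging.
def cascade_regress(initial_shapes, target_shape, stages=10, rate=0.5):
    estimates = []
    for shape in initial_shapes:                 # step 32: L initial shapes
        S = list(shape)
        for _ in range(stages):                  # step 333: fixed update count
            delta = [rate * (t - s) for s, t in zip(S, target_shape)]  # stub
            S = [s + d for s, d in zip(S, delta)]   # S_i = S_(i-1) + delta_S
        estimates.append(S)
    L = len(initial_shapes)                      # step 34: average the L shapes
    return [sum(e[i] for e in estimates) / L for i in range(len(target_shape))]
```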
In addition, the fast stereo matching of feature points performed in Step 4 on the face images after feature point location means that the matched feature points between the two stereo-rectified images can be searched for with a small "sum of absolute differences" (SAD) window. With each feature point in the left image as a reference point, a match search is carried out with a sliding SAD window around the corresponding feature point in the right image; the point with the minimum accumulated absolute error is the matched feature point, on the right image, of the left-image feature point.
In this mode, the accumulated absolute error of a window position during the SAD search is the sum, over the window, of the absolute grayscale differences between the left-image reference window and the candidate window in the right image. Because accurate facial feature point location has already been performed on the left and right images in Step 3, and the image size is 640 × 480, a sliding SAD window of a predetermined size (e.g. 5 × 5) is used, and the match search is carried out within a small window of a predetermined size (e.g. 5 pixels) centered on the corresponding right-image feature point.
In this mode, the matched feature points after the fast stereo matching of Step 4 may be required to have identical ordinates. When the ordinate of a matched feature point differs, the ordinate of the matched feature point in the right image can be forced equal to that of the corresponding reference feature point in the left image, giving a match point M′; two pixels are taken on each side of M′, these five points are used as candidates for M′, the accumulated deviation centered on each candidate is computed, and the candidate with the minimum accumulated deviation is chosen as the match point. Fig. 3 shows the feature point location results and the locations of the matched feature points in the right image; as can be seen from the figure, the matched feature points lie right next to the located feature points, which saves matching time.
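The SAD window search of Step 4 can be sketched as follows: for a left-image reference point, a 5×5 window slides along the same row of the right image near the expected position, and the column with the smallest sum of absolute differences wins. The window size and search range are illustrative choices, not values fixed by the disclosure.

```python
import numpy as np

# Sketch of the SAD window search: slide along the epipolar (same) row and
# keep the column with minimum accumulated absolute error.
def match_sad(left, right, x, y, search=8, half=2):
    ref = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
    best_x, best_cost = None, None
    for cx in range(max(half, x - search), x + 1):  # non-negative disparity
        cand = right[y - half:y + half + 1,
                     cx - half:cx + half + 1].astype(np.int32)
        cost = int(np.abs(ref - cand).sum())        # accumulated absolute error
        if best_cost is None or cost < best_cost:
            best_x, best_cost = cx, cost
    return best_x
```

Because the landmarks have already been located in both images, the search range can stay small, which is what makes the matching fast.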
In addition, Step 5 may comprise:
Step 51: computing the disparity of the left and right face images at each matched feature point;
Step 52: obtaining the depth value of each feature point from its disparity.
The disparity in step 51 may be the abscissa distance between the left and right matched feature points.
The depth value of a matched feature point in step 52 can be obtained using the similar-triangles theory; the depth of a matched feature point is computed as

Z = f · T / d

where T is the distance between the projection centers of the two cameras, f is the focal length of the two cameras of the same model, and d is the disparity of the left and right matched feature points. In the present invention two cameras of the same model are used, and T and f are calibrated in advance. Given the disparity d and a two-dimensional image point (x, y), the re-projection matrix Q obtained in step 13 can be used to compute the three-dimensional coordinate (X, Y, Z) corresponding to the image point, where Z is the depth of the two-dimensional image point.
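The similar-triangles depth formula Z = f·T/d, and the back-projection of an image point to 3-D under the rectified pinhole model, can be illustrated numerically. The focal length, baseline, and principal point values in the example are made-up numbers, not the calibrated parameters of the system.

```python
# Depth from disparity and back-projection for a rectified stereo pair.
def depth_from_disparity(f_px, baseline, d_px):
    if d_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return f_px * baseline / d_px

def backproject(x, y, d_px, f_px, cx, cy, baseline):
    """Recover (X, Y, Z) from pixel (x, y) and disparity d_px."""
    Z = depth_from_disparity(f_px, baseline, d_px)
    X = (x - cx) * Z / f_px
    Y = (y - cy) * Z / f_px
    return X, Y, Z
```

This is the per-point equivalent of applying the re-projection matrix Q to (x, y, d, 1).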
In addition, judging in Step 6 whether the face is a live body according to the depth values of the matched feature points at different parts may specifically comprise the steps of:
Step 61: determining the matched feature point with the smallest depth value as the minimum matched feature point and obtaining its depth value;
Step 62: subtracting the depth value of the minimum matched feature point from the depth value of each matched feature point to obtain the relative depth value of each matched feature point;
Step 63: computing the sum of squares, or the sum, of the relative depth values of the matched feature points and comparing it with a predetermined threshold; the object is judged to be a non-living body when the value is less than the threshold, and a living body otherwise.
When Step 6 is implemented, the matched feature point depth values obtained with the user's real face (a live body) and with a photograph of the user's face as the identification object are shown in Fig. 4a and Fig. 4b respectively, where Fig. 4a is a schematic diagram of the feature point depth values of a real face in the face liveness detection method of the embodiment of the present invention, and Fig. 4b is a schematic diagram of the feature point depth values of a face photograph in said method. Comparing Fig. 4a and Fig. 4b shows that the variation of depth values between the different matched feature points is obvious when the identification object is the user's real face, and relatively unobvious when the identification object is a photograph of the user's face.
Step 64: estimating the face size from the feature point location results, computing the face height-to-width ratio of the left and right face images, and judging the identification object to be a non-living body when the height-to-width ratios of both face images fall outside a preset interval.
The purpose of step 64 is to further screen the identification objects judged to be live bodies by step 63, thereby improving the discrimination rate and preventing spoofed authentication.
In practical applications, a bent large photograph (close to the size of a real face) may pass the judgment of step 63; that is, step 63 can potentially be deceived. To avoid such spoofing, the embodiment of the present invention further comprises a judgment step based on the face height-to-width ratio, specifically:
For a user judged to be a live body by step 63, the face height-to-width ratios of the left and right face images are computed separately in this step, and whether the identification object is a bent large photograph is judged by whether the computed ratios fall outside the preset interval.
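The step-64 check can be sketched as follows: the face aspect ratio is estimated from the landmark bounding box in each view, and the object is rejected when both views fall outside a preset interval. The bounding-box estimate and the interval bounds below are made-up examples; the disclosure does not fix them.

```python
# Sketch of the height-to-width ratio check against bent-photo attacks.
def face_aspect(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(ys) - min(ys)) / (max(xs) - min(xs))

def bent_photo_suspected(points_left, points_right, lo=1.0, hi=1.6):
    out_l = not (lo <= face_aspect(points_left) <= hi)
    out_r = not (lo <= face_aspect(points_right) <= hi)
    return out_l and out_r   # both views must exceed the preset interval
```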
In practical applications, after the first and second judgments find the face height-to-width ratio outside the preset interval, the flow may return to Step 1 to re-acquire images and judge again; only when the ratio falls outside the interval three times (or another predetermined number of times) is the object judged to be a non-living body, so that misjudgment is avoided through repeated judgment. This constraint can prevent spoofed authentication by means of a bent large photograph. If the object is finally judged to be a live body, the flow proceeds to the face recognition stage.
In addition, over the whole process, 200 images each of the real user and of a face photograph may be captured with the binocular stereo vision system; during capture the user continually changes pose, expression, and position relative to the cameras, and the face photograph is laid flat, moved nearer and farther, bent, and so on, with photographs of both 6-inch and 12-inch sizes used in the experiments; the system can distinguish entirely correctly whether the user is a live body.
It should be noted that the scheme of the above embodiment is applicable not only to a binocular stereo vision system with the cameras fixed side by side, but also to one with the cameras fixed one above the other (the two cameras are arranged vertically, the vertical spacing between the two cameras is set so that both lie in the same plane, and they can acquire images of the user simultaneously). For the vertical arrangement, the aforementioned left image corresponds to the upper image, the right image to the lower image, abscissas correspond to ordinates, and ordinates to abscissas.
It should be noted that the classification method by which Step 6 judges, according to the depth values of the matched feature points at different parts, whether the face is a live body is not limited to the above scheme; methods such as support vector machines, feature extraction based on principal component analysis (PCA), and linear discriminant analysis (LDA) may also be used, as long as the corresponding function can be realized.
It should be noted that different preprocessing methods may be selected for the image preprocessing operation according to the mode of the face classifier.
With the face liveness detection method of the embodiment of the present invention, two cameras can be used to imitate human binocular vision: the two cameras capture two two-dimensional images of the user from different viewpoints, and a series of operations finally converts them into three-dimensional information in the world coordinate system. The intrinsic and extrinsic parameters of the cameras themselves are determined by camera calibration; the captured two-dimensional images are then processed so that they lie in the same plane; stereo matching is performed on the pixels of the left and right images so that a point of the three-dimensional scene is marked in both two-dimensional images; and, following the disparity principle of the bionic binocular model placed in affine space, the depth value of the object is recovered by computation.
When the binocular stereo vision system photographs a user, the measured depth of facial features such as the nose and eyes is clearly smaller than that of the facial contour, and when the head pose changes, the depth difference between the facial features and the profile on one side becomes even more pronounced; a photograph or video cannot exhibit such discriminative depth differences, since even a large-sized image that shows a noticeable depth difference when bent does so only at the cost of the normal facial proportions. Whether the user is a live body can be discriminated by computing the depth values of the different matched feature points in the face image and applying this prior knowledge. It can be seen from the above that, compared with the prior art, the scheme of the embodiment of the present invention not only adapts well to pose variation and offers high security, but also requires no user cooperation, runs fast, and gives a good user experience.
In addition, as shown in Fig. 5, the embodiment of the present invention further provides a face liveness detection system, which may comprise two cameras, a face classifier, a feature point location unit, a stereo matching unit, a computing unit, and a processing unit, wherein: the two cameras acquire images of the user; the face classifier locates the face regions in the two images acquired by the two cameras; the feature point location unit performs feature point location on the face images located by the face classifier; the stereo matching unit performs fast stereo matching of feature points on the face images after feature point location; the computing unit computes the depth values of the matched feature points in the face images; and the processing unit judges, according to the depth values of the matched feature points at different parts, whether the face of the user is a live body.
In a specific implementation, the functions of the above units and of the face classifier are not limited to realization on a PC, as long as the corresponding functions of these components can be realized. For other extensions and explanations of this face liveness detection system, reference may be made to the related description of the method embodiment, which is not repeated here; the system likewise achieves the corresponding technical effects.
A person of ordinary skill in the art will appreciate that some of the steps/units/modules realizing the above embodiments can be accomplished by hardware under the direction of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps corresponding to the units of the above embodiments; the aforesaid storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, and optical discs.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.