CN1866270B - Face recognition method based on video frequency - Google Patents

Face recognition method based on video frequency

Info

Publication number
CN1866270B
CN1866270B · CN2005100709199A · CN200510070919A
Authority
CN
China
Prior art keywords
video
frame
identified
video sequence
subspace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2005100709199A
Other languages
Chinese (zh)
Other versions
CN1866270A (en)
Inventor
汤晓鸥 (Xiaoou Tang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong CUHK
Original Assignee
Chinese University of Hong Kong CUHK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong (CUHK)
Publication of CN1866270A
Application granted
Publication of CN1866270B
Legal status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The provided video-to-video face recognition method comprises: synchronizing the videos in both time and space, applying multi-level subspace analysis, and processing the data to extract the target features. The invention makes full use of the information in the video sequence, overcomes processing-speed and data-scale defects, and obtains a nearly perfect classification result on the XM2VTS database.

Description

Face recognition method based on video
Technical field
The present invention relates to the field of image recognition, and more specifically to techniques for performing face recognition based on video images.
Background art
Automatic face recognition is a challenging task in pattern recognition research. In recent years a large number of techniques have been proposed, for example:
1. Local feature analysis methods, including:
1) The active appearance model (AAM) method: see T.F. Cootes, G.J. Edwards, and C.J. Taylor, "Active Appearance Models" (Reference 1), IEEE Trans. on PAMI, Vol. 23, No. 6, pp. 681-685, June 2001; and
2) The elastic graph matching (EGM) method: see L. Wiskott, J.M. Fellous, N. Krueger, and C. von der Malsburg, "Face Recognition by Elastic Bunch Graph Matching" (Reference 2), IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 775-779, 1997.
2. Appearance-based subspace methods, including:
1) The eigenface method: see M. Turk and A. Pentland, "Face recognition using eigenfaces" (Reference 3), IEEE International Conference on Computer Vision and Pattern Recognition, pp. 586-591, 1991.
2) The LDA method: see P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection" (Reference 4), IEEE Trans. on PAMI, Vol. 19, No. 7, pp. 711-720, July 1997; and W. Zhao, R. Chellappa, and N. Nandhakumar, "Empirical performance analysis of linear discriminant classifiers" (Reference 5), Proceedings of CVPR, pp. 164-169, 1998.
3) The Bayesian method: see B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition" (Reference 6), Pattern Recognition, Vol. 33, pp. 1771-1782, 2000.
However, all of the above methods are image-based face recognition methods that use still images as input data. The first problem with image-based face recognition is that someone may use a previously recorded face photograph to fool the camera, which mistakes the photograph for a live subject. The second problem is that, compared with other high-accuracy biometric techniques, the accuracy of image-based recognition is still too low for some practical applications. To address these problems, video-based face recognition has recently been proposed. One major advantage of video-based face recognition is that it prevents the recognition system from being cheated with a pre-stored face image: although forging a video sequence in front of a live camera remains possible, it is very difficult, so the biometric data presented at authentication time can be guaranteed to come from a real subject. Another key advantage of the video-based recognition method is that more information is available in a video sequence than in a single image. If this extra information can be extracted correctly, the recognition accuracy can be further improved.
However, compared with the large body of image-based face recognition techniques, research on video-to-video face recognition is still limited. Most research on face recognition in video concentrates mainly on face detection and tracking in the video.
Once a face is located in a video frame, existing methods usually apply traditional image-based face recognition techniques to identify single frames. For recognition that uses video data directly, see S. Satoh, "Comparative Evaluation of Face Sequence Matching for Content-Based Video Access" (Reference 8), Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, pp. 163-168, 2000. Satoh matches two video sequences by selecting the closest pair of frames in the two videos, so it remains image-to-image matching.
In addition, for methods that use video sequences to train a statistical face model used for matching, see the following documents:
V. Kruger and S. Zhou, "Exemplar-based Face Recognition from Video" (Reference 9), Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, pp. 182-187, 2002.
G. Edwards, C. Taylor, and T. Cootes, "Improving Identification Performance by Integrating Evidence from Sequences" (Reference 10), IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 486-491, 1999.
Although a model trained in this way is more stable and robust than a model trained from single images, given the same feature dimensionality the global information contained in the model is still similar to that of a single image. As with image-to-image matching, the scale of the training data also increases.
In the above document by Satoh, and in O. Yamaguchi, K. Fukui, and K. Maeda, "Face Recognition Using Temporal Image Sequence" (Reference 11), Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, pp. 318-323, 1998, a mutual subspace method is described, in which the video frames of each person are used to compute an individual eigenspace for that person. Because it cannot obtain discriminant information from the differences between different people, its recognition accuracy is lower than that of other methods.
Furthermore, although more information is available in a video sequence than in a single image, which can thus help improve recognition accuracy, the problems of large data scale, slow processing speed, and high processing complexity must be solved.
Summary of the invention
Therefore, in view of the problems of the prior art in face recognition discussed above, the object of the present invention is to provide a new video-to-video face recognition method that makes full use of the spatio-temporal information contained in video sequences to achieve high recognition accuracy, while overcoming the large data scale and slow processing speed that using video sequences for face recognition brings.
The face recognition method according to the present invention comprises: 1) determining a plurality of corresponding similar video frames in the video sequence to be identified and in a video sequence of a reference image library; 2) aligning reference points of the corresponding similar video frames in the video sequence to be identified and in the video sequence of the reference image library; 3) constructing a face data cube of the person to be identified from the plurality of reference-point-aligned video frames of the video to be identified; and 4) performing subspace analysis on the face data cube to extract the facial features of the person to be identified, and comparing them with the facial feature vectors in the reference image library.
In the present invention, the process in step 1) of determining which video frames in the video sequence to be identified are similar to images in the reference image library is called time synchronization of the video frames. Through this time synchronization, frames with similar images are found in the two video sequences. According to one scheme of the present invention, the waveform of the audio signal is used to determine the desired frames in each video. Using the audio signal contained in the video in this way is simple and effective and avoids complicated algorithms.
After time synchronization, the process of aligning reference points in each image is called spatial synchronization in the present invention. In an embodiment of the present invention, Gabor wavelet features are used to perform spatial synchronization; for Gabor wavelet features, see Reference 2. This is explained further below. Aligning the reference points is important for exploiting the shape similarity between different face images in the subspace method.
To match and identify the large time- and space-synchronized video sequences quickly, the method provided by the invention includes a multi-level subspace analysis method and a multi-classifier fusion method.
In the multi-level subspace analysis method, the feature vector of each frame of the face cube of the person to be identified is treated as a feature slice. In the first-level subspace analysis, a discriminant feature vector is extracted from each feature slice. In the second-level subspace analysis, the discriminant feature vectors extracted from the feature slices are first concatenated in order to form a new feature vector; PCA is then applied to the new feature vector to eliminate the redundant information across frames, and the features with large eigenvalues are chosen to form the final feature vector used for recognition.
In the multi-classifier fusion method according to the present invention, after the first-level subspace analysis of the above multi-level subspace analysis method, the second-level subspace analysis is not performed. Instead, the discriminant feature vectors obtained in the first-level subspace analysis are used directly to identify each frame, and fusion rules are then used to merge the results of all the frame-based classifiers to perform the final identification of the video sequence.
According to the present invention, the following beneficial effects can be obtained:
1) The processing complexity of directly recognizing and processing raw video data is avoided, so face recognition can be carried out quickly and with high accuracy.
2) For identity verification systems adopting the audio-assisted video recognition method, since the person to be identified is required to speak in real time, the security deficiency of traditional recognition based on still images (and even of traditional video recognition) is avoided, giving higher security.
Description of the drawings
Fig. 1 is a schematic diagram of the audio-assisted time synchronization of video sequence frames according to the method of the invention;
Fig. 2 is a schematic diagram of a face graph template, showing an example of the reference points selected on a face.
Embodiment
A preferred embodiment of the present invention is described below with reference to the drawings.
In the video-based recognition method according to the present invention, for video to deliver its advantage of providing more information, the individual frames in the video should differ from one another: if all the frames are similar to each other, the information contained in the video sequence is essentially the same as that of a single image. Yet for video whose frame content changes, simple frame-by-frame matching of the two video sequences (the template video sequence and the video sequence to be identified) does not help much, because a frame in one video may be matched with a frame showing a different expression in the other video, which can instead further damage face recognition performance.
Therefore, the key to improving video-based recognition performance is that the images in the two video sequences must be in the same order with respect to their individual frames; for example, a neutral (expressionless) face should be matched with a neutral face, and a smiling face with a smiling face. This shows that if video sequences are to be used for face recognition, it is important that the two video sequences have similar video frames arranged in the same order, i.e. that they are time-synchronized. In other words, the original video sequences (the template video sequence and the video sequence to be identified) need to be reordered according to the content of each frame.
To achieve this, conventional face-based expression recognition techniques could be used to match similar expressions across different videos. However, the computational cost is too high for data on the scale of video, and the accuracy of expression recognition is not very high either. Information such as expression, illumination, or pose could of course be used for video synchronization; according to a preferred embodiment of the present invention, the information in the audio signal contained in the video is used to perform the time synchronization of video sequence frames. This method is described in detail below.
Take the XM2VTS database as an example (the largest publicly available facial video database; see Reference 12: K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The Extended M2VTS Database", Second International Conference on AVBPA, March 1999). Its video data contains video sequences of 295 people. For each person, several video sequences (each 20 seconds long) were captured in four different sessions. In each session, while the video sequence was recorded, the person was asked to read two passages aloud: "0, 1, 2, ..., 9" and "5, 0, 6, 9, 2, 8, 1, 3, 7, 4". These speech signals, together with the differing expressions, can be used to locate frames.
Fig. 1 shows an example using the pronunciation of five words: "zero", "one", "two", "three", "four". In this example, the peak (maximum point) of the audio waveform of each spoken word is located, and the video frame at the moment of that waveform peak is chosen. This method is used to select video frames both for the training videos used to build the reference image library and for the test videos to be identified, thereby time-synchronizing the video frames of the two kinds of video sequences. Of course, other parameters can also be used as the reference point for choosing the corresponding video frames (for example the trough (minimum point) of the audio waveform, or the center of the audio region of each word). Usually, a person shows different expressions when reading different words. Other paragraphs or sentences can of course be used, as long as the content is the same for the training videos used for modeling and the test videos to be identified. A minimal sketch of this peak-picking scheme is given below.
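As an illustration of audio-peak frame selection, the following Python sketch locates the waveform peak of each word and picks the video frame at that instant. It is a minimal sketch under assumptions not in the patent: word boundaries are found by a simple energy threshold, and the smoothing window and threshold values are arbitrary.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def select_synchronized_frames(audio, sr, fps, n_words):
    """Pick one video frame per spoken word, at the word's audio-energy peak.

    audio   : 1-D numpy array of audio samples
    sr      : audio sample rate (Hz)
    fps     : video frame rate (frames/s)
    n_words : number of words expected (e.g. 10 digits)
    """
    # Short-time energy envelope (50 ms smoothing window).
    energy = uniform_filter1d(audio.astype(float) ** 2, size=int(0.05 * sr))

    # Crude word segmentation: samples above a fraction of the maximum energy.
    active = energy > 0.1 * energy.max()
    edges = np.flatnonzero(np.diff(active.astype(int)))
    segments = list(zip(edges[::2], edges[1::2]))[:n_words]

    frame_indices = []
    for start, end in segments:
        peak_sample = start + int(np.argmax(energy[start:end]))   # waveform peak
        frame_indices.append(int(round(peak_sample / sr * fps)))  # frame at that time
    return frame_indices
```

The same routine would be run on both the gallery and the probe videos so that the k-th selected frame of each corresponds to the same spoken word.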
Although more computationally expensive advanced speech recognition could be used to improve this result, the above method has proved very effective and efficient for synchronizing video sequences and choosing multiple distinct frames for face recognition.
The above method can also easily be extended to include more information. For example, in an identification system, the above video recognition method, which uses the audio of the speech of the person to be identified to select frames, can be integrated with verification based on the content of the speech (such as password verification) and/or on the tonal characteristics of the person to be identified, achieving more accurate and secure performance.
After the above time synchronization, the reference points of each image are aligned, because people's faces move and change when they talk. Fig. 2 shows an example of the facial reference points of such an image; in this embodiment there are 35 reference points. In this specification this step is called spatial synchronization. Aligning the reference points is very important for the subspace method to exploit the shape similarity among the faces of different people. Gabor wavelet features can be used to locate the reference points for spatial synchronization.
The concrete method is as follows: compute the Gabor wavelet feature value at each reference point of the reference image; extract Gabor wavelet feature values in the local region of the image to be identified around each reference point; then find, near the position of the corresponding reference point of the reference image (template), the point in the image to be identified whose Gabor wavelet feature value is closest, and take it as the reference point of the image to be identified near that position. A sketch of this matching step follows.
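The sketch below illustrates this local matching with Gabor jets. The `gabor_jet` helper, the filter-bank parameters, the search radius, and the similarity measure (normalized dot product of jet magnitudes, in the spirit of the elastic graph matching literature, Reference 2) are all illustrative assumptions, not the patent's exact settings.

```python
import numpy as np

def gabor_jet(image, y, x, n_scales=5, n_orient=8):
    """Hypothetical helper: magnitudes of a bank of Gabor filter responses
    at pixel (y, x). Returns the response vector (the 'jet')."""
    jet = []
    yy, xx = np.mgrid[-16:17, -16:17]
    envelope = np.exp(-(xx ** 2 + yy ** 2) / (2 * 6.0 ** 2))
    for s in range(n_scales):
        for o in range(n_orient):
            freq, theta = 0.25 / (2 ** s), np.pi * o / n_orient
            rot = xx * np.cos(theta) + yy * np.sin(theta)
            kernel = envelope * np.exp(2j * np.pi * freq * rot)
            patch = image[y - 16:y + 17, x - 16:x + 17]  # assumes interior point
            jet.append(abs(np.sum(patch * kernel)))
    return np.asarray(jet)

def align_reference_point(template, probe, ref_pt, radius=5):
    """Find the probe-image point near ref_pt whose Gabor jet best matches
    the template jet at ref_pt (similarity = normalized dot product)."""
    ty, tx = ref_pt
    target = gabor_jet(template, ty, tx)
    best, best_sim = ref_pt, -1.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = gabor_jet(probe, ty + dy, tx + dx)
            sim = target @ cand / (np.linalg.norm(target) * np.linalg.norm(cand) + 1e-12)
            if sim > best_sim:
                best_sim, best = sim, (ty + dy, tx + dx)
    return best
```

Running `align_reference_point` for each of the 35 template reference points yields the aligned point set of the probe frame.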
For all the video sequences used in identification, the time- and space-synchronized video frames (two-dimensional matrices) of each person constitute that person's aligned 3D face data cube (a three-dimensional matrix). On this basis, many methods could be used for video sequence matching. However, as mentioned above, traditional methods (for example the nearest-frame or mutual subspace methods) cannot exploit the discriminant information in all of the video data.
A direct method is to treat the whole data cube as a single big feature vector and perform ordinary subspace analysis to extract features. Although this feature-level fusion uses all the data in the video, it has several problems. First, the data scale is huge: for example, if each video sequence uses 21 images of size 41 × 27, the feature dimensionality is 23247. Performing direct subspace analysis on such large vectors is very costly. Second, and more seriously, because the sample size is very small relative to the large feature dimensionality of the discriminant subspace analysis algorithm, the so-called over-fitting problem arises.
To overcome these problems, a multi-level subspace analysis algorithm is adopted according to a preferred embodiment of the present invention. That is, each frame of the face data cube of a video is treated as a feature slice, unified subspace analysis is then applied to each feature slice, and discriminant features are extracted from each slice. For details of this analysis method, see Reference 13: X. Wang and X. Tang, "Unified Subspace Analysis for Face Recognition", Proceedings of the IEEE International Conference on Computer Vision, 2003.
The discriminant feature vectors extracted from the slices are then combined to form a new feature vector, and PCA (principal component analysis) is applied to the new feature vector to eliminate the redundant information between the feature slices and extract the final feature vector. The multi-level subspace analysis method of the present invention is specified below.
In the present invention, the term "class" refers to an individual (person) in the training set or the reference image library.
In the first-level subspace analysis, for each feature slice:
1-1. Project each feature slice onto the PCA subspace determined from the training set of that slice; the dimensionality of the PCA subspace is selected from the results of repeated recognition tests, so as to remove most of the noise.
1-2. In the dimension-reduced PCA subspace, use the within-class scatter matrix to determine the intrapersonal subspace.
1-3. For each of the L classes in the gallery (the reference image library, i.e. the reference template library used for recognition), compute the mean of its training data to obtain the center of the training samples of that class. Project all class centers onto the intrapersonal subspace, then normalize the projections by the within-class eigenvalues to obtain whitened feature vectors.
1-4. Perform PCA on the space formed by the whitened feature-vector centers of all L classes above to obtain the discriminant feature vectors. (A minimal sketch of this first level is given below.)
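As an illustration of steps 1-1 to 1-4, the following sketch implements one plausible reading of the first-level unified subspace analysis for a single slice: PCA reduction, intrapersonal whitening, then PCA on the whitened class centers. The subspace dimensionalities and eigen-decomposition details are assumptions for illustration, not the patent's tuned settings.

```python
import numpy as np

def first_level_subspace(slices, labels, pca_dim=50, intra_dim=30):
    """One feature slice: rows of `slices` are per-sample feature vectors for
    this frame position; `labels` gives the class (person) of each row.

    Returns projections (P, V, lam, W) and the projected class centers."""
    X = slices - slices.mean(axis=0)                      # center the data

    # Step 1-1: PCA subspace of this slice (keep pca_dim components).
    U, S, _ = np.linalg.svd(X.T @ X)
    P = U[:, :pca_dim]
    Xp = X @ P

    # Step 1-2: within-class (intrapersonal) scatter in the PCA subspace.
    Sw = np.zeros((pca_dim, pca_dim))
    centers = {}
    for c in np.unique(labels):
        Xc = Xp[labels == c]
        centers[c] = Xc.mean(axis=0)
        D = Xc - centers[c]
        Sw += D.T @ D
    evals, evecs = np.linalg.eigh(Sw)
    V = evecs[:, -intra_dim:]                             # intrapersonal subspace
    lam = evals[-intra_dim:]

    # Step 1-3: project class centers and whiten by within-class eigenvalues.
    M = np.stack([centers[c] for c in sorted(centers)])   # L x pca_dim
    Mw = (M @ V) / np.sqrt(lam + 1e-12)                   # whitened centers

    # Step 1-4: PCA on the whitened class centers -> discriminant basis.
    Mc = Mw - Mw.mean(axis=0)
    Uc, _, _ = np.linalg.svd(Mc.T @ Mc)
    W = Uc[:, :min(len(centers) - 1, intra_dim)]
    return P, V, lam, W, Mw @ W
```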
In the second-level subspace analysis, the following operations are performed:
2-1. Concatenate the discriminant feature vectors extracted from the slices in order to form a new feature vector.
2-2. Apply PCA to the new feature vector to eliminate the redundant information across the frames, and choose the first few features with large eigenvalues to form the final feature vector for identification. (A sketch follows.)
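The second level reduces to ordinary PCA over the concatenated per-slice discriminant vectors. A minimal sketch, assuming the hypothetical `first_level_subspace` above has already produced a discriminant vector for each frame:

```python
import numpy as np

def second_level_subspace(per_frame_features, final_dim=30):
    """per_frame_features: list over frames, each an (n_samples, d_k) array of
    first-level discriminant vectors. Returns the final PCA projection and
    the final recognition features."""
    # Step 2-1: concatenate slice features sample-wise into one long vector.
    F = np.concatenate(per_frame_features, axis=1)        # n_samples x sum(d_k)
    Fc = F - F.mean(axis=0)

    # Step 2-2: PCA across frames; keep only the leading eigenvectors.
    U, S, _ = np.linalg.svd(Fc.T @ Fc)
    W2 = U[:, :final_dim]
    return W2, Fc @ W2
```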
In the above first-level subspace analysis, the dimensionalities of the PCA subspace and the intrapersonal subspace are selected as follows: choose a PCA subspace dimensionality and an intrapersonal subspace dimensionality, run a recognition test, and after many tests choose the PCA and intrapersonal subspace dimensionalities that give the best recognition result.
In the second-level subspace analysis, only PCA is used rather than unified subspace analysis. This is because the within-class variation has already been reduced by the first-level whitening step, and the discriminant features have been extracted in step 1-4 of the first-level subspace analysis; repeating unified subspace analysis would not add any new information. However, a large amount of overlapping information still exists between different slices: despite the expression changes, the frames remain very similar to each other, so PCA is needed to reduce the redundant information.
The multi-level subspace analysis of the present invention does not lose much information compared with existing subspace analysis. Specifically, since the whitening step removes only within-class variation information, which is not needed anyway, it need not be considered when analyzing the information loss of the algorithm; only the two PCA steps require attention. To perform the PCA processing, an n × m sampling matrix is first generated:
$$A = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_m^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_m^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(n)} & x_2^{(n)} & \cdots & x_m^{(n)} \end{bmatrix} \qquad (1)$$

where $x_i$ is the face-data-cube feature vector of a video, n is the length of the vector, and m is the number of training samples. Decomposing each length-n feature vector into g = n/k groups of short feature vectors of length k gives

$$A = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_g \end{bmatrix} \qquad (2)$$

PCA can be performed on each of the g groups of short feature vectors $B_i$. A new feature vector is then formed from the few leading features chosen from each group, and the final feature vector is computed by performing PCA on this new feature vector.
The case of two groups of short feature vectors is taken as an example below. The feature matrix and its covariance matrix are

$$A = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} \qquad (3)$$

$$W = AA^T = \begin{bmatrix} B_1B_1^T & B_1B_2^T \\ B_2B_1^T & B_2B_2^T \end{bmatrix} = \begin{bmatrix} W_1 & W_{12} \\ W_{21} & W_2 \end{bmatrix} \qquad (4)$$

If the eigenvector matrices of the covariance matrices $W_1$ and $W_2$ are $T_1$ and $T_2$ respectively, then

$$T_1^T W_1 T_1 = \Lambda_1 \qquad (5)$$

$$T_2^T W_2 T_2 = \Lambda_2 \qquad (6)$$

where $\Lambda_1$ and $\Lambda_2$ are diagonal eigenvalue matrices. The effective rotation matrix of the first-level grouped PCA over $(B_1, B_2, \ldots, B_g)$ is

$$T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} \qquad (7)$$

T is also an orthogonal matrix, since

$$T^T T = \begin{bmatrix} T_1^T T_1 & 0 \\ 0 & T_2^T T_2 \end{bmatrix} = I \qquad (8)$$

Therefore, after the first-level PCA over the groups $(B_1, B_2, \ldots, B_g)$, owing to the orthogonality of the rotation matrix T, the covariance matrix of the rotated feature vector is

$$W_r = T^T W T = \begin{bmatrix} \Lambda_1 & T_1^T W_{12} T_2 \\ T_2^T W_{21} T_1 & \Lambda_2 \end{bmatrix} = \begin{bmatrix} \begin{matrix} \Lambda_{1b} & 0 \\ 0 & \Lambda_{1s} \end{matrix} & \begin{matrix} C_{bb} & C_{bs} \\ C_{sb} & C_{ss} \end{matrix}^{T} \\ \begin{matrix} C_{bb} & C_{bs} \\ C_{sb} & C_{ss} \end{matrix} & \begin{matrix} \Lambda_{2b} & 0 \\ 0 & \Lambda_{2s} \end{matrix} \end{bmatrix} \qquad (9)$$
which is a similar matrix of the original feature-vector covariance matrix W. Since similar matrices have the same eigenvalues, the rightmost form of equation (9) can be used to discuss the effect on W of keeping only the first few dominant eigenvalues in each group.

In equation (9), for n = 1 or 2, $\Lambda_{nb}$ and $\Lambda_{ns}$ denote the dominant-eigenvalue section and the small-eigenvalue section of the eigenvalue matrix $\Lambda_n$, respectively, and $C_{xx}$ (where x = b or s) denotes the cross-covariance matrices of the two groups of rotated features. By keeping only the dominant eigenvalues in the second-level PCA, the new feature-vector covariance matrix becomes

$$W_d = \begin{bmatrix} \Lambda_{1b} & C_{bb}^T \\ C_{bb} & \Lambda_{2b} \end{bmatrix} \qquad (10)$$

The terms eliminated from $W_r$ are $\Lambda_{1s}$, $\Lambda_{2s}$, $C_{ss}$, $C_{bs}$ and $C_{sb}$. Since the main energy is contained in the dominant eigenvalues, the information loss from $\Lambda_{1s}$ and $\Lambda_{2s}$ is very small, so the energy $C_{ss}$ contained in the cross-covariance matrix of the two low-energy feature vectors should be even smaller.

It can be shown that neither $C_{bs}$ nor $C_{sb}$ can be large. If the two feature groups $B_1$ and $B_2$ are mutually uncorrelated, all the cross-covariance matrices $C_{xx}$ in equation (9) are small. On the other hand, if the two feature groups are highly correlated, their dominant eigen-features are very similar; therefore the cross-covariance matrix $C_{bs}$ between the large features of the second group and the small features of the first group closely resembles the cross-covariance matrix between the large features and the small features of the first group, which is zero owing to the decorrelation property of PCA.

When the two feature groups $B_1$ and $B_2$ are partially correlated, the correlated part should be the main signal, since the noise components of $B_1$ and $B_2$ are hardly correlated with each other. A key property of PCA is that nearly all of the signal energy is retained in the first few large eigenvalues. Therefore most of the signal energy in $B_2$, in particular the portion correlated with $B_1$, is retained in the large-eigenvalue section of the covariance matrix of $B_2$, while the energy discarded with the small-eigenvalue section of $B_2$ contains almost no energy correlated with $B_1$. Hence $C_{bs}$ and $C_{sb}$ should be very small, and removing them from the covariance matrix $W_r$ does not lose much information.

From the above analysis, the covariance matrix $W_d$ is an approximation of $W_r$, and $W_r$ is a similar matrix of W. Therefore the eigenvalues of $W_d$ obtained by the multi-level subspace method are in fact approximations of the eigenvalues computed from W by the standard PCA method. A numerical illustration of this approximation is sketched below.
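The claim that grouped two-level PCA approximates standard PCA can be checked numerically. The sketch below builds correlated random features, then compares the leading eigenvalues of $W_d$ (grouped, truncated as in equation (10)) with those of the full covariance W. The sizes and the shared-signal construction are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 40, 200, 20                        # vector length, samples, group size
A = rng.standard_normal((n, m))
A += 0.5 * rng.standard_normal((1, m))       # shared signal -> correlated groups

W = A @ A.T                                  # full covariance (eqs. 1, 4)
full_eigs = np.sort(np.linalg.eigvalsh(W))[::-1]

B1, B2 = A[:k], A[k:]                        # two groups (eq. 3)
keep = 10                                    # dominant eigenvectors kept per group
T1 = np.linalg.eigh(B1 @ B1.T)[1][:, ::-1][:, :keep]
T2 = np.linalg.eigh(B2 @ B2.T)[1][:, ::-1][:, :keep]

# Second-level covariance of the concatenated dominant projections (eq. 10).
Z = np.vstack([T1.T @ B1, T2.T @ B2])
Wd = Z @ Z.T
grouped_eigs = np.sort(np.linalg.eigvalsh(Wd))[::-1]

print(full_eigs[:5])                         # leading eigenvalues of W
print(grouped_eigs[:5])                      # close approximations from W_d
```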
According to another embodiment of the invention, part of the subspace analysis in the above multi-level subspace analysis method can be replaced by a multi-classifier fusion technique. That is, in the first-level analysis, each individual video frame is still processed with unified subspace analysis; fusion rules are then used to integrate all the frame-based classifiers and determine the final classification. The detailed method is presented below.
The first-level subspace analysis is identical to steps 1-1 to 1-4 of the multi-level subspace analysis described above, and is not repeated here.
In the second-level processing, the following steps are performed:
2-1'. In frame-based classifiers, identify each frame using the discriminant feature vectors obtained in step 1-4.
2-2'. Use fusion rules to combine the recognition results of the frame-based classifiers and obtain the final recognition result.
There are many methods for fusing multiple classifiers, and any of them can be used to realize the above process of the present invention. Two examples of fusing the frame-based classifiers with simple fusion rules, namely the majority voting rule and the sum rule, are given below.
Majority voting
Each frame-based classifier $C_k(x)$ assigns a class label $C_k(x) = i$ to the input face data x. This event can be expressed as a binary function

$$T_k(x \in X_i) = \begin{cases} 1, & \text{if } C_k(x) = i \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$
With majority voting, the final class is chosen as

$$\beta(x) = \arg\max_{X_i} \sum_{k=1}^{K} T_k(x \in X_i) \qquad (12)$$
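A minimal sketch of the majority vote of equation (12), assuming each frame-based classifier has already output a class label:

```python
import numpy as np

def majority_vote(frame_labels):
    """frame_labels: length-K sequence of class labels, one per frame
    classifier. Returns the label receiving the most votes (eq. 12)."""
    labels, counts = np.unique(np.asarray(frame_labels), return_counts=True)
    return labels[np.argmax(counts)]

# e.g. seven frame classifiers voting over person IDs:
print(majority_vote([3, 3, 7, 3, 12, 3, 7]))   # -> 3
```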
Sum rule
Let $P(X_i \mid C_k(x))$ denote the probability, as measured by the frame-based classifier $C_k(x)$, that x belongs to $X_i$. According to the sum rule, the class used for the final decision is selected as

$$\beta(x) = \arg\max_{X_i} \sum_{k=1}^{K} P(X_i \mid C_k(x)) \qquad (13)$$

$P(X_i \mid C_k(x))$ can be estimated from the output of the frame-based classifier. For the frame-based classifier $C_k(x)$, the center $m_i$ of class $X_i$ and the input face data x are projected onto the discriminant vector $W_k$:

$$w_k^i = W_k^T m_i \qquad (14)$$

$$w_k^x = W_k^T x \qquad (15)$$

$P(X_i \mid C_k(x))$ is then estimated as

$$\hat{P}(X_i \mid C_k(x)) = \frac{1}{2}\left(1 + \frac{(w_k^x)^T w_k^i}{\|w_k^x\| \cdot \|w_k^i\|}\right) \qquad (16)$$

whose value is normalized to [0, 1].
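A sketch of the sum rule of equations (13)-(16): per-frame normalized-correlation scores between the projected input and each projected class center are summed across frames, and the best-scoring class wins. The array shapes are assumptions for illustration.

```python
import numpy as np

def sum_rule(frame_projections, class_centers):
    """frame_projections: list over K frames of projected inputs w_k^x (eq. 15).
    class_centers: list over K frames of (L, d) arrays of projected class
    centers w_k^i (eq. 14). Returns the class index maximizing eq. (13)."""
    K = len(frame_projections)
    L = class_centers[0].shape[0]
    scores = np.zeros(L)
    for k in range(K):
        wx = frame_projections[k]
        for i in range(L):
            wi = class_centers[k][i]
            cos = wx @ wi / (np.linalg.norm(wx) * np.linalg.norm(wi) + 1e-12)
            scores[i] += (1.0 + cos) / 2.0     # eq. (16), normalized to [0, 1]
    return int(np.argmax(scores))
```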
The present invention was tested on XM2VTS, the largest standard video face database.
From XM2VTS, 294 different people were chosen, with 294 × 4 video sequences from the four sessions described above. The 294 × 3 video sequences of the first three sessions were selected as training data. The gallery set is composed of the 294 video sequences of the first session, and the probe set of video sequences to be identified is composed of the 294 video sequences of the fourth session. The people in the videos were required to read two digit sequences, "0123456789" and "5069281374".
For each video, 21 frames were selected by each of two strategies: audio-video time synchronization, and random selection without audio information. This gives two different sets of face image sequences, labeled the A-V synchronized data and the A-V unsynchronized data respectively. For the A-V synchronized data, each frame corresponds to the waveform crest of a spoken digit; the remaining frame is aligned at the midpoint between the end of the first sentence and the beginning of the second. The number of frames may differ in different experiments.
First, the recognition results of appearance-based methods that use gray-level image values directly as features were examined. The results for still images and video sequences are summarized in Table 1. The still image is either the first frame of the video sequence (A-V synchronized case) or a randomly chosen frame (A-V unsynchronized case). Direct classification of still images by Euclidean distance performs very poorly (61%). This result in fact reflects the difficulty of this database: for face recognition, results are usually poor when the test image and the gallery image come from different sessions. Using video data with the same Euclidean distance gives a significant improvement (78.3%). After applying the multi-level subspace analysis algorithm and the multi-classifier algorithm of the present invention, the video recognition rate further increases to above 98%. This clearly shows that video sequences indeed contain a large amount of extra information.
The two columns of Table 1 compare time-synchronized and unsynchronized results. The A-V time synchronization method gives a marked improvement in recognition accuracy over all the other classification methods. Note that although multi-level subspace analysis improves the video classification rate by only 1.7%, this corresponds to a reduction of the recognition error rate by more than 45%, which is significant.
Table 1. Comparison of recognition results using gray-level appearance features
Table 2 summarizes the results using local wavelet features. As expected, all results are further improved, further confirming the findings of Table 1 across the different methods. Note that the final recognition accuracy of this experiment using all three algorithms (time synchronization, spatial synchronization, and multi-level subspace analysis or multi-classifier fusion) is 99%. Considering that this is cross-session identification, this accuracy is very high.
Table 2. Comparison of recognition results using local wavelet features
Finally, Table 3 compares the video recognition method of the present invention with existing video-based face recognition methods: the nearest-frame method and the mutual subspace method. Note that the results of the existing methods in Table 3 are computed from the A-V time-synchronized video sequences, and the nearest-frame methods also use the unified subspace analysis method, so they perform better than the original methods. Table 3 clearly shows that the method of the present invention achieves an obvious improvement: its error rate is only 5% to 10% of that of the classical methods.
Table 3. Comparison with recognition results of existing video-based methods

Video-based method                                                        Recognition accuracy (%)
Mutual subspace method                                                    79.3
Nearest-frame method using Euclidean distance                             81.7
Nearest-frame method using LDA                                            90.9
Nearest-frame method using unified subspace analysis                      93.2
Multi-level subspace method of the invention (gray features)              98.0
Sum-rule multi-classifier method of the invention (gray features)         98.0
Voting-rule multi-classifier method of the invention (gray features)      98.6
Multi-level subspace method of the invention (wavelet features)           99.0
Sum-rule multi-classifier method of the invention (wavelet features)      99.0
Voting-rule multi-classifier method of the invention (wavelet features)   98.6
An audio-assisted video face recognition method has been described above. The method makes full use of all the spatio-temporal information in a video sequence. To overcome the processing-speed and data-scale problems, spatial and temporal frame synchronization algorithms, a multi-level subspace analysis algorithm, and a multi-classifier fusion algorithm have been developed. Experiments on the largest available facial video database show that all of these techniques are effective in improving recognition performance: the new algorithms obtain nearly perfect recognition results, a marked improvement over existing methods based on still images and on video. Moreover, the multi-classifier fusion technique of the present invention can also be used to integrate the appearance-based video classification with the wavelet-based video classification method, which can further improve the recognition accuracy.

Claims (7)

1. A face recognition method based on video, comprising:
determining a plurality of corresponding similar video frames in a video sequence to be identified and in a video sequence of a reference image library;
aligning reference points of the corresponding similar video frames in said video sequence to be identified and in the video sequence of the reference image library;
constructing a face data cube of the person to be identified from the plurality of reference-point-aligned video frames of said video sequence to be identified; and
performing subspace analysis on said face data cube to extract the facial features of the person to be identified, and comparing them with facial feature vectors in said reference image library,
wherein said performing subspace analysis on the face data cube comprises:
extracting discriminant feature vectors from the feature slices formed by each frame of said face data cube;
identifying each frame with said discriminant feature vectors in frame-based classifiers; and
fusing the results of said classifiers using a fusion rule to identify the video sequence to be identified.
2. The method according to claim 1, characterized in that said fusion rule comprises: a majority voting rule or a sum rule.
3. The method according to any one of claims 1-2, characterized in that the step of determining a plurality of corresponding similar video frames in the video sequence to be identified and in the video sequence of the reference image library comprises: using the waveform of an audio signal produced by predetermined sounds contained in said video sequences to select the plurality of corresponding similar frames in the video sequence to be identified and in the video sequence of the reference image library.
4. The method according to claim 3, characterized in that said video frames are chosen using, as a reference selected from the waveform of said audio signal, one of the following parameters: the peak of the audio waveform, the trough of the audio waveform, or the center point of the audio region of each word.
5. The method according to claim 4, characterized in that it further comprises verifying the content of the sounds uttered by the person to be identified during identification, and/or
identifying the tonal characteristics of the person to be identified.
6. A face recognition method based on video, comprising:
extracting discriminant feature vectors from the feature slices formed by each frame of a video sequence to be identified;
identifying each frame with said discriminant feature vectors in frame-based classifiers; and
fusing the results of said classifiers using a fusion rule to identify the video sequence to be identified,
wherein said step of extracting discriminant feature vectors from a feature slice comprises:
projecting each said feature slice onto a principal component analysis (PCA) subspace determined from the training set of that feature slice;
determining an intrapersonal (within-class) subspace from said PCA subspace;
determining the centers of the training data classes of the individuals in the reference image library, and projecting all class centers onto said intrapersonal subspace;
normalizing the projections using the within-class eigenvalues of said intrapersonal subspace to determine whitened feature vectors; and
performing principal component analysis on the space formed by the whitened feature-vector centers of all said classes to determine the discriminant feature vectors.
7. The method according to claim 6, characterized in that said fusion rule comprises: a majority voting rule or a sum rule.
CN2005100709199A 2004-05-17 2005-05-17 Face recognition method based on video frequency Expired - Fee Related CN1866270B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US57150804P 2004-05-17 2004-05-17
US60/571,508 2004-05-17

Publications (2)

Publication Number Publication Date
CN1866270A CN1866270A (en) 2006-11-22
CN1866270B true CN1866270B (en) 2010-09-08

Family

ID=37425288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2005100709199A Expired - Fee Related CN1866270B (en) 2004-05-17 2005-05-17 Face recognition method based on video frequency

Country Status (2)

Country Link
CN (1) CN1866270B (en)
HK (1) HK1095187A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8996879B2 (en) * 2010-12-23 2015-03-31 Intel Corporation User identity attestation in mobile commerce
CN103329146B (en) * 2011-01-25 2018-02-06 吉尔博斯有限责任公司 Identify the characteristic of individual using face recognition and provide a display for the individual
JP5791364B2 (en) * 2011-05-16 2015-10-07 キヤノン株式会社 Face recognition device, face recognition method, face recognition program, and recording medium recording the program
CN103093273B (en) * 2012-12-30 2016-05-11 信帧电子技术(北京)有限公司 Based on the video demographic method of body weight for humans identification
CN105025193B (en) * 2014-04-29 2020-02-07 钰立微电子股份有限公司 Portable stereo scanner and method for generating stereo scanning result of corresponding object
CN105282375B (en) * 2014-07-24 2019-12-31 钰立微电子股份有限公司 Attached stereo scanning module
CN104239858B (en) * 2014-09-05 2017-06-09 华为技术有限公司 A kind of method and apparatus of face characteristic checking
CN107004115B (en) * 2014-12-03 2019-02-15 北京市商汤科技开发有限公司 Method and system for recognition of face
TWI667621B (en) * 2018-04-09 2019-08-01 和碩聯合科技股份有限公司 Face recognition method
CN111382648A (en) * 2018-12-30 2020-07-07 广州市百果园信息技术有限公司 Method, device and equipment for detecting dynamic facial expression and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6594382B1 (en) * 1999-11-04 2003-07-15 The United States Of America As Represented By The Secretary Of The Navy Neural sensors
CN1403997A (en) * 2001-09-07 2003-03-19 昆明利普机器视觉工程有限公司 Automatic face-recognizing digital video system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiaoou Tang and Zhifeng Li. Frame Synchronization and Multi-Level Subspace Analysis for Video Based Face Recognition. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. *

Also Published As

Publication number Publication date
CN1866270A (en) 2006-11-22
HK1095187A1 (en) 2007-04-27

Similar Documents

Publication Publication Date Title
CN1866270B (en) Face recognition method based on video frequency
CN100426314C (en) Feature classification based multiple classifiers combined people face recognition method
Ruiz-del-Solar et al. Recognition of faces in unconstrained environments: A comparative study
Craw et al. Face recognition by computer
KR100543707B1 (en) Face recognition method and apparatus using PCA learning per subgroup
Christlein et al. Writer identification and verification using GMM supervectors
US20070296863A1 (en) Method, medium, and system processing video data
Khan et al. Multi-shot person re-identification using part appearance mixture
Mondal et al. Secure and hassle-free EVM through deep learning based face recognition
Lu et al. Automatic gender recognition based on pixel-pattern-based texture feature
Pranoto et al. Real-time triplet loss embedding face recognition for authentication student attendance records system framework
CN100356387C (en) Face recognition method based on random sampling
Ioannidis et al. Key-frame extraction using weighted multi-view convex mixture models and spectral clustering
Yuan et al. Holistic learning-based high-order feature descriptor for smoke recognition
Duta et al. Learning the human face concept in black and white images
Sharma et al. Face photo-sketch synthesis and recognition
Kaur Review of face recognition system using MATLAB
Raghavendra et al. Multimodal person verification system using face and speech
Intan Combining of feature extraction for real-time facial authentication system
Li et al. Aging face verification in score-age space using single reference image template
Yuan et al. Face identification by a cascade of rejection classifiers
Kittler et al. Face authentication using client specific fisherfaces
Drygajlo et al. Adult face recognition in score-age-quality classification space
Zhao et al. Co-lda: A semi-supervised approach to audio-visual person recognition
Fedias et al. A New approach based in mean and standard deviation for authentication system of face

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1095187

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1095187

Country of ref document: HK

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100908

Termination date: 20170517
