Summary of the invention
Object of the invention: to overcome the problems of the prior art, the present invention proposes a large-scale face image retrieval method, thereby effectively solving the problem of fast and accurate retrieval of face images over large-scale data.
Summary of the invention: the invention discloses a large-scale face image retrieval method comprising the following steps:
The face image to be retrieved and all face images in the face database are each processed by the following steps 1 to 4;
Step 1, face image preprocessing: locate the key points in the face image and align the face image with a preset standard face image;
Step 2, extract local features: divide the face image into p blocks according to the key point positions, each block being called a local block, and extract the features of each local block;
Step 3, extract global geometric features: according to the key point positions, extract the global geometric features of the face image, including distance features, angle features and curvature features;
Step 4, quantize the local features and global geometric features of the face image into visual words: obtain a local feature dictionary and a global feature dictionary from a training set, and quantize the local features and global geometric features into visual words;
Step 5, build an inverted index over all face images in the face database;
Step 6, obtain a candidate face image set from the inverted index of step 5 according to the visual words of the face image to be retrieved;
Step 7, rerank the candidate face image set with a reranking algorithm to obtain the face image reference set, which is the final ordered retrieval result.
The face image preprocessing of step 1 specifically comprises the following steps:
First, the key points in the face image are located with an active shape model (Active Shape Model, ASM). The set of all key points of a face is regarded as the face shape, and the vector formed by these key points is called the face shape vector. The i-th face shape vector, i.e. the set of key points of the i-th face image, is denoted

X_i = (x_i1, y_i1, x_i2, y_i2, ..., x_in, y_in)^T,

where (x_ij, y_ij) is the coordinate of the j-th key point of the i-th face image, j is any value in 1 to n, n is the number of key points (76 is preferred in the present invention; in general the number of key points is a natural number of at least 10), and T denotes matrix transposition.
Given a face image a and a face image b, their shapes are respectively

X_a = (x_a1, y_a1, x_a2, y_a2, ..., x_an, y_an)^T and X_b = (x_b1, y_b1, x_b2, y_b2, ..., x_bn, y_bn)^T.
Find the parameter group (θ, s, t) that minimizes

E = (X_a − M(s, θ)[X_b] − t)^T W (X_a − M(s, θ)[X_b] − t),

where the function M(s, θ)[X_i] + t denotes the transformation of a face shape vector, with

t = (t_x, t_y, ..., t_x, t_y)^T,

and W is the diagonal matrix formed by the key-point weights (w_1, w_2, ..., w_n). The key-point weights are equal: with n key points, each weight is 1/n. The weights of different face regions can of course also be set as required, as long as each weight is at least 0 and all weights sum to 1. t_x and t_y are the displacements of the x and y coordinates of the key points, respectively. In minimizing E, the result is obtained by differentiating with respect to each parameter; the face image is then transformed according to the resulting parameters so that the two faces are brought into the closest positional correspondence. The process is as follows: let s·cosθ = a_x and s·sinθ = a_y, and differentiate E with respect to a_x, a_y, t_x and t_y separately; setting the derivatives to zero yields four linear equations in these four unknowns. Solving these equations gives the transformation parameter group (θ, s, t), with which the image is transformed.
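The alignment step above can be sketched as a weighted least-squares problem in the variables a_x = s·cosθ, a_y = s·sinθ, t_x, t_y. The NumPy sketch below (function names are our own, not from the disclosure) recovers the parameters and applies the transformation:

```python
import numpy as np

def align_shape(Xa, Xb, w=None):
    """Solve for (s, theta, t) mapping shape b onto shape a by weighted
    least squares, with a_x = s*cos(theta), a_y = s*sin(theta)."""
    pts_a = Xa.reshape(-1, 2)
    pts_b = Xb.reshape(-1, 2)
    n = len(pts_a)
    if w is None:
        w = np.full(n, 1.0 / n)          # equal key-point weights, summing to 1
    A = np.zeros((2 * n, 4))
    b = np.zeros(2 * n)
    sw = np.sqrt(w)
    # each key point contributes two linear equations in (a_x, a_y, t_x, t_y)
    A[0::2] = np.c_[pts_b[:, 0], -pts_b[:, 1], np.ones(n), np.zeros(n)]
    A[1::2] = np.c_[pts_b[:, 1],  pts_b[:, 0], np.zeros(n), np.ones(n)]
    A[0::2] *= sw[:, None]
    A[1::2] *= sw[:, None]
    b[0::2] = sw * pts_a[:, 0]
    b[1::2] = sw * pts_a[:, 1]
    ax, ay, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    s, theta = np.hypot(ax, ay), np.arctan2(ay, ax)
    return s, theta, (tx, ty)

def transform(Xb, s, theta, t):
    """Apply M(s, theta)[X_b] + t to a flattened shape vector."""
    pts = Xb.reshape(-1, 2)
    ax, ay = s * np.cos(theta), s * np.sin(theta)
    M = np.array([[ax, -ay], [ay, ax]])
    return (pts @ M.T + t).ravel()
```

Because the objective is linear in (a_x, a_y, t_x, t_y), solving the normal equations via `lstsq` is equivalent to setting the four partial derivatives of E to zero.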
Step 2, extracting local features, comprises:
The face is partitioned into blocks according to the key points located in step 1. First, the face is divided into five large blocks according to the organ positions in the face image, mainly the eyes, the nose and the cheeks. The method is as follows: all key points of the left eye are taken as a set, the four key points at the top, bottom, left and right are taken as the boundary, and the resulting rectangle is regarded as the block corresponding to the left eye. The remaining four blocks are obtained in the same way as the left-eye block. Each large block is then evenly divided into 4 × 5 small blocks, with overlapping regions between these 20 blocks. The face is thus divided into 100 small blocks, each with an independent number in the range 1 to 100. Finally, two kinds of features are extracted from each block: local binary patterns (Local Binary Pattern, LBP) and scale-invariant feature transform (Scale Invariant Feature Transform, SIFT) features.
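The sub-block partition and LBP extraction can be illustrated as follows. The 50% overlap between neighbouring sub-blocks and the basic 256-bin LBP histogram are illustrative assumptions, since the disclosure does not fix the overlap or the LBP variant; SIFT extraction is omitted:

```python
import numpy as np

def split_block(block, rows=4, cols=5):
    """Split one organ-level block into rows*cols sub-blocks with roughly
    50% overlap between neighbours (step = half the sub-block side)."""
    H, W = block.shape
    sy, sx = H // (rows + 1), W // (cols + 1)
    return [block[i * sy:(i + 2) * sy, j * sx:(j + 2) * sx]
            for i in range(rows) for j in range(cols)]

def lbp_histogram(patch):
    """Basic 8-neighbour LBP code per interior pixel, pooled into a
    normalised 256-bin histogram."""
    c = patch[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = patch[1 + dy:patch.shape[0] - 1 + dy,
                   1 + dx:patch.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)   # one bit per neighbour
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Applied to the five organ blocks in turn, this yields the 100 numbered sub-blocks and one descriptor per sub-block.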
Step 3, extracting global geometric features, comprises:
According to the key points located in step 1, the global geometric features between the key points are extracted. The global geometric features describe the spatial geometric relations between the key points, including distances, angles and curvatures. For the located key points, the pairwise distances are computed and linearly concatenated into a vector, called the distance feature; the angles formed between the key points are computed and linearly concatenated into a vector, called the angle vector; and the curvature is computed at the key points of the chin and linearly concatenated into a vector, called the curvature vector. The linear concatenation of the distance vector, the angle vector and the curvature vector is the feature vector of the global geometric features.
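A minimal sketch of such a global geometric feature vector. Two simplifying assumptions are made here: the angle feature is taken per point pair relative to the x-axis rather than between segment pairs, and curvature along the chin is approximated by finite differences, as the disclosure only states that derivatives are approximated by differences of discrete points:

```python
import numpy as np

def geometric_features(pts, chin_idx):
    """pts: (n, 2) array of key points; chin_idx: ordered indices of the
    chin key points. Returns the concatenation of pairwise distances,
    angles, and discrete curvature along the chin polyline."""
    n = len(pts)
    # distance feature: all pairwise distances, linearly concatenated
    dists = [np.linalg.norm(pts[i] - pts[j])
             for i in range(n) for j in range(i + 1, n)]
    # angle feature (simplified): direction of each point pair
    angles = [np.arctan2(*(pts[j] - pts[i])[::-1])
              for i in range(n) for j in range(i + 1, n)]
    # curvature feature: finite-difference curvature along the chin curve
    chin = pts[chin_idx]
    d1 = np.gradient(chin, axis=0)
    d2 = np.gradient(d1, axis=0)
    curv = (d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]) \
        / (d1[:, 0] ** 2 + d1[:, 1] ** 2) ** 1.5
    return np.concatenate([dists, angles, curv])
```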
The definition of the visual word in step 4 comprises:
In the large-scale face retrieval method, a visual word is defined as <Name ID, Age ID, Gender ID, Position ID>; each local or global geometric feature is quantized into one or more visual words. In the training set, each face image carries its own name, age bracket and gender information. Name ID denotes the name of the face in the training-set image corresponding to the feature, Age ID denotes the age bracket of that face, Gender ID denotes its gender, and Position ID denotes the block number corresponding to the feature in that face image. In particular, since the face is divided into 100 small blocks, the Position ID of a visual word quantized from a local feature is the number of its corresponding face block, whereas for global geometric features the Position ID is uniformly set to 101.
Local feature quantization in step 4 comprises:
Each face of the training set is divided into numbered blocks by step 2, and the local features, namely local binary patterns and scale-invariant feature transforms, are extracted from each block. All feature vectors of the same feature kind (i.e. all local binary patterns, or all scale-invariant feature transforms), the same age bracket, the same gender and the same block number are grouped together, and each group is used to train one set of bases with a sparse coding model:

min_{D, v_i} Σ_i ( ||x_i − D v_i||² + λ ||v_i||_1 ),

where D is the learned dictionary, x_i is a feature vector of the training group, v_i is the coefficient vector when x_i is linearly reconstructed with D, and λ is the sparsity parameter. A dictionary D is computed from the training set, one per group. At retrieval time, given a feature vector x' and the age bracket, gender and block number of its face, the corresponding dictionary D' is chosen and x' is reconstructed with it, giving the reconstruction coefficients v'; the local feature x' is thereby encoded as v'. The names of the faces associated with the bases of the non-zero entries of v' are the Name IDs of the quantized x', and the age bracket, gender and block number of x' are the Age ID, Gender ID and Position ID of the quantization, respectively.
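The quantization of a local feature can be sketched as follows. The λ-regularized reconstruction is solved here with ISTA, a generic proximal-gradient method chosen by us (the disclosure does not specify the solver); the non-zero coefficients then yield the visual words. `atom_names` is a hypothetical lookup from dictionary atom to person name:

```python
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=200):
    """Approximately solve min_v 0.5*||x - D v||^2 + lam*||v||_1 by ISTA;
    the columns of D are the dictionary atoms (bases)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    v = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ v - x)                # gradient of the quadratic term
        u = v - g / L
        v = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft threshold
    return v

def quantize_local_feature(D, atom_names, x, age_id, gender_id, pos_id, lam=0.1):
    """One visual word <NameID, AgeID, GenderID, PositionID> per non-zero
    reconstruction coefficient; atom_names[i] is the person of atom i."""
    v = sparse_code(D, x, lam)
    return [(atom_names[i], age_id, gender_id, pos_id)
            for i in np.flatnonzero(np.abs(v) > 1e-6)]
```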
Global geometric feature quantization in step 4 comprises:
Global geometric features are extracted from all training-set faces and grouped by the age bracket and gender of the corresponding face, so that global geometric features of the same age bracket and same gender form one group.
Given a global geometric feature F_g together with age bracket and gender information, the feature group of the corresponding age bracket and gender is selected from the training set, and the nearest neighbours of F_g are found by neighbour search; the identity, age bracket and gender of a neighbour are the Name ID, Age ID and Gender ID of the quantized F_g, and the Position ID of a global geometric feature is uniformly set to 101.
Step 5, building the inverted index, comprises:
An inverted index is built over all face images in the database for retrieval. Each face image is quantized as described in step 4 to obtain the visual words representing the face, and the image is indexed under those visual words. For face images in the database without age and gender information, existing age estimation and gender estimation algorithms can be used; since the algorithm only needs an age bracket rather than an exact value, it has a certain tolerance to errors of the age estimator. Every face in the database is quantized and indexed under its own visual words. The index structure is a set of visual words; given a visual word, the set of face images associated with it can be retrieved.
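Steps 5 and 6 together amount to a posting-list structure: each visual word maps to the set of database images containing it, and a query retrieves the union of the posting lists of its words. A minimal dictionary-based sketch (identifiers are hypothetical):

```python
from collections import defaultdict

def build_inverted_index(database):
    """database: {image_id: set of visual-word tuples}. Each visual word
    becomes an index entry pointing at every image that contains it."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidate_set(index, query_words):
    """Union of the posting lists of the query's visual words: the
    candidate face image set, without any linear scan of the database."""
    cands = set()
    for w in query_words:
        cands |= index.get(w, set())
    return cands
```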
Step 6, obtaining the candidate face image set, comprises:
Given a face image to be retrieved, its visual words are obtained by the quantization method above; for a face image without age and gender information, existing age estimation and gender estimation algorithms can be used, and since only an age bracket rather than an exact value is needed, the method has a certain tolerance to errors of the age estimator. Identical visual words are found in the inverted index and the face images associated with them are taken out; these face images form the candidate face image set, which is the preliminary retrieval result.
Step 7, the reranking algorithm, comprises:
First, a face classifier is trained on a subset of the face images; this classifier predicts whether two face images belong to the same person. The specific steps are as follows:
(1) The classifier training set consists of face image pairs: one part are pairs of the same person at different ages, and the other part are pairs of different people at different ages.
(2) A global gradient face feature (GradientFaces) is extracted from each face image, and the difference of the gradient face features of each pair is taken, giving a difference feature.
(3) The difference features of same-person pairs are used as positive examples and the difference features of different-person pairs as negative examples, and a classification model is trained with a support vector machine (Support Vector Machine, SVM).
After the classifier is trained, a global Hamming-code feature is extracted from every face in the candidate face image set.
In the reranking algorithm, a face is chosen from the candidate face image set in each iteration and added to the face image reference set, as follows:
In each iteration round, the overall distance D between each face image in the candidate set and both the face image to be retrieved and the face images of the reference set is computed; the face image with the smallest D is removed from the candidate set and added to the reference set. D is expressed as

D = −β·sign(f(Q, I)) + α·d(Q, I) + (1/|R|) Σ_{i=1}^{|R|} d(R_i, I),

where Q is the face image to be retrieved and I is any face image in the candidate set. The term −β·sign(f(Q, I)) + α·d(Q, I) is the distance between candidate image I and query image Q, and (1/|R|) Σ_i d(R_i, I) is the distance between candidate image I and the face images of the reference set; the overall distance D is the sum of the two. α and β are scale coefficients whose relative size determines whether the algorithm emphasizes the classification result of the semantic classification term or the Hamming-code distance of the faces: when α is greater than β the algorithm emphasizes the Hamming-code distance, so that its influence exceeds that of the semantic classification result, and conversely when β is greater than α the influence of the semantic classification result exceeds the Hamming-code distance. The actual values can be chosen as the application requires; both are commonly set to 0.5. sign(f(Q, I)) is the semantic classification term, where f is the classifier trained above: gradient face features are extracted from Q and I, their difference feature is computed, and the classifier decides; sign(f(Q, I)) returns 1 if Q and I are predicted to be the same person and −1 otherwise. d(Q, I) is the distance between the Hamming-code features of Q and I, d(R_i, I) is the Hamming-code distance between I and the i-th face of the reference set, and |R| is the size of the face image reference set. Each iteration round thus selects from the candidate set the face image I whose D value is smallest among all candidate faces, removes it from the candidate set and adds it to the reference set.
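The iterative reranking loop can be sketched as follows, with the overall distance combining the semantic classification term, the weighted query distance and the mean reference-set distance. `dist` and `classify` are placeholders for the Hamming-code distance and the trained SVM; `k` (the size of the returned reference set) is our own parameter, not fixed by the disclosure:

```python
import numpy as np

def rerank(query, candidates, dist, classify, alpha=0.5, beta=0.5, k=10):
    """Iteratively move the candidate with the smallest overall distance
    D(I) = -beta*sign(f(Q, I)) + alpha*d(Q, I) + mean_i d(R_i, I)
    from the candidate set into the reference (result) set.
    dist(a, b): distance between the images' binary codes;
    classify(a, b): classifier score f, positive if predicted same person."""
    candidates = list(candidates)
    reference = []
    while candidates and len(reference) < k:
        def overall(I):
            D = -beta * np.sign(classify(query, I)) + alpha * dist(query, I)
            if reference:     # mean distance to the faces chosen so far
                D += sum(dist(R, I) for R in reference) / len(reference)
            return D
        best = min(candidates, key=overall)
        candidates.remove(best)
        reference.append(best)
    return reference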
The present invention is a method proposed specifically for large-scale face retrieval and has the following characteristics: 1) the preprocessing stage locates the key points in the face image and aligns the face according to the key points, making the positional correspondence between the local parts of different faces more definite; 2) both local features and global geometric features of the face are used, each with its own quantization method, so that a face is represented by visual words; 3) an inverted index is built over the face database with the visual words; the inverted index retrieves the candidate face image set quickly, avoids a linear scan of the database and favours scaling to large face databases; 4) a reranking algorithm is designed that jointly considers the semantic classification of the face to be retrieved against the candidate set, the distance between the face to be retrieved and the candidate set, and the distance between the candidate set and the reference set; this joint optimization lets the reranking algorithm obtain accurate ranking results. The invention can be applied directly to fast retrieval over large-scale face data sets with age and pose variation.
Beneficial effects: the invention aligns face images, builds an index structure for large-scale databases, and improves retrieval performance and database scalability. Moreover, the reranking algorithm of the invention has high accuracy and orders the faces of the candidate face image set into an accurate ranking that decreases with similarity to the face to be retrieved. The large-scale face image retrieval method therefore has high practical value.
Embodiment:
As shown in Figure 1, the invention discloses a large-scale face image retrieval method comprising the following steps:
The face image to be retrieved and all face images in the face database are each processed by the following steps 1 to 4;
Step 1, face image preprocessing: locate the key points in the face image and align the face image with a preset standard face image;
Step 2, extract local features: divide the face image into p blocks according to the key point positions, each block being called a local block, and extract the features of each local block; in the present invention p is generally 100;
Step 3, extract global geometric features: according to the key point positions, extract the global geometric features of the face image, including distance features, angle features and curvature features;
Step 4, quantize the local features and global geometric features of the face image into visual words: obtain a local feature dictionary and a global feature dictionary from a training set, and quantize the local features and global geometric features into visual words;
Step 5, build an inverted index over all face images in the face database;
Step 6, obtain a candidate face image set from the inverted index of step 5 according to the visual words of the face image to be retrieved;
Step 7, rerank the candidate face image set with a reranking algorithm to obtain the face image reference set, which is the final ordered retrieval result.
Step 1 specifically comprises the following steps:
Locating the key points: the key points in the face image are located, and the set of key points is taken as the face shape;
Aligning the face image with the preset standard face image comprises: for the standard face image a and any face image b, whose face shapes are respectively X_a and X_b, the alignment process computes the parameter group (θ, s, t) that minimizes

E = (X_a − M(s, θ)[X_b] − t)^T W (X_a − M(s, θ)[X_b] − t),

where M(s, θ)[X_b] + t denotes the transformation of the face shape vector of face image b, M is the alignment function, θ the rotation parameter, s the scaling parameter, t the displacement parameter, and W the diagonal matrix formed by the key-point weights;
According to the parameter group (θ, s, t), face image b is aligned with the standard face image a.
The face partition method used when extracting local features in step 2 is shown in Figure 4. First, the face is divided into five large blocks according to the organ positions, mainly the eyes, the nose and the cheeks; in the last five face figures of Figure 4 (grayscale pictures are used as schematic diagrams, since the invention relates to images), the grid regions on the face correspond to the five blocks of the method. The method is as follows: all key points of the left eye are taken as a set, the four key points at the top, bottom, left and right are taken as the boundary, and the resulting rectangle is regarded as the block corresponding to the left eye; the remaining four blocks are obtained in the same way as the left-eye block. Each large block is then evenly divided into 4 × 5 small blocks (in Figure 4 each large block is further gridded into 20 small blocks), with overlapping regions between these 20 blocks. The face is thus divided into 100 small blocks, each with an independent number in the range 1 to 100. The features extracted from each local block in step 2 comprise two kinds: local binary patterns and scale-invariant feature transforms.
The extraction of global geometric features in step 3 is shown in Figure 5 (grayscale pictures are used as schematic diagrams, since the invention relates to images). Step 3 extracts the spatial geometric relations between the key points, comprising distance features, angle features and curvature features. The second face figure in Figure 5 shows the distance feature, where the line segments represent the distances between key points; the third subfigure shows the angle feature, where the angle formed by two line segments is the angle used in the algorithm; the fourth subfigure shows the curvature feature, where the curve connects the key points and the curvature of this curve is the curvature feature in the algorithm. For the located key points, the pairwise distances are computed and linearly concatenated into a vector, called the distance feature; the angles formed between the key points are computed and linearly concatenated into a vector, called the angle vector; the curvature is computed at the chin key points and linearly concatenated into a vector, called the curvature vector; and the linear concatenation of the distance vector, the angle vector and the curvature vector is the feature vector of the global geometric features. From the result returned by the key point location algorithm, 5 key points are selected as geometric feature points of the left-eye region, 5 for the right-eye region, 5 for the nose region, 4 for the mouth region and 7 for the chin region, for a total of 26 geometric feature points denoted by the set S. Connecting these 26 geometric feature points with lines gives 39 line segments in total; the lengths of these segments are computed as the distance features between the key points. From the above segment distances, the angle features between the segments are computed with the law of cosines. The contour feature of the chin region is extracted: the radii of curvature at the points of the curve through the 5 chin geometric feature points are incorporated into the global geometric features as the contour feature, with the derivative at each point approximated by differences of the discrete points.
Obtaining the local feature dictionary and the global feature dictionary from the training set in step 4 comprises the following steps:
Defining the visual word: a visual word comprises Name ID, Age ID, Gender ID and Position ID. For a local feature, Name ID is the name of the face to which the local block belongs, Age ID its age bracket, Gender ID its gender, and Position ID the number of the local block within the face. For a global feature, Name ID is the name of the face corresponding to the global feature, Age ID its age bracket, Gender ID its gender, and Position ID is set to the fixed value p + q, where q is any positive integer;
Generating the local feature dictionary from the training set is shown in Figure 2 and proceeds as follows: steps 1 to 2 are applied to all face images in the training set, and the two kinds of local features of all training faces, local binary patterns and scale-invariant feature transforms, are each put into a separate set, giving two sets, each called a local feature set. Within each local feature set, the local features of the same age bracket, same gender and same block number form one group, and each group is trained with the sparse coding model to obtain a local feature dictionary. The dictionary consists of a set of feature vectors, each corresponding to a local feature of the group and called a base; any feature vector can be reconstructed by a linear combination of the bases, and the coefficient vector of the linear combination is the sparse coding representation of the feature vector;
Generating the global feature dictionary from the training set is shown in Figure 3 and proceeds as follows: steps 1 and 3 are applied to all face images in the training set, the global geometric features of the same age bracket and same gender form one group, and each group is regarded as one global feature dictionary.
Quantizing the local features and global geometric features of a face image into visual words in step 4 comprises the following steps:
Each face image local feature F_l is reconstructed by a linear combination of the bases of the local feature dictionary of the same feature kind (local binary pattern or scale-invariant feature transform), same age bracket, same gender and same block number, giving the bases corresponding to the non-zero elements of the coefficient vector. For each such base, the name of its face image is the Name ID of the visual word corresponding to F_l, the age bracket is the Age ID, the gender is the Gender ID, and the block number is the Position ID;
For a face image global geometric feature F_g, the global feature dictionary of the same age bracket and same gender is selected, and a nearest-neighbour search is carried out between F_g and the global features in the dictionary. The name of the face image of the nearest neighbour in the dictionary is the Name ID of the visual word corresponding to F_g, its age bracket is the Age ID, and its gender is the Gender ID; the Position ID of F_g is set to p + q, where q is any positive integer.
Step 5 comprises the following steps: all distinct visual words form a visual word set, each visual word in the set serves as an index entry, and each index entry is bound to the set of face images that contain that visual word. The face images of the face database quantized into visual words are associated with the visual words in the visual word set, forming the inverted index; that is, after a face image of the database is quantized, it is added to the posting list of each of its corresponding visual words.
Step 6 comprises the following steps:
According to the visual words corresponding to the face image to be retrieved, the visual words are searched in the visual word set of the inverted index, the face images of the face database associated with them are taken out, and the set formed by all the retrieved face images is called the candidate face image set.
Step 7 comprises the following steps:
According to the face image to be retrieved, a group of face images is selected from the candidate face image set by iteration and added to the face image reference set; after the iteration finishes, the final face image reference set is the reranked retrieval result.
In each iteration round, the overall distance D between each face image of the candidate set and both the face image to be retrieved and the face images of the reference set is computed; the face image with the smallest D is removed from the candidate set and added to the reference set. The overall distance D is computed by the following formula:

D = −β·sign(f(Q, I)) + α·d(Q, I) + (1/|R|) Σ_{i=1}^{|R|} d(R_i, I),

where Q is the face image to be retrieved and I is any face image in the candidate set; −β·sign(f(Q, I)) + α·d(Q, I) is the distance between candidate image I and query image Q, and (1/|R|) Σ_i d(R_i, I) is the mean distance between candidate image I and the face images of the reference set; the overall distance D is the sum of the two. α and β are scale coefficients with values in the range 0 to 1; sign(f(Q, I)) is the semantic classification term, where f(Q, I) performs semantic classification on the face image Q to be retrieved and the candidate image I to judge whether they belong to the same class, f being the classification function and sign the sign function; d(Q, I) is the distance between the face image Q to be retrieved and the candidate image I; |R| is the size of the face image reference set; and R_i is the i-th face image of the reference set in the current iteration.
Embodiment 1
This embodiment comprises the following parts:
1. Face image key point location:
Key points in the face image are located with an active shape model (Active Shape Models, ASM). This module is divided into two steps: model training and key point search.
Principal component analysis (principal component analysis, PCA) is used in the model training process.
Principal component analysis concentrates the information dispersed over a group of variables into a few composite indicators (principal components) for exploratory statistical analysis. It describes the internal structure of a data set with its principal components and in effect performs dimensionality reduction: the eigenvectors corresponding to the largest eigenvalues of the covariance matrix of the original data form a basis that best characterizes the original data. Consider a vector x in an n-dimensional space; for dimensionality reduction, x is approximated by an m-dimensional vector x', where m < n, i.e. a transformation f: R^n → R^m is sought. PCA uses a transformation of the form

y = W^T (x − μ),

where μ = E[x] is the expectation of the random vector x and W = (w_1, w_2, ..., w_m) is an n × m transformation matrix. Assuming x is zero-mean, y_i = w_i^T x. The goal of the PCA transform is to find, under the above constraint, each w_i that maximizes |y_i|^2, i.e. to preserve as much of the original variance of x as possible. The objective function of this optimization problem is w_i^T C w_i, where C is the covariance matrix of x estimated from the D training samples x_1, x_2, ..., x_D. To maximize the objective function, the method of Lagrange multipliers is used: the objective attains its maximum when C w_i = λ w_i. Therefore w_i should be taken as an eigenvector of C, with λ_i the eigenvalue corresponding to that eigenvector. Sorting the eigenvalues as λ_1 ≥ λ_2 ≥ ... ≥ λ_n, with corresponding eigenvectors w_1, w_2, ..., w_n, the top m vectors w_1, w_2, ..., w_m form the PCA transformation matrix. For reasons of numerical computation, the first m principal components are usually computed with a singular value decomposition (SVD, Singular Value Decomposition).
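The SVD route mentioned above can be sketched compactly: the right singular vectors of the centred data matrix are the eigenvectors of its covariance, ordered by variance, so no explicit covariance matrix is needed:

```python
import numpy as np

def pca_fit(X, m):
    """X: (D, n) matrix of D training vectors of dimension n (rows are
    samples). Returns the mean and the top-m principal directions,
    computed through SVD of the centred data."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # rows of Vt are eigenvectors of the covariance, by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:m].T                     # n x m transformation matrix
    return mu, W

def pca_project(x, mu, W):
    """y = W^T (x - mu): the m-dimensional PCA coefficients of x."""
    return W.T @ (x - mu)
```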
Model training: key point location mainly trains a face shape model and local grey-level statistical models. The training set is a face data set with marked key points; all key points of a face form the face shape vector x_i, a 2n-dimensional vector:

x_i = (x_i1, y_i1, x_i2, y_i2, ..., x_in, y_in)^T, i = 1, 2, ..., N,

where (x_ij, y_ij) is the coordinate of the j-th feature point of the i-th training sample, n is the number of points of the face shape model, and N is the number of training samples. The shapes of the training set are combined and a shape subspace is obtained by training with principal component analysis; any face shape vector can then be expressed as

x = x̄ + P b,

where x̄ is the mean shape, P = (p_1, p_2, ..., p_t) is the matrix formed by the first t eigenvectors obtained by PCA, called the subspace, and b = (b_1, b_2, ..., b_t)^T is the projection coefficient of a face in the subspace. Different face shape vectors are obtained by varying b; this completes the construction of the face shape model.
For a specific key point of the face, the distribution of the features around it is similar across faces. At each key point of every face image in the key-point-labeled face data set, the gray-level derivatives of k pixels are sampled through that point along the direction perpendicular to the line through its neighboring points and normalized; this vector is taken as the local gray-level information. When training the local gray-level statistical model, the gray feature of the j-th key point is extracted from all N sample images and denoted g_ij, which is then normalized:
g_ij ← g_ij / Σ_k |g_ij(k)|
where g_ij(k) denotes the k-th pixel gray-level derivative of g_ij. From the normalized feature of the j-th key point of every face, the training set of the j-th key point is obtained, and the mean and covariance matrix are estimated with the sample mean and sample covariance. The sample mean ḡ_j and sample covariance matrix Σ_j are calculated as follows:
ḡ_j = (1/N) Σ_{i=1..N} g_ij
Σ_j = (1/N) Σ_{i=1..N} (g_ij − ḡ_j)(g_ij − ḡ_j)^T
Such a statistical model is calculated for each key point; with this, the training of the local gray-level statistical model is finished.
Key-point search: first the face region is found by a fast face detection algorithm; then the initial positions of the key points, i.e. the initial shape, are given within the face region. Generally the center of the face shape model, i.e. the mean of the face shape vectors of the key-point-labeled face data set, is used as the initial position. Given the initial position, the accurate location of each key point is found by an iterative algorithm; the process is as follows:
1) Search the neighborhood of each key point and find the best point of this round of search. At each neighborhood point, compute the local gray-level feature g and its Mahalanobis distance
d_j(g) = (g − ḡ_j)^T Σ_j^{-1} (g − ḡ_j)
where Σ_j^{-1} is the inverse matrix of Σ_j; the point with the minimum Mahalanobis distance is taken as the best key-point position of this round of search.
2) According to the displacements of all the key points, generate a new shape with the face shape model and replace the original shape.
3) Return to step 1) until the face shape vector changes by less than a threshold T_1.
After the search finishes, the positions of the key points in the face image are obtained and key-point location is complete.
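Step 1) of the search, choosing the neighborhood point with the smallest Mahalanobis distance to the trained gray-level model, can be sketched as follows; the candidate list and model parameters are assumed inputs supplied by the surrounding system.

```python
import numpy as np

def best_candidate(g_bar, sigma_inv, candidates):
    """Return the index of the candidate gray-level feature with the smallest
    Mahalanobis distance (g - g_bar)^T Sigma^{-1} (g - g_bar) to the model."""
    def mahalanobis(g):
        d = g - g_bar
        return float(d @ sigma_inv @ d)
    return min(range(len(candidates)), key=lambda i: mahalanobis(candidates[i]))
```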
2. Face alignment:
Alignment establishes the correspondence between the key-point positions of the face images; therefore the invention aligns faces using the key points located in the face images, so that the positional relations of the faces correspond to the greatest extent. Given two face shape vectors, i.e. two sets of key points, the shape of the i-th face is denoted
X_i = (x_i1, y_i1, x_i2, y_i2, ..., x_in, y_in)^T
where (x_ij, y_ij) denote the coordinates of the j-th key point of the i-th face, j is any value in 1 ~ n, n is the number of key points, and T denotes matrix transposition. Given a face image a and a face image b, their shapes are respectively X_a = (x_a1, y_a1, x_a2, y_a2, ..., x_an, y_an)^T and X_b = (x_b1, y_b1, x_b2, y_b2, ..., x_bn, y_bn)^T. The purpose of alignment is to find a set of parameters (θ; s; t) that makes
E = (X_a − M(s, θ)[X_b] − t)^T W (X_a − M(s, θ)[X_b] − t)
minimal. Here θ denotes the rotation angle, s the scaling factor, and t the translation displacement; the transform of a face shape vector is denoted M(s, θ)[X_i] + t, where
t = (t_x, t_y, ..., t_x, t_y)^T
W is the diagonal matrix formed by the weights (w_1, w_2, ..., w_n) of the key points, and t_x, t_y are respectively the displacements corresponding to the x and y coordinates of the key points. E is minimized by differentiating with respect to each parameter; the face image is then transformed according to the obtained parameters so that the two faces correspond in position to the greatest extent. The process is as follows: let s cos θ = a_x and s sin θ = a_y; differentiating with respect to a_x, a_y, t_x and t_y separately so as to minimize E yields four linear equations in a_x, a_y, t_x and t_y. Solving these equations gives the alignment transformation parameter group (θ; s; t), with which the image is transformed.
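Since E is quadratic in a_x = s cos θ, a_y = s sin θ, t_x and t_y, the four equations can equivalently be solved as one weighted linear least-squares problem; a minimal sketch under that formulation (function and variable names are illustrative):

```python
import numpy as np

def align_params(Xa, Xb, w):
    """Solve min E for (a_x, a_y, t_x, t_y), where a_x = s*cos(theta) and
    a_y = s*sin(theta); Xa, Xb are (n, 2) shape arrays and w the per-key-point
    weights on the diagonal of W. Each point contributes two weighted rows."""
    rows, rhs = [], []
    for (xa, ya), (xb, yb), wi in zip(Xa, Xb, w):
        sw = np.sqrt(wi)
        rows.append(sw * np.array([xb, -yb, 1.0, 0.0])); rhs.append(sw * xa)
        rows.append(sw * np.array([yb,  xb, 0.0, 1.0])); rhs.append(sw * ya)
    ax, ay, tx, ty = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    s = np.hypot(ax, ay)             # scale
    theta = np.arctan2(ay, ax)       # rotation angle
    return s, theta, tx, ty
```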
3. Extracting local features
According to the key points located in step 1, the face is divided into blocks. First, according to the organ positions, mainly comprising the eyes, nose and cheeks, the face is divided into five large blocks. Each large block is then evenly divided into 4 × 5 small blocks, with overlapping regions between these 20 blocks. The face can thus be divided into 100 small blocks, each with an independent number in the range 1-100. Finally, two kinds of features, Local Binary Pattern (LBP) and Scale Invariant Feature Transform (SIFT), are extracted from each block.
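Of the two features named above, the Local Binary Pattern is simple enough to sketch directly; the sketch below assumes a grayscale block given as a NumPy array and the basic 8-neighbor, 256-bin variant (the text does not fix a particular LBP variant).

```python
import numpy as np

def lbp_histogram(block):
    """Basic 8-neighbour LBP: each interior pixel is coded by comparing its
    8 neighbours with the centre, one bit per neighbour, then a normalized
    256-bin histogram of the codes is returned as the block feature."""
    c = block[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    # neighbour offsets in clockwise order, each contributing one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = block[1 + dy:block.shape[0] - 1 + dy, 1 + dx:block.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```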
4. Extracting global geometric features
According to the key points located in step 1, the global geometric features between the key points are extracted. Global geometric features describe the spatial geometric relations between the key points and comprise distance features, angle features and curvature features. From the located key points, the pairwise distances are calculated and linearly concatenated into a vector, called the distance feature; the angles formed between the key points are calculated and linearly concatenated into a vector, called the angle vector; curvature is calculated at the key points of the chin position and linearly concatenated into a vector, called the curvature vector. The linear concatenation of the distance vector, angle vector and curvature vector is the feature vector of the global geometric features. Specifically:
1) According to the result returned by the key-point location algorithm in the face image, 5 key points are selected as left-eye-region geometric feature points, 5 as right-eye-region geometric feature points, 5 as nose-region geometric feature points, 4 as mouth-region geometric feature points and 7 as chin-region geometric feature points, 26 geometric feature points in total, denoted by the set S. Connecting these 26 geometric feature points with lines yields 39 line segments in total; the lengths of these segments, i.e. the distances between the key points, are calculated. The distance between the two eyes is taken as the standard distance d', and every other calculated distance is divided by this standard distance to remove the influence of the image scale. The distances adopt the Euclidean metric, as in the formula
d_ij = ||p_i − p_j|| / d'
where p_i denotes the i-th point of S and d_ij the distance between the i-th and j-th points. So that the subsequent quantization of the global geometric features is comparable, the distance feature values are normalized:
D_norm = D / ||D||, d_n ∈ D
where D_norm is the normalized distance feature vector.
2) According to each of the above segment distances, the angles between the segments are calculated with the triangle law of cosines:
θ_ikj = arccos((d_ik^2 + d_kj^2 − d_ij^2) / (2 d_ik d_kj))
where θ_ikj denotes the angle at point k formed by points i, k and j. These angle feature values are then normalized in the same way: θ_n ∈ θ.
3) The contour feature of the chin area is extracted: the curvature radii at the corresponding positions of the curve formed by 5 geometric feature points of the chin area are incorporated into the global geometric features as the contour feature. The differentials at a point are approximately represented by differences of the discrete points, that is
y_i' ≈ y_{i+1} − y_i, y_i'' ≈ y_{i+1} − 2y_i + y_{i−1}, ρ_i = (1 + y_i'^2)^{3/2} / |y_i''|
where y_i' is the first derivative at feature point i, approximated by a first-order difference, and y_i'' is the second derivative at feature point i, approximated by a second-order difference. These 5 curvature-radius feature values are likewise normalized: ρ_n ∈ ρ.
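A condensed sketch of the three feature families, normalized distances, a cosine-law angle, and finite-difference curvature radii, is given below. The point indices, the single illustrative angle triple, the use of all pairwise distances (the method uses only the 39 segments), and the epsilon guard against a flat contour are all assumptions made for the sketch.

```python
import numpy as np

def global_geometric_features(pts, eye_l, eye_r, chin_idx):
    """pts: (n, 2) array of key-point coordinates; eye_l/eye_r index the two
    eye points defining the standard distance d'; chin_idx lists the chin
    contour points. Returns [distances, angle, curvature radii] concatenated."""
    d_std = np.linalg.norm(pts[eye_l] - pts[eye_r])   # inter-eye standard distance
    n = len(pts)
    D = np.asarray([np.linalg.norm(pts[i] - pts[j]) / d_std
                    for i in range(n) for j in range(i + 1, n)])
    D = D / np.linalg.norm(D)          # normalization for comparable quantization

    def angle(i, k, j):                # law of cosines, angle at point k
        a = np.linalg.norm(pts[i] - pts[k])
        b = np.linalg.norm(pts[j] - pts[k])
        c = np.linalg.norm(pts[i] - pts[j])
        return np.arccos(np.clip((a * a + b * b - c * c) / (2 * a * b), -1.0, 1.0))

    angles = np.asarray([angle(0, 1, 2)])   # one illustrative triple

    y = pts[chin_idx, 1]               # chin contour heights
    y1 = np.gradient(y)                # difference approximation of y'
    y2 = np.gradient(y1)               # difference approximation of y''
    rho = (1 + y1 ** 2) ** 1.5 / (np.abs(y2) + 1e-12)   # curvature radii
    rho = rho / np.linalg.norm(rho)
    return np.concatenate([D, angles, rho])
```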
5. Defining the visual word
In the large-scale face retrieval method, a visual word is defined as <Name ID, Age ID, Gender ID, Position ID>; each local feature or global geometric feature is quantized into one or more visual words. In the training set, every face image has its own name, age and gender information. Name ID represents the name of the face in the training-set image corresponding to this feature, and Age ID represents the age bracket of that face. The age range is divided into 8 brackets, respectively 1-15, 15-22, 23-30, 31-38, 39-46, 47-54, 55-62 and above 62; every face falls into one age bracket. Gender ID represents the gender of the face in the corresponding training-set image, and Position ID represents the block number corresponding to the face image in the corresponding training-set image. In particular, for local features, since the face is divided into 100 small blocks, the Position ID of the visual word that a local feature is quantized into is the number of its corresponding face block; for global geometric features, the Position ID is uniformly set to 101.
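The visual-word tuple and the age brackets can be represented directly; a small sketch follows, in which the field names, the bracket table encoding and the upper cap of the last bracket are illustrative assumptions (note the brackets are copied as stated in the text, so 15 falls in the first matching bracket).

```python
from collections import namedtuple

# Field names mirror <Name ID, Age ID, Gender ID, Position ID>.
VisualWord = namedtuple("VisualWord", ["name_id", "age_id", "gender_id", "position_id"])

# Bracket bounds as listed in the text; the cap 200 on the last bracket is an assumption.
AGE_BRACKETS = [(1, 15), (15, 22), (23, 30), (31, 38),
                (39, 46), (47, 54), (55, 62), (63, 200)]

GLOBAL_POSITION_ID = 101   # Position ID reserved for global geometric features

def age_bracket(age):
    """Map an age to its bracket index (0-7); first matching bracket wins."""
    for i, (lo, hi) in enumerate(AGE_BRACKETS):
        if lo <= age <= hi:
            return i
    raise ValueError("age out of range")

# Example word for a 27-year-old face quantized from a global geometric feature.
w = VisualWord(name_id=42, age_id=age_bracket(27), gender_id=0,
               position_id=GLOBAL_POSITION_ID)
```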
6. Local feature quantization
Each face of the training set is divided into numbered blocks according to step 2, and local features are extracted from each block. All feature vectors of the same feature kind (Local Binary Pattern or Scale Invariant Feature Transform), same age bracket, same gender and same block number are grouped together. For each feature group, a basis is trained with the sparse coding model:
min_{D, v_i} Σ_i (||x_i − D v_i||^2 + λ ||v_i||_1)
where D is the learned dictionary, x_i is a feature vector of the training group, v_i is the coefficient vector when x_i is linearly reconstructed with D, and λ is the sparsity parameter. A dictionary D can be computed from the training set, one dictionary per group. At retrieval time, given a feature vector x' and the age bracket, gender and block number of the face it belongs to, the corresponding dictionary D' is chosen and x' is reconstructed with it, obtaining the reconstruction coefficients v'; at this point the local feature x' is encoded as v'. The names of the faces behind the bases corresponding to the non-zero entries of v' are the Name IDs after quantization of x'. The age bracket, gender and block number of x' are respectively the Age ID, Gender ID and Position ID after quantization. In practice, not all non-zero entries are chosen to obtain visual words; instead, the 3 elements with the largest coefficients are selected.
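The encoding step, solving min_v ||x' − D'v||^2 + λ||v||_1 for a fixed dictionary, can be sketched with ISTA (iterative soft-thresholding), one standard solver for this objective; dictionary learning itself is omitted and the function names are illustrative.

```python
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=200):
    """ISTA for min_v ||x - D v||^2 + lam * ||v||_1 with fixed dictionary D
    (columns = atoms): a gradient step on the quadratic term followed by soft
    thresholding, the proximal operator of the l1 penalty."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    v = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = 2.0 * D.T @ (D @ v - x)          # gradient of ||x - D v||^2
        z = v - g / L
        v = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return v

def top_k_atoms(v, k=3):
    """Indices of the k largest-magnitude coefficients; in the method these
    select the training faces whose names become the Name IDs."""
    return np.argsort(-np.abs(v))[:k]
```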
7. Global geometric feature quantization
Global geometric features are extracted from all training-set faces and grouped according to the age bracket and gender of the corresponding face, so that features of the same age bracket and same gender form one group.
Given a global geometric feature F_g together with its age bracket and gender information, the feature group of the corresponding age bracket and gender is selected from the training set, and the 3 nearest neighbors of F_g are found by nearest-neighbor search. The identities, age brackets and genders of the neighbors become the Name ID, Age ID and Gender ID of F_g after quantization; the Position ID of a global geometric feature is uniformly set to 101.
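The nearest-neighbor search over the matching age/gender group can be sketched as brute force, which is adequate at group scale; the names are illustrative.

```python
import numpy as np

def nearest_neighbors(group, f_g, k=3):
    """Brute-force k-nearest-neighbour search: `group` is an (m, d) matrix of
    training-set global geometric features of the matching age bracket and
    gender, `f_g` the query feature; returns the indices of the k closest."""
    d = np.linalg.norm(group - f_g, axis=1)   # Euclidean distance to each row
    return np.argsort(d)[:k]
```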
Setting up the inverted index (step 5) comprises:
An inverted index is built over all the face images in the database and used for retrieval. For a face image, quantization according to the above process yields the visual words representing that face, and the face image is indexed under those visual words. Every face in the database is quantized and indexed under its own visual words. The index structure is a set of visual words; given a visual word, the set of face images corresponding to it can be retrieved.
8. Obtaining the candidate face image set
Given a face image to be retrieved, its visual words are obtained by the above quantization method, the same visual words are found in the inverted index, and the face images corresponding to them are taken out; these face images form the candidate face image set, which is the preliminary retrieval result.
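Index construction and candidate retrieval together amount to a multimap from visual words to face ids and a union over the query's words; a minimal sketch, in which representing visual words as tuples and faces as integer ids are assumptions:

```python
from collections import defaultdict

# word -> set of face ids carrying that word
index = defaultdict(set)

def add_face(face_id, words):
    """Index a database face under each of its visual words."""
    for w in words:
        index[w].add(face_id)

def candidates(query_words):
    """Union of all faces sharing at least one visual word with the query:
    the candidate face image set (preliminary retrieval result)."""
    result = set()
    for w in query_words:
        result |= index.get(w, set())
    return result
```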
9. Reordering
First, a face classifier is trained with a subset of face images; this classifier is used to predict whether two face images belong to the same person. The concrete steps are as follows:
(1) The classifier training set consists of face image pairs, of which one part are face images of the same person at different ages and another part are face images of different people at different ages.
(2) A global gradient face feature (GradientFaces) is extracted from each face image, and the gradient face features of each pair of faces are subtracted, obtaining a difference feature.
(3) The difference features obtained from same-person different-age pairs are used as positive examples, and those obtained from different-person different-age pairs as negative examples. A classification model is trained with a support vector machine (Support Vector Machine, SVM).
After the classifier is trained, global Hamming-code features are extracted from all faces of the candidate face image set.
In the reordering algorithm, a face is chosen from the candidate face image set in each iteration and added to the face image reference set; the method is as follows:
In each round of iteration, for each face image in the candidate face image set, the overall distance D between it, the face image to be retrieved and the face images of the face image reference set is calculated; the face image with the minimum D value is removed from the candidate face image set and added to the face image reference set. D is expressed as
D = −β sign(f(Q, I)) + d(Q, I) + (α / |R|) Σ_{i=1..|R|} d(R_i, I)
where Q is the face image to be retrieved and I is any face image in the candidate face image set; −β sign(f(Q, I)) + d(Q, I) expresses the distance between candidate image I and the face image Q to be retrieved, and (α / |R|) Σ d(R_i, I) expresses the distance between candidate image I and the face images of the reference set; the overall distance D is the sum of both. α and β are scale factors. sign(f(Q, I)) is the semantic classification term: f is the classifier trained above; gradient face features are extracted from Q and I and the difference feature is calculated, and the classifier gives the prediction; sign(f(Q, I)) returning 1 means Q and I are predicted to be the same person, and returning −1 means they are not the same person. d(Q, I) is the distance calculated between the Hamming-code features of Q and I, and d(R_i, I) is the Hamming distance between I and the i-th face of the face image reference set; |R| denotes the size of the face image reference set.
In each round of iteration, the corresponding D value is first calculated for all face images of the candidate set; the face image with the minimum D value is selected, removed from the candidate face image set and added to the face image reference set. After 100 rounds of iteration, 100 faces of the candidate face image set have been added to the face image reference set in total, and these 100 faces are sorted in the reference set by the order in which they were selected. The face image reference set is the final retrieval result, i.e. the 100 face images retrieved from the database, sorted from high to low by the similarity between the database image and the image to be retrieved. The final size of the face image reference set can be modified according to the actual number of results to be returned.
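The iterative selection can be sketched as follows; `same_person` stands in for sign(f(Q, I)) from the trained SVM and `hamming` for the Hamming-code distance, both assumed to be callables supplied by the surrounding system.

```python
def rerank(query, candidates, same_person, hamming, alpha=1.0, beta=1.0, n_out=100):
    """Greedy reordering: each round picks the candidate minimizing
    D = -beta*same_person(Q, I) + hamming(Q, I)
        + alpha * mean_i hamming(R_i, I)
    and moves it from the candidate pool to the reference set."""
    reference, pool = [], list(candidates)
    while pool and len(reference) < n_out:
        def overall_distance(i):
            d = -beta * same_person(query, i) + hamming(query, i)
            if reference:   # average distance to the faces already selected
                d += alpha * sum(hamming(r, i) for r in reference) / len(reference)
            return d
        best = min(pool, key=overall_distance)
        pool.remove(best)
        reference.append(best)
    return reference
```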
Embodiment 2
Fig. 6 is a retrieval schematic diagram of embodiment 2; the person images in the figure come from the public LFW database. In the figure, 1 is the original face image, and the boxes in the figure indicate the face blocks; 2 denotes the extracted features, comprising the local features and the global geometric features; 3 denotes the trained dictionaries; 4 is the preliminary retrieval result obtained according to the dictionaries, i.e. the candidate face image set; 5 is the face image reference set; the preliminary retrieval result is reordered according to the face image reference set, obtaining 6, i.e. the final retrieval result. The face in 6 and the face in 1 are the same person, indicating that the retrieval is successful.
The invention provides a large-scale face image retrieval method. There are many methods and approaches for concretely implementing this technical scheme, and the above is only a preferred embodiment of the invention. It should be pointed out that, for those skilled in the art, several improvements and modifications can also be made without departing from the principle of the invention, and these improvements and modifications should also be regarded as within the protection scope of the invention. All components not made explicit in the present embodiment can be realized with the prior art.