CN115273202A - Face comparison method, system, equipment and storage medium


Info

Publication number
CN115273202A
CN115273202A
Authority
CN
China
Prior art keywords
face
module
convolution
faces
feature
Prior art date
Legal status
Pending
Application number
CN202210949819.7A
Other languages
Chinese (zh)
Inventor
韦涛
杜欢
梁勇
吴康杰
李鹏
Current Assignee
China-ASEAN Information Harbor Co., Ltd.
Original Assignee
China-ASEAN Information Harbor Co., Ltd.
Priority date
Filing date
Publication date
Application filed by China-ASEAN Information Harbor Co., Ltd.
Priority to CN202210949819.7A
Publication of CN115273202A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a face comparison method, system, equipment and storage medium belonging to the technical field of computer vision, which solve the technical problems of poor stability and slow comparison speed in existing face comparison methods. The method comprises the following steps: constructing a lightweight face detection module based on a convolutional neural network and detecting the face image to obtain a series of face candidate frames; decoding the position information of the face candidate frames and converting it into candidate frame information on the original image; screening several face candidate frames as detection results according to the face prediction score of each candidate frame, and cutting the face parts out of the original image according to the detection results to serve as input images; constructing a face feature extraction module based on a convolutional neural network and feeding the input image into it to obtain a series of feature values that quantize the face information; and calculating the similarity of two faces from their feature values and judging whether the two faces are the same person.

Description

Face comparison method, system, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision technology, and more particularly, to a method, system, device and storage medium for comparing human faces.
Background
Face comparison, also called face verification, determines whether the faces in two images belong to the same person. It is widely applied in fields such as national security, military security, public security, civil affairs and economics (for example, access-control systems that admit users by face, and financial systems that authorize payment by face), and it therefore has very important research value and significance.
In the prior art, an LBP operator extracts face image features to obtain an LBP-coded image of the whole picture. The coded image is divided into several regions and an LBP-code histogram is computed for each region, so that the histograms together describe the whole image. During face verification, the feature distance between the two compared images is calculated with a histogram-based image similarity function; if the distance is larger than a set threshold, the two images are regarded as faces of the same person. The advantage of this method is that it reduces, within a certain range, errors caused by imperfect alignment of the face regions.
This face comparison method has several defects: blurred images, profile views, reflections and occlusion all disturb the recognition process, so its stability is low. Its comparison speed in practical applications is also slow, which cannot satisfy the real-time requirements of some scenarios, so the method has clear limitations.
Disclosure of Invention
The technical problem addressed by the present invention is to remedy the above deficiencies of the prior art; the first object of the invention is to provide a face comparison method with good stability and a high comparison speed.
The second object of the invention is to provide a face comparison system with good stability and a high comparison speed.
The third object of the invention is to provide a computer device.
The fourth object of the present invention is to provide a computer-readable storage medium.
In order to achieve the above object, the present invention provides a face comparison method, including:
s1, constructing a lightweight face detection module based on a convolutional neural network, and detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
s2, decoding the position information of the face candidate frames, and converting each face candidate frame into face candidate frame information on the original image;
s3, screening a plurality of face candidate frames as detection results according to score of each face candidate frame on the original image for face prediction, and cutting out a face part from the original image according to the detection results to serve as an input image of a subsequent face feature extraction module;
s4, constructing a face feature extraction module based on a convolutional neural network, inputting the input image into the face feature extraction module, and obtaining a series of feature values of face information quantization;
and S5, calculating the similarity of the two faces according to the feature values of the two faces, and judging whether the two faces are the same person or not according to the similarity.
As a further improvement, in step S1, the process of constructing the face detection module is as follows:
s11, constructing an input layer, and adjusting the size of the image to be 500 multiplied by 3 by the input layer in order to meet the requirement of fixed-size input of the convolutional neural network;
s12, constructing a lightweight convolution submodule which mainly comprises convolution kernels of two scales, namely a 3 x 3 convolution kernel and a 1 x 1 convolution kernel, wherein the number of the 3 x 3 convolution kernels is 64, and the number of the 1 x 1 convolution kernels is 32; the connection mode of convolution layers in the module is that a 1 multiplied by 1 convolution kernel is connected behind a 3 multiplied by 3 convolution kernel, and simultaneously, a nonlinear activation operation is connected behind the 1 multiplied by 1 convolution kernel;
step S13, connecting 3 convolution sub-modules after the input layer, flattening the feature graph output by the convolution module through a Flatten layer, integrating the information of the convolution layer through two full-connection layers, and outputting the information of 5 multiplied by 2 regression frames through a Reshape layer (b) x ,b y ,b w ,b h Confidence, score), wherein (b) x ,b y ) As coordinates of the center point of the regression box, b w Is the width of the regression box, b h As for the height of the regression frame, confidence is the confidence score of the regression frame, acore is the score of the regression frame including the human face, 5 × 5 × 2 indicates that the original image is divided into 5 × 5 regions, and the model predicts the position information of 2 regression frames in each region.
Further, in step S2, the position information of the regression boxes is decoded; each regression box is converted into face candidate box information on the original image using the following formulas:

t_x = (x_offset + b_x) / S × w
t_y = (y_offset + b_y) / S × h
t_w = b_w × w
t_h = b_h × h

where (t_x, t_y) are the coordinates of the center point of the face candidate box on the original image; t_w and t_h are its width and height respectively; w represents the original image width and h the original image height; S represents dividing the original image into S × S regions, where S is 5; x_offset represents the abscissa (column index) of the region to which the regression box belongs and y_offset represents its ordinate (row index).
Further, in step S3, the face candidate frames are sorted by their face prediction score from large to small, and the top m highest-scoring candidate frames are selected; these m candidate frames are then screened a second time with a non-maximum suppression algorithm according to the confidence of each regression box, and n of the screened candidate frames are selected as the face detection result, where m and n are preset values.
Further, if the number of candidate frames after the secondary screening is less than n, all the candidate frames after the secondary screening are used as the detection result.
Further, in step S4, the process of constructing the face feature extraction module is as follows:
s41, constructing an input layer, and preprocessing an input image, wherein the preprocessing process mainly adjusts the input image to be in a uniform size;
s42, connecting 3 lightweight convolution sub-modules behind an input layer, wherein the function is to accelerate the process of feature extraction and increase the nonlinear expression capability of a network, so that the human face features are better extracted;
s43, constructing an inclusion module, wherein the module is positioned behind the lightweight convolution sub-module and consists of convolution layers of 3 scales and a maximum pooling layer, convolution kernels of 3 scales are respectively 1 × 1, 3 × 3 and 5 × 5,3 and 1 pooling layer and are connected in parallel, and the outputs of 4 network layers are spliced together at the last of the module to serve as the output of the inclusion module;
s44, constructing a residual error module, wherein the residual error module is positioned behind the inclusion module and comprises two convolution layers, the sizes of convolution kernels are 3 multiplied by 3, the input feature maps of the module are convolved by the two convolution layers, then the feature maps obtained by convolution and the input feature maps are added bit by bit, and the new feature maps obtained by addition are used as the output of the residual error module;
and S45, connecting two full-connection layers behind the residual module, integrating the information of the convolutional layers, and finally outputting the information through the full-connection layers of k neurons, wherein the k neurons are also k-dimensional feature vectors extracted from the human face by the module.
Further, in step S5, a k-dimensional feature vector can be mapped to a point in a k-dimensional feature space, so the feature vectors of two faces map to two feature points; the distance between the two points represents how similar the two faces are, and a smaller distance means the faces are more similar. The cosine distance is used as the similarity distance between the two feature points, and the calculation proceeds as follows:

Compute the dot product of the two face feature vectors (x_1, x_2, ..., x_k) and (y_1, y_2, ..., y_k):

dot_XY = (x_1, x_2, ..., x_k) × (y_1, y_2, ..., y_k)^T

Compute the two-norm of each face feature vector:

||X|| = √(x_1^2 + x_2^2 + ... + x_k^2)
||Y|| = √(y_1^2 + y_2^2 + ... + y_k^2)

Compute the cosine distance of the two feature vectors from the dot product and the norms:

cos(X, Y) = dot_XY / (||X|| × ||Y||)

The cosine distance is the similarity distance of the two faces; if it is greater than a threshold, the two faces are considered the same person; otherwise they are not.
In order to achieve the second objective, the present invention provides a face comparison system, which includes:
the face detection module is used for detecting a face image to determine the position information of a face so as to obtain a series of face candidate frames;
the decoding module is used for decoding the position information of the face candidate frames and converting each face candidate frame into face candidate frame information on the original image;
the screening module is used for screening several face candidate frames as detection results according to the face prediction score of each candidate frame on the original image, and cutting out the face part from the original image according to the detection results to serve as the input image of the subsequent face feature extraction module;
the human face feature extraction module is used for extracting features of the input image to obtain a series of characteristic values of human face information quantization;
and the similarity comparison module is used for calculating the similarity of the two faces according to the characteristic values of the two faces and judging whether the two faces are the same person or not according to the similarity.
In order to achieve the third objective, the invention provides a computer device comprising a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program to implement the above face comparison method.
In order to achieve the fourth object, the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing a face comparison method as described above when executed by a processor.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
the invention realizes the processes of face detection, face feature extraction and face comparison by constructing a series of modules based on the convolutional neural network, compared with other existing face comparison technologies, the invention uses the convolutional neural network to extract the face features, thereby improving the quality of face feature extraction, and simultaneously, a series of improvements are carried out on the network structure, thereby improving the speed of feature extraction and further improving the efficiency of face comparison.
Drawings
FIG. 1 is a general flow diagram of the present invention;
fig. 2 is a flowchart of comparing two faces in practical application of the present invention.
Detailed Description
The invention will be further described with reference to specific embodiments shown in the drawings.
Referring to fig. 1 and 2, a face comparison method includes:
s1, constructing a lightweight face detection module based on a convolutional neural network, and detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
s2, decoding the position information of the face candidate frames, and converting each face candidate frame into face candidate frame information on the original image;
s3, screening a plurality of face candidate frames as detection results according to scores score of each face candidate frame on the original image for face prediction, and cutting out a face part from the original image according to the detection results to serve as an input image of a subsequent face feature extraction module;
s4, constructing a face feature extraction module based on a convolutional neural network, inputting an input image into the face feature extraction module, and obtaining a series of feature values of face information quantization;
and S5, calculating the similarity of the two faces according to the characteristic values of the two faces, and judging whether the two faces are the same person or not according to the similarity.
In step S1, the process of constructing the face detection module is as follows:
S11, constructing an input layer, which resizes the image to 500 × 500 × 3 to satisfy the fixed-size input required by the convolutional neural network;
S12, constructing a lightweight convolution submodule composed mainly of convolution kernels at two scales, 3 × 3 and 1 × 1, with 64 kernels of size 3 × 3 and 32 kernels of size 1 × 1; inside the module each 3 × 3 convolution is followed by a 1 × 1 convolution, which is in turn followed by a nonlinear activation operation; the advantage of this arrangement is that the 1 × 1 convolution does not change the spatial size of its input feature map (no feature-map resolution is lost) while reducing its channel width, and the nonlinear activation added after the 1 × 1 convolution increases the nonlinear expression capability of the network;
step S13, connecting 3 convolution submodules after the input layer, flattening the feature map output by the convolution modules through a Flatten layer, integrating the convolutional information through two fully connected layers, and finally outputting, through a Reshape layer, the information of 5 × 5 × 2 regression boxes (b_x, b_y, b_w, b_h, confidence, score), where (b_x, b_y) are the coordinates of the center point of a regression box, b_w is its width, b_h is its height, confidence is its confidence score, score is the score that the box contains a human face, and 5 × 5 × 2 means the original image is divided into 5 × 5 regions, with the model predicting the position information of 2 regression boxes in each region.
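The construction in steps S11-S13 can be summarized in code. The following is a minimal PyTorch sketch (not part of the patent text itself); the ReLU activation, the stride-2 downsampling inside each submodule, and the hidden width of the fully connected layers are assumptions, since the description does not fix them:

```python
import torch
import torch.nn as nn

class LightConvBlock(nn.Module):
    """Lightweight submodule: 64 3x3 kernels -> 32 1x1 kernels -> nonlinearity."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv3 = nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1)
        self.conv1 = nn.Conv2d(64, 32, kernel_size=1)
        self.act = nn.ReLU()  # activation type is not named in the text

    def forward(self, x):
        return self.act(self.conv1(self.conv3(x)))

class FaceDetector(nn.Module):
    """500x500x3 input -> 3 light blocks -> Flatten -> 2 FC -> 5x5x2 boxes."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(*[LightConvBlock(3 if i == 0 else 32) for i in range(3)])
        self.flatten = nn.Flatten()
        # 500 -> 250 -> 125 -> 63 under the assumed stride-2 scheme
        self.fc1 = nn.Linear(32 * 63 * 63, 1024)   # hidden width 1024 is an assumption
        self.fc2 = nn.Linear(1024, 5 * 5 * 2 * 6)  # (b_x, b_y, b_w, b_h, confidence, score)

    def forward(self, x):
        x = self.flatten(self.blocks(x))
        x = torch.relu(self.fc1(x))
        # Reshape to the 5x5 grid with 2 regression boxes of 6 values per cell
        return self.fc2(x).view(-1, 5, 5, 2, 6)
```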
In step S2, the regression box information output by the basic face detection module is a series of face candidate boxes proposed by the model's preliminary prediction, and the predicted position information is normalized; to obtain the actual face positions, the position information of these candidate boxes must be decoded and the candidates further screened. The position information of each regression box is decoded and converted into face candidate box information on the original image using the following formulas:

t_x = (x_offset + b_x) / S × w
t_y = (y_offset + b_y) / S × h
t_w = b_w × w
t_h = b_h × h

where (t_x, t_y) are the coordinates of the center point of the face candidate box on the original image; t_w and t_h are its width and height respectively; w represents the original image width and h the original image height; S represents dividing the original image into S × S regions (the invention divides the original image into 5 × 5 regions, so S is 5); x_offset represents the abscissa (column index) of the region to which the regression box belongs and y_offset represents its ordinate (row index).
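As a concrete illustration, the decoding formulas above translate directly into a small helper function. This is a sketch under the stated definitions (S = 5 and w = h = 500 for this invention's input size); the function name is ours, not the patent's:

```python
def decode_box(bx, by, bw, bh, x_offset, y_offset, S=5, w=500, h=500):
    """Map one normalized regression box, predicted in the grid cell at
    column x_offset and row y_offset, to pixel coordinates on the original image."""
    tx = (x_offset + bx) / S * w   # center x on the original image
    ty = (y_offset + by) / S * h   # center y on the original image
    tw = bw * w                    # box width in pixels
    th = bh * h                    # box height in pixels
    return tx, ty, tw, th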
In step S3, the face candidate frames are first sorted by their face prediction score from large to small and the top m highest-scoring candidates are selected; these m candidates are then screened a second time with a non-maximum suppression algorithm according to the confidence of each regression box, and n of the screened candidates are selected as the face detection result, where m and n are preset values. If fewer than n candidates survive the secondary screening, all surviving candidates are taken as the detection result.
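The two-stage screening (top-m by score, then non-maximum suppression by confidence) might be sketched as follows with NumPy; the IoU threshold of 0.5 is an assumption, as the patent does not state one:

```python
import numpy as np

def nms(boxes, confidences, iou_thresh=0.5):
    """Greedy non-maximum suppression on (cx, cy, w, h) boxes."""
    x1 = boxes[:, 0] - boxes[:, 2] / 2
    y1 = boxes[:, 1] - boxes[:, 3] / 2
    x2 = boxes[:, 0] + boxes[:, 2] / 2
    y2 = boxes[:, 1] + boxes[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = confidences.argsort()[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep

def screen_candidates(boxes, scores, confidences, m=10, n=1):
    """Keep the top-m boxes by face score, NMS-filter by confidence, return up to n."""
    top_m = scores.argsort()[::-1][:m]
    kept = nms(boxes[top_m], confidences[top_m])
    return top_m[kept][:n]
```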
In step S4, to compare whether two faces are the same, the information of each face is quantized into a series of feature values, and the similarity of the two sets of feature values determines whether the two faces belong to the same person; for this purpose a face feature extraction module is built on a convolutional neural network. The module is constructed as follows:
S41, constructing an input layer and preprocessing the input image, mainly by resizing it to a uniform size; because the face images output by the face detection module differ in size and the feature extraction module contains fully connected layers, the face images must be resized to a uniform size;
S42, connecting 3 lightweight convolution submodules after the input layer; their function is to speed up feature extraction and increase the nonlinear expression capability of the network, so that face features are extracted better;
S43, constructing an Inception module located after the lightweight convolution submodules; it is composed of convolution layers at 3 scales and a max pooling layer, the convolution kernels at the 3 scales being 1 × 1, 3 × 3 and 5 × 5; the 3 convolution layers and 1 pooling layer are connected in parallel, and at the end of the module the outputs of the 4 network layers are concatenated as the output of the Inception module; the advantage of this module is that face features are extracted at multiple scales, ensuring the extracted features are rich enough and thereby improving comparison accuracy;
S44, constructing a residual module located after the Inception module; it contains two convolution layers whose kernels are both 3 × 3; the module's input feature map is convolved by the two layers, the convolved feature map is then added element-wise to the input feature map, and the resulting feature map is the output of the residual module; introducing this module makes the model converge more easily during training and also increases its fitting capability;
S45, connecting two fully connected layers after the residual module to integrate the convolutional information, with the final output produced by a fully connected layer of k neurons; the output of these k neurons is the k-dimensional feature vector the module extracts from the face.
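For illustration, the Inception and residual modules of steps S43 and S44 might look like this in PyTorch; the per-branch channel widths are assumptions, since the text specifies only the kernel sizes, the parallel/concatenate wiring, and the element-wise addition:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1, 3x3 and 5x5 convolutions plus max pooling, concatenated."""
    def __init__(self, in_ch, branch_ch=32):  # per-branch width is an assumption
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Concatenate the outputs of the 4 parallel layers along the channel axis
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added element-wise to the input."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))  # element-wise addition
```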
In step S5, a k-dimensional feature vector can be mapped to a point in a k-dimensional feature space, so the feature vectors of two faces map to two feature points; the distance between the two points represents how similar the two faces are, and a smaller distance means the faces are more similar. The face comparison problem is thus converted into the problem of computing the similarity distance between face feature points. The cosine distance is used as the similarity distance between the two feature points, and the calculation proceeds as follows:

Compute the dot product of the two face feature vectors (x_1, x_2, ..., x_k) and (y_1, y_2, ..., y_k):

dot_XY = (x_1, x_2, ..., x_k) × (y_1, y_2, ..., y_k)^T

Compute the two-norm of each face feature vector:

||X|| = √(x_1^2 + x_2^2 + ... + x_k^2)
||Y|| = √(y_1^2 + y_2^2 + ... + y_k^2)

Compute the cosine distance of the two feature vectors from the dot product and the norms:

cos(X, Y) = dot_XY / (||X|| × ||Y||)

The cosine distance is the similarity distance of the two faces; if it is greater than a threshold, the two faces are considered the same person; otherwise they are not.
A face comparison system, comprising:
the face detection module is used for detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
the decoding module is used for decoding the position information of the face candidate frames and converting each face candidate frame into face candidate frame information on the original image;
the screening module is used for screening several face candidate frames as detection results according to the face prediction score of each candidate frame on the original image, and cutting out the face part from the original image according to the detection results to serve as the input image of the subsequent face feature extraction module;
the human face feature extraction module is used for extracting features of the input image to obtain a series of characteristic values of human face information quantization;
and the similarity comparison module is used for calculating the similarity of the two faces according to the characteristic values of the two faces and judging whether the two faces are the same person or not according to the similarity.
A computer device comprises a memory and a processor; the memory is used for storing a computer program; the processor is used for executing the computer program to implement the above face comparison method.
A computer-readable storage medium, on which a computer program is stored, the computer program being adapted to implement a face comparison method as described above when executed by a processor.
Practical application
The following walks through the face comparison of two images, taking as an example the certificate photo retained when a clerk at an operator's service point verifies a customer transacting business, together with the photo taken on site; the certificate photo usually contains only 1 face, while the scene photo may contain several faces.
1. The certificate photo and the scene photo are both resized to 500 × 500 × 3;
2. Face detection on the certificate photo: the certificate photo is input into the face detection module, which outputs 50 regression boxes; the position information (b_x, b_y, b_w, b_h) of each regression box is decoded with the following formulas:

t_x = (x_offset + b_x) / S × w
t_y = (y_offset + b_y) / S × h
t_w = b_w × w
t_h = b_h × h

For this example, S is 5, w is 500 and h is 500. From (t_x, t_y, t_w, t_h) a series of decoded candidate box positions on the original image is obtained, together with the corresponding confidence scores (confidence) and face prediction scores (score);
3. Screening the certificate photo's face candidate boxes: the candidate boxes are sorted by score from large to small and the 10 highest-scoring candidates are selected; these 10 candidates are then screened a second time with the non-maximum suppression algorithm according to confidence, and 1 screened candidate is selected as the certificate photo's face detection result;
4. Face detection on the scene photo: the scene photo is input into the basic face detection module, which outputs the information of 50 regression boxes; the position information (b_x, b_y, b_w, b_h) of each regression box is decoded with the following formulas:

t_x = (x_offset + b_x) / S × w
t_y = (y_offset + b_y) / S × h
t_w = b_w × w
t_h = b_h × h

For this embodiment, S is 5, w is 500 and h is 500. From (t_x, t_y, t_w, t_h) a series of decoded candidate box positions on the original image is obtained, together with the corresponding confidence scores (confidence) and face prediction scores (score);
5. Screening the scene photo's face candidate boxes: the candidate boxes are sorted by score from large to small and the 20 highest-scoring candidates are selected; these 20 candidates are then screened a second time with the non-maximum suppression algorithm according to confidence, and 5 screened candidates are selected as the scene photo's face detection result; if fewer than 5 candidates survive the screening, all surviving candidates are taken as the scene photo's face detection result;
6. The certificate photo's face detection result extracted in step 3 is first preprocessed, resized to 224 × 224 × 3, and then input into the face feature extraction module; after the module's series of convolution operations, a 160-dimensional feature vector (x_1, x_2, ..., x_160) is output through a fully connected layer of 160 neurons;
7. The (possibly multiple) scene photo face detection results extracted in step 5 are each preprocessed, uniformly resized to 224 × 224 × 3, and input into the face feature extraction module; after the module's series of convolution operations, 160-dimensional feature vectors (i_1, i_2, ..., i_160), (j_1, j_2, ..., j_160), (k_1, k_2, ..., k_160), ... are output through the fully connected layer of 160 neurons and placed into the scene photo face feature vector set;
8. A feature vector is drawn from the scene photo face feature vector set without replacement, its similarity distance to the certificate photo's face feature vector is computed with the cosine distance formula (this is the degree of similarity between that scene photo face and the certificate photo face), and the result is placed into a comparison set;
9. Step 8 is repeated until every vector in the scene photo face feature vector set has had its similarity distance to the certificate photo's feature vector computed, i.e. every face in the scene photo has been compared with the certificate photo face;
10. The maximum similarity value in the comparison set is taken as the face comparison result between the scene photo and the certificate photo and compared against a preset face comparison threshold; if it exceeds the threshold, the scene photo and the certificate photo pass the face comparison; otherwise the comparison of the two faces fails.
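Steps 1-10 of this walkthrough can be condensed into one driver routine. In this sketch, detect_faces, crop and resize are hypothetical helpers wrapping the detection, decoding and screening modules sketched above, and the threshold default is an assumption; only the parameter values (m = 10/20, n = 1/5, 224 × 224 input, maximum-similarity decision) come from the walkthrough itself:

```python
def compare_id_to_scene(id_photo, scene_photo, detector, extractor, threshold=0.5):
    """Compare a certificate photo against every face found in a scene photo
    and pass if the best match exceeds the face comparison threshold."""
    # Steps 1-3: detect the single certificate-photo face (m=10, n=1) and embed it.
    id_box = detect_faces(id_photo, detector, m=10, n=1)[0]
    id_vec = extractor(resize(crop(id_photo, id_box), (224, 224)))
    # Steps 4-7: detect up to 5 scene-photo faces (m=20, n=5) and embed each one.
    similarities = []
    for box in detect_faces(scene_photo, detector, m=20, n=5):
        vec = extractor(resize(crop(scene_photo, box), (224, 224)))
        # Steps 8-9: cosine distance between each scene face and the ID face.
        similarities.append(cosine_similarity(id_vec, vec))
    # Step 10: the highest similarity decides whether the comparison passes.
    return max(similarities) > threshold
```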
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the structure of the present invention, and these do not affect the effect of implementing the invention or the utility of the patent.

Claims (10)

1. A face comparison method is characterized by comprising the following steps:
s1, constructing a lightweight face detection module based on a convolutional neural network, and detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
s2, decoding the position information of the face candidate frames, and converting each face candidate frame into face candidate frame information on the original image;
s3, screening a plurality of face candidate frames as detection results according to score of each face candidate frame on the original image for face prediction, and cutting out a face part from the original image according to the detection results to serve as an input image of a subsequent face feature extraction module;
s4, constructing a face feature extraction module based on a convolutional neural network, inputting the input image into the face feature extraction module, and obtaining a series of feature values of face information quantization;
and S5, calculating the similarity of the two faces according to the characteristic values of the two faces, and judging whether the two faces are the same person or not according to the similarity.
2. The method according to claim 1, wherein in step S1, the process of constructing the face detection module is as follows:
step S11, constructing an input layer, which resizes the image to 500 × 500 × 3 to satisfy the fixed-size input required by the convolutional neural network;
S12, constructing a lightweight convolution submodule composed mainly of convolution kernels at two scales, 3 × 3 and 1 × 1, with 64 kernels of size 3 × 3 and 32 kernels of size 1 × 1; inside the module each 3 × 3 convolution is followed by a 1 × 1 convolution, which is in turn followed by a nonlinear activation operation;
step S13, connecting 3 convolution submodules after the input layer, flattening the feature map output by the convolution modules through a Flatten layer, integrating the convolutional information through two fully connected layers, and finally outputting, through a Reshape layer, the information of 5 × 5 × 2 regression boxes (b_x, b_y, b_w, b_h, confidence, score), where (b_x, b_y) are the coordinates of the center point of a regression box, b_w is its width, b_h is its height, confidence is its confidence score, score is the score that the box contains a human face, and 5 × 5 × 2 means the original image is divided into 5 × 5 regions, with the model predicting the position information of 2 regression boxes in each region.
3. The method of claim 2, wherein in step S2, the position information of the regression boxes is decoded, and each regression box is converted into face candidate box information on the original image using the following formulas:
t_x = (x_offset + b_x) / S × w
t_y = (y_offset + b_y) / S × h
t_w = b_w × w
t_h = b_h × h
wherein (t_x, t_y) are the coordinates of the center point of the face candidate box on the original image; t_w and t_h are its width and height respectively; w represents the original image width and h the original image height; S represents dividing the original image into S × S regions, where S is 5; x_offset represents the abscissa (column index) of the region to which the regression box belongs and y_offset represents its ordinate (row index).
4. A face comparison method as claimed in claim 2, wherein in step S3, the face candidate frames are sorted by their face prediction score from large to small and the top m highest-scoring candidate frames are selected; the m candidate frames are then screened a second time with a non-maximum suppression algorithm according to the confidence of each regression box, and n of the screened candidate frames are selected as the face detection result, where m and n are preset values.
5. The method according to claim 4, wherein if the number of the candidate frames after the secondary screening is less than n, all the candidate frames after the secondary screening are used as the detection result.
6. The method of claim 1, wherein in step S4, the process of constructing the face feature extraction module is as follows:
s41, constructing an input layer, and preprocessing an input image, wherein the preprocessing process mainly adjusts the input image into a uniform size;
s42, connecting 3 lightweight convolution sub-modules behind an input layer, wherein the function is to accelerate the process of feature extraction and increase the nonlinear expression capability of a network, so that the human face features are better extracted;
s43, constructing an Incep module, wherein the Incep module is positioned behind the lightweight convolution sub-module and consists of convolution layers with 3 scales and a maximum pooling layer, convolution kernels with 3 scales are respectively 1 × 1, 3 × 3 and 5 × 5,3 convolution layers and 1 pooling layer and are connected in parallel, and the outputs of 4 network layers are spliced together at the last of the module to serve as the output of the Incep module;
s44, constructing a residual error module, wherein the residual error module is positioned behind the inclusion module and comprises two convolution layers, the sizes of convolution kernels are 3 multiplied by 3, the input feature maps of the module are convolved by the two convolution layers, then the feature maps obtained by convolution and the input feature maps are added bit by bit, and the new feature maps obtained by addition are used as the output of the residual error module;
and S45, connecting two full-connection layers behind the residual module, integrating the information of the convolutional layers, and finally outputting the information through the full-connection layers of k neurons, wherein the k neurons are also k-dimensional feature vectors extracted from the human face by the module.
7. The method according to claim 6, wherein in step S5, the k-dimensional feature vector can be mapped to a point in a k-dimensional feature space, so the feature vectors of two faces map to two feature points; the distance between the two points represents how similar the two faces are, and a smaller distance means the faces are more similar; the cosine distance is used as the similarity distance between the two feature points, and the calculation proceeds as follows:
compute the dot product of the two face feature vectors (x_1, x_2, ..., x_k) and (y_1, y_2, ..., y_k):
dot_XY = (x_1, x_2, ..., x_k) × (y_1, y_2, ..., y_k)^T
compute the two-norm of each face feature vector:
||X|| = √(x_1^2 + x_2^2 + ... + x_k^2)
||Y|| = √(y_1^2 + y_2^2 + ... + y_k^2)
compute the cosine distance of the two feature vectors from the dot product and the norms:
cos(X, Y) = dot_XY / (||X|| × ||Y||)
the cosine distance is the similarity distance of the two faces; if it is greater than a threshold, the two faces are considered the same person; otherwise they are not.
8. A face comparison system, comprising:
the face detection module is used for detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
the decoding module is used for decoding the position information of the face candidate frames and converting each face candidate frame into face candidate frame information on the original image;
the screening module is used for screening several face candidate frames as detection results according to the face prediction score of each candidate frame on the original image, and cutting out the face part from the original image according to the detection results to serve as the input image of the subsequent face feature extraction module;
the human face feature extraction module is used for extracting features of the input image to obtain a series of characteristic values of human face information quantization;
and the similarity comparison module is used for calculating the similarity of the two faces according to the characteristic values of the two faces and judging whether the two faces are the same person or not according to the similarity.
9. A computer device comprising a memory, a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program to implement a face comparison method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is configured to implement a face comparison method according to any one of claims 1 to 7.
CN202210949819.7A (priority date 2022-08-09, filing date 2022-08-09) Face comparison method, system, equipment and storage medium, Pending, CN115273202A (en)

Priority Applications (1)

Application Number: CN202210949819.7A; Priority/Filing Date: 2022-08-09; Title: Face comparison method, system, equipment and storage medium


Publications (1)

Publication Number: CN115273202A; Publication Date: 2022-11-01

Family

ID: 83748799

Family Applications (1)

CN202210949819.7A (pending): Face comparison method, system, equipment and storage medium

Country Status (1)

CN: CN115273202A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115690934A * | 2023-01-05 | 2023-02-03 | 武汉利楚商务服务有限公司 | Master and student attendance card punching method and device based on batch face recognition



Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination