CN107577990B - Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval - Google Patents


Info

Publication number
CN107577990B
CN107577990B (application CN201710675398.2A)
Authority
CN
China
Prior art keywords: face, real-valued, hash, GPU
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710675398.2A
Other languages
Chinese (zh)
Other versions
CN107577990A (en)
Inventor
邹复好
曹锋
李开
王浩
白兴强
栾朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN SHIJI JINQIAO SAFETY TECHNOLOGY Co Ltd
Huazhong University of Science and Technology
Original Assignee
WUHAN SHIJI JINQIAO SAFETY TECHNOLOGY Co Ltd
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN SHIJI JINQIAO SAFETY TECHNOLOGY Co Ltd, Huazhong University of Science and Technology filed Critical WUHAN SHIJI JINQIAO SAFETY TECHNOLOGY Co Ltd
Priority to CN201710675398.2A priority Critical patent/CN107577990B/en
Publication of CN107577990A publication Critical patent/CN107577990A/en
Application granted granted Critical
Publication of CN107577990B publication Critical patent/CN107577990B/en


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a large-scale face recognition method based on GPU-accelerated retrieval, relating to the field of computer vision. The method comprises face detection and alignment, face feature extraction, hash feature generation, face index database construction, multi-GPU-accelerated coarse matching, hash-based candidate set retrieval, precise matching based on distance metrics, and voting to determine the best-matching person. Built on two-stage feature matching with a hash index and multi-GPU-accelerated computation, the method exploits the strong parallel computing capability of the GPU to accelerate the screening of candidate feature vectors, greatly reduces retrieval time on large-scale data sets, and meets the needs of applications that are built on deep convolutional neural networks and have strict real-time requirements.

Description

Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval
Technical Field
The invention relates to the field of computer vision, in particular to a large-scale face recognition method based on GPU (graphics processing unit) accelerated retrieval.
Background
In recent years, with the rapid growth of computing power and the steady improvement of deep learning methods, the fields of pattern recognition and artificial intelligence have made significant breakthroughs. Deep learning has achieved excellent results on many pattern recognition tasks, and face recognition is no exception. With the advent of the big data era, face image data has become increasingly abundant, and efficiently and accurately identifying a person's identity within a large-scale face data set is currently a research hotspot in pattern recognition and information retrieval.
Face recognition is an important means of identity verification and has great theoretical and practical value. Image retrieval based on faces is a meaningful direction in the field of information retrieval with very wide applications. In the entertainment field, a user can submit a photo to find the most similar celebrity face; in public security, criminals can be located through face comparison and retrieval; in the security field, applications include access control systems, blacklist monitoring, and visitor identification; there are further applications in bank self-service, identity verification in hotels, and information security. Research on a face classifier that delivers both recognition efficiency and accuracy in a large-scale data environment is therefore of high practical significance.
In the traditional face retrieval approach, face features are extracted manually, the face feature library is then searched by nearest neighbor, and retrieval over face images is reduced to similarity measurement over real-valued feature vectors. This works well on small-scale data sets, but once the data set grows, recognition efficiency and accuracy drop dramatically. Moreover, face feature vectors are usually high-dimensional, and performing nearest-neighbor search over the whole database in a high-dimensional space is very inefficient.
Face recognition over large-scale data is essentially a multimedia retrieval problem: return the items closest to the face to be recognized in a feature space, i.e. approximate nearest neighbor search. In the field of approximate image search there are two implementation routes: perform similarity search directly in the high-dimensional feature space, or map the high-dimensional space to a Hamming space and convert the task into a retrieval problem based on semantic hashing. The former has a notorious drawback for large data volumes, the "curse of dimensionality". To address it, researchers have studied semantic hashing extensively; semantic hashing generates very compact hash codes that directly reflect the semantic structure of the original feature space, so features that are close in the original space have small Hamming distances, and distant features have large ones. However, most approximate-search work on semantic hashing studies how to generate hash codes, and little work designs hash indexes to improve retrieval efficiency. How to speed up approximate search over hash codes and reduce its time cost is therefore a valuable research direction for large-scale hashed data.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a large-scale face recognition method based on GPU-accelerated retrieval, which exploits the powerful parallel computing capability of the GPU to accelerate the screening of candidate feature vectors, greatly reduces retrieval time on large-scale data sets, and meets the needs of applications built on deep convolutional neural networks with strict real-time requirements.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:
a large-scale face recognition method based on GPU accelerated retrieval comprises the following steps:
s1, inputting the picture to be detected into an MTCNN network, detecting the position of the face and the position of key points in the picture by adopting a face detection algorithm, and aligning the detected face;
s2, extracting real-valued feature vectors of the face picture and the picture mirror image processed in the step S1 by using a trained deep learning model, and then fusing the two real-valued feature vectors and reducing dimensions to obtain real-valued features of the face;
s3, converting the real-value features into hash features;
s4, repeating the steps S1-S3 to detect the faces to be detected one by one, using the hash features as indexes and real-value feature vectors as values, and establishing a key value type face database;
s5, using a multi-GPU accelerated hash search algorithm to obtain k hash features which are adjacent to the hash features of the picture to be detected;
s6, using the k Hash features obtained in S5 as indexes, searching in the face database to obtain a candidate set consisting of k real-value feature vectors;
s7, calculating the vector similarity measurement distance between the real-valued feature vector of the photo to be inquired and the real-valued feature vector in the candidate set;
s8, voting to obtain the score of each photo to be queried according to the vector similarity measurement distance between the real-value feature vector in the candidate set and the real-value feature vector of the photo to be queried, and taking the highest score as a face recognition result;
In step S5, obtaining the k nearest hash features with the multi-GPU-accelerated hash lookup algorithm specifically includes the following steps:
s501, dividing a data set consisting of all N hash features into M parts according to the set GPU number M, wherein each part does not exceed SUBN (N + M-1)/M hash features, and copying all the hash features of the divided subdata set from a host to a corresponding device end;
s502, setting a calculation thread number for each GPU, and calculating Hamming distances between the hash code to be inquired and all hash codes in a data set in parallel;
s503, dividing all Hamming distances into SUBN/K groups according to K adjacent data as a group, executing in parallel, merging and sorting the Hamming distances of all SUBN/K groups into order in the group;
s504, sequentially finding out the K data with the minimum hamming distance in the array subscripts [ nK, (n +1) K) and [ (n +1) K, (n +2) K), and moving it to the position of [ nK, (n +1) K), where n ═ 0, 2, 4 …, each time comparing the maximum value in the array subscript [ nK, (n +1) K) with the minimum value in the array subscripts [ (n +1) K, (n +2) K), and exchanging the two values if the maximum value in [ nK, (n +1) K) is larger;
s505, repeating the steps S503 and S504, respectively sequencing [0, K ], [2K, 3K ], [4K, 5K) …, grouping the 2K arrays of [0, K) and [2K, 3K ]), finding out the first K minimum Hamming distances, moving the Hamming distances to [0, K ], and so on until the first K Hamming distances of the data packets on all M GPUs are stored at the positions of [0, K);
s506, copying the calculation results of the M GPUs to a memory of the host end, and traversing the M data by the maximum heap sorting for all the M data by the maximum heap sorting to obtain the minimum K Hamming distances in the M groups of data.
On the basis of the above technical solution, the step S1 specifically includes the following steps:
s101, inputting a picture to be detected into an MTCNN (multiple terminal connected network), generating a face candidate set window and regression position coordinates thereof by using a first CNN, and combining the face windows with high overlapping degree by using a non-maximum suppression algorithm to generate a face window candidate set;
s102, sending the result obtained in the step S101 to a second CNN which is more complex than the first CNN, and performing re-filtering and fine adjustment on the position of the face window;
s103, sending the result obtained in the step S102 to a third CNN which is more complex than the second CNN for fine adjustment, and generating the position of a final face window in each picture to be detected and coordinates of five face key points;
s104, judging whether a final face window is generated in the step S103, and if not, finishing the identification; if so, extracting a final face window image, correcting and aligning the face to the center, and storing the corrected face window image as the specified resolution.
On the basis of the above technical scheme, a face feature extraction network is used to extract the real-valued feature vectors of the face photo and of its mirror image;
the face feature extraction network is a 32-layer deep convolutional neural network comprising convolutional layers, down-sampling layers, PReLU activation layers, fully connected layers and a loss function layer.
On the basis of the above technical scheme, the loss function layer combines two loss functions, softmax-loss and center-loss: the center-loss function improves the intra-class compactness of samples in the feature space after network mapping, while the softmax-loss function increases the inter-class distance of samples in that feature space.
On the basis of the above technical solution, in the step S2, the real-valued feature vector of the face photo processed in the step S1 and the real-valued feature vector of the photo image are fused according to the following formula:
fi = max(xi, yi),  i = 1, 2, …, n

where xi and yi are the i-th components of the vectors x and y to be fused, n is the dimension of the real-valued feature vectors, and fi is the i-th component of the fused real-valued feature vector.
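A minimal sketch of this element-wise max fusion of the two real-valued feature vectors (the function name is illustrative, not from the patent):

```python
def fuse_features(x, y):
    """Element-wise max fusion of two equal-length real-valued feature vectors:
    f_i = max(x_i, y_i)."""
    if len(x) != len(y):
        raise ValueError("vectors to fuse must have the same dimension")
    return [max(xi, yi) for xi, yi in zip(x, y)]

# Example: features from the original photo and from its mirror image
original = [0.2, -0.5, 0.9, 0.1]
mirrored = [0.4, -0.7, 0.3, 0.6]
print(fuse_features(original, mirrored))  # [0.4, -0.5, 0.9, 0.6]
```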
On the basis of the above technical solution, in the step S3, the face real-valued feature obtained in the step S2 is converted into a hash feature according to the following formula:
f(x)=0.5×(sign(x)+1)
where sign(·) is applied element-wise and is defined as

sign(x) = 1 if x ≥ 0, and sign(x) = -1 if x < 0,

so that each component of the real-valued feature is mapped to a binary bit (0 or 1).
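A minimal sketch of this binarization, f(x) = 0.5 × (sign(x) + 1): each component maps to 1 when non-negative and to 0 otherwise (function name illustrative):

```python
def binarize(features):
    """Convert a real-valued feature vector to a hash feature bit by bit:
    0.5 * (sign(x) + 1) with sign(x) = 1 for x >= 0 and -1 for x < 0."""
    return [1 if x >= 0 else 0 for x in features]

print(binarize([0.7, -0.2, 0.0, -1.3]))  # [1, 0, 1, 0]
```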
based on the above technical solution, in the step S7, the distance between the real-valued feature vector of the face to be queried obtained in the step S5 and each real-valued feature vector in the candidate set is calculated by using a Cosine metric or a euclidean metric.
Based on the above technical solution, in step S8, the voter formula used is as follows:
Score(ID) = Σ (simi - threshold), summed over the candidate vectors i belonging to that face ID whose similarity exceeds the threshold

where Score(ID) is the final voting score of each face ID in the candidate set, simi is the distance computed in step S7 between the real-valued feature vector of the face to be queried and the i-th real-valued feature vector in the candidate set, and threshold is a preset threshold. (The exact formula is rendered as an image in the original document.)
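Since the voter formula itself is rendered as an image in the original, the following sketch is an assumption consistent with the surrounding text: each candidate contributes its similarity minus the threshold to its person ID's score, and the highest-scoring ID wins.

```python
def vote(candidates, threshold=0.5):
    """candidates: list of (person_id, similarity) pairs from the candidate set.
    Sum (sim - threshold) per ID over candidates above the threshold and
    return the best-scoring ID (assumed voter; not the patent's exact code)."""
    scores = {}
    for person_id, sim in candidates:
        if sim > threshold:
            scores[person_id] = scores.get(person_id, 0.0) + (sim - threshold)
    return max(scores, key=scores.get) if scores else None

cands = [("A", 0.9), ("A", 0.6), ("B", 0.8), ("B", 0.4)]
print(vote(cands))  # "A": (0.4 + 0.1) = 0.5 beats "B": 0.3
```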
Compared with the prior art, the invention has the following advantages:
(1) The method is built on two-stage feature matching with a hash index and multi-GPU-accelerated computation; it exploits the strong parallel computing capability of the GPU to accelerate the screening of candidate feature vectors, greatly reduces retrieval time on large-scale data sets, and meets the needs of applications built on deep convolutional neural networks with strict real-time requirements.
(2) The method reaches 99.48% accuracy on the LFW face test set and a 72.5% recognition rate in the MegaFace test task, maintaining high recognition accuracy while greatly reducing retrieval time on large-scale data sets.
(3) The steps of the method are relatively independent; any step can be replaced or adjusted as technology or requirements evolve without affecting the others, giving the method good extensibility.
Drawings
FIG. 1 is a schematic diagram of a large-scale face recognition retrieval method based on GPU acceleration according to an embodiment of the present invention;
FIG. 2 is a MTCNN frame diagram for face detection and key point location in an embodiment of the present invention;
FIG. 3 is a diagram of a network structure for extracting facial features based on deep learning according to an embodiment of the present invention;
FIG. 4 is a frame diagram of face real-valued feature extraction and fusion proposed in the embodiment of the present invention;
fig. 5 is a flowchart of a hash lookup algorithm based on GPU acceleration according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The terms used in the examples of the present invention are explained as follows:
MTCNN: multi-task conditional neural network, multitasking convolutional neural network;
CNN: a convolutional neural network;
PReLU (Parametric Rectified Linear Unit): an activation function with learnable parameters.
Referring to fig. 1, an embodiment of the present invention provides a large-scale face recognition method based on GPU accelerated retrieval, including the following steps:
s1, inputting the picture to be detected into an MTCNN network, detecting the position of the face and the position of key points in the picture by adopting a face detection algorithm, and aligning the detected face;
s2, extracting real-valued feature vectors of the face picture and the picture mirror image processed in the step S1 by using a trained deep learning model, and then fusing the two vectors and reducing dimensions to obtain the real-valued features of the face;
s3, designing a hash function to convert the human face real value features obtained in the step S2 into hash features;
s4, repeating the steps S1-S3 to detect the faces to be detected one by one, using the hash features in the step S3 as indexes and real-value feature vectors as values, and establishing a key value type face database;
s5, processing the query photo according to the steps S1-S3 to obtain hash characteristics, and obtaining the hash characteristics of k neighbors by using a multi-GPU accelerated hash search algorithm;
s6, using the hash feature obtained in S5 as an index, searching in the face database established in the step S4 to obtain a candidate set corresponding to the real-value feature vector;
s7, calculating the Hamming distance between the real-value feature vector of the photo to be inquired and the feature vector in the candidate set;
and S8, after subtracting a threshold value according to the Hamming distance between the vector in the candidate set and the vector to be inquired, voting to obtain the score of each candidate ID, and taking the highest score as a face recognition result.
The large-scale face recognition method based on GPU-accelerated retrieval divides the query process into a coarse matching stage and a fine matching stage. In the coarse matching stage, deep hashing generates a hash feature for each face image, an efficient index is built over all hash features in the data set, and a GPU-accelerated hash lookup algorithm computes the Hamming distances between the hash feature of the query image and all stored hash features; the faces corresponding to the k closest hash features that also fall below a given Hamming distance form the candidate set, which is the result of the coarse matching stage. In the fine matching stage, the real-valued features corresponding to the hash features in the candidate set are retrieved, a suitable similarity metric is chosen, and the candidate real-valued features are compared with the real-valued features of the query face. The comparison results are sent to a voter, and the face ID with the highest vote score is returned as the recognition result.
The following details the process of the large-scale face recognition method based on GPU-accelerated retrieval in one embodiment of the invention:
step S1, inputting the picture to be detected into an MTCNN network, detecting the position of the face and the position of key points in the picture by adopting a face detection algorithm, and aligning the detected face;
The invention adopts the MTCNN method for face detection and key point positioning; the general frame diagram is shown in fig. 2. The processing comprises the following steps:
s101, inputting a picture to be detected into an MTCNN (multiple terminal connected network), generating a face candidate set window and regression position coordinates thereof by using a first CNN, and combining the face windows with high overlapping degree by using a non-maximum suppression algorithm to generate a face window candidate set;
s102, sending the result obtained in the step S101 to a second CNN which is more complex than the first CNN, and performing re-filtering and fine adjustment on the position of the face window;
s103, fine-tuning the result obtained in the step S102 through a third CNN which is more complex than the second CNN, and generating the final face window position of each picture to be detected and the coordinates of five face key points;
s104, judging whether a final face window is generated in the step S103, and if not, finishing the identification; if so, extracting a final face window image, correcting the face to the center, and storing the face as the specified resolution.
As shown in fig. 3, the MTCNN in the embodiment of the invention processes an image in three stages: the first CNN, a fully convolutional network P-Net (Proposal Network), produces part of the face window candidate set, using bounding-box regression for calibration and NMS to merge candidate boxes; the result is then fed to the more complex second CNN, the fully convolutional R-Net (Refine Network), which removes most non-face regions; finally, the result is passed to the third, still more complex CNN, O-Net (Output Network), for fine processing, which outputs the final face boxes and the five face key point positions.
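The NMS step used to merge overlapping candidate windows can be sketched as a standard IoU-based non-maximum suppression (an assumption: the patent does not give its exact variant; function names are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping
    it beyond thresh, and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the two overlapping windows collapse to one
```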
The method designs three cascaded network structures for progressive refinement. Compared with multi-class object detection, face detection is a binary classification problem, so it needs fewer filters but better discrimination; a deeper structure is therefore designed in O-Net to extract stronger semantic features. To achieve real-time performance, the convolution kernels are kept small (3 × 3 and 2 × 2), which greatly reduces the amount of computation. The three CNN structures are shown in fig. 3.
After the picture to be detected is input into the MTCNN network, steps S101-S103 determine whether the picture contains a face, the position of the face window, and the coordinates of the face key points; step S104 then produces the processed face picture required by step S2.
Step S2, extracting real-valued feature vectors of the face picture processed in step S1 and of its mirror image using a trained deep learning model, then fusing the two vectors and reducing the dimension to obtain the real-valued features of the face;
The face feature extraction network designed by the invention is built by stacking residual blocks in the style of the residual network (ResNet): a 32-layer deep convolutional neural network comprising convolutional layers, down-sampling layers, PReLU activation layers, fully connected layers and other structures, whose combination fits a complex nonlinear transformation. The whole network structure is shown in fig. 4.
The specific configuration and parameter settings of the network are shown in the following table:
Figure GDA0002275392880000101
Figure GDA0002275392880000111
The input to the network is an image of resolution 96 × 112 × 3, and the output is a 512-dimensional feature. The network has 32 layers: Conv denotes a convolutional layer, MP a down-sampling layer (max pooling), and FC a fully connected layer. "Repetition" denotes how many times a structure is stacked, and "output" the size of the feature after that layer. The table shows that the later layers of the network hold most of the parameters; the last fully connected layer alone accounts for half of the total, and the final output feature vector is 512-dimensional. The loss function layer follows the last FC layer. The feature extraction network uses the softmax-loss and center-loss functions together to improve intra-class compactness and inter-class separation, and thereby the final accuracy. On top of softmax-loss, center-loss maintains a class center in feature space for each class of the training set; during training it constrains the distance between each sample and its class center after network mapping, improving the compactness of the mapped features within each class, while softmax-loss increases the distance between classes, so the learned features generalize and discriminate better.
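As a hedged illustration of the intra-class term described above: the published center-loss definition is L_C = ½ Σ ||x_i − c_{y_i}||² (the patent does not spell the formula out), which a minimal sketch can compute as follows:

```python
def center_loss(features, labels, centers):
    """Average 0.5 * squared distance between each feature and its class
    center: the intra-class compactness term described above (sketch only)."""
    total = 0.0
    for f, y in zip(features, labels):
        c = centers[y]
        total += sum((fi - ci) ** 2 for fi, ci in zip(f, c))
    return total / (2 * len(features))

feats = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
centers = {0: [1.0, 0.0], 1: [0.0, 0.0]}
print(center_loss(feats, labels, centers))  # 0.25
```

In training, the centers themselves are updated toward the batch mean of each class, so samples are pulled toward their class center while softmax-loss pushes the classes apart.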
S4, repeating steps S1-S3 for each face to be registered, using the hash features of step S3 as indexes and the real-valued feature vectors as values to build a key-value face database;
Because the data scale handled by the method reaches the hundred-million level, Redis can be adopted to store the real-valued feature vectors so that the vectors corresponding to a hash feature can be found quickly. Each hash index corresponds to several feature vectors: if the hash feature generated from a feature vector does not yet exist in the database, the corresponding hash index is added; otherwise the feature vector is appended under the existing hash index. To store face-related information, the invention uses three tables: hash_set, face_info_hash and person_info_hash. hash_set is a set-type structure storing all hash indexes. face_info_hash and person_info_hash are Redis hash-type structures storing key-value pairs: face_info_hash stores the information of each face, and person_info_hash stores the information of each person; every person has a unique ID and may have several faces.
The concrete structure of the person_info_hash key is as follows:

[Table: fields of the person_info_hash key; rendered as images in the original document]
The specific structure of the face_info_hash key is as follows:
Because both tables are key-value structures, new information can be added freely. Each face stores its corresponding hash index in the face_info_hash table: the hash index serves as the key name, the real-valued feature vectors of several faces are stored under that key, and the sub-key name of each feature vector is formed from the person's id and a number, as shown in the following table:
[Table: sub-key layout of a face_info_hash entry; rendered as an image in the original document]
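A minimal sketch of this key-value layout, using plain Python dicts in place of Redis (the names hash_set and face_info_hash come from the description above; the helper function and sub-key pattern are illustrative):

```python
# Plain-dict stand-in for the Redis layout described above.
hash_set = set()        # all hash indexes (Redis set type)
face_info_hash = {}     # hash index -> {sub-key: real-valued feature vector}

def add_face(hash_code, person_id, photo_no, feature):
    """Register a face: the hash code is the index, the real-valued feature
    is the value, sub-keyed by '<person_id>_face_feature_<n>'."""
    hash_set.add(hash_code)
    bucket = face_info_hash.setdefault(hash_code, {})
    bucket["%s_face_feature_%d" % (person_id, photo_no)] = feature

add_face("1011", "Id1", 0, [0.1, 0.9])
add_face("1011", "Id2", 0, [0.2, 0.8])   # same hash index, second person
print(sorted(face_info_hash["1011"]))  # ['Id1_face_feature_0', 'Id2_face_feature_0']
```

With real Redis the same layout would use a set for hash_set and hash-type keys (field-value pairs) for face_info_hash, as the description states.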
Step S5 performs coarse matching using the hash feature obtained for the query face photo in the preceding steps. In the large-scale face recognition method, the coarse matching stage retrieves the K hash features closest in Hamming distance to the query picture. A GPU is a graphics processor that integrates a very large number of computing cores and is widely used for data processing and scientific computing; its strong parallel computing capability suits large-scale feature distance computation, namely finding the K hash features that satisfy the query condition among a huge set of hash features. The invention therefore designs a Top-K hash lookup algorithm based on multi-GPU acceleration. The main flow of the algorithm is shown in fig. 5 and comprises the following steps:
s501, dividing a data set consisting of all N hash features into M parts according to the set GPU number M, wherein each part does not exceed SUBN (N + M-1)/M hash features, and copying all the hash features of the divided sub data set from a host to a corresponding device end;
s502, setting a calculation thread number for each GPU, and calculating Hamming distances between the hash code to be inquired and all hash codes in a data set in parallel;
s503, dividing all Hamming distances into SUBN/K groups according to K adjacent data as a group, executing in parallel, merging and sorting the Hamming distances of all SUBN/K groups into order in the group;
s504, sequentially finding out the K data with the minimum hamming distance in the array subscripts [ nK, (n +1) K) and [ (n +1) K, (n +2) K), and moving it to the position of [ nK, (n +1) K), where n ═ 0, 2, 4 …, each time comparing the maximum value in the array subscript [ nK, (n +1) K) with the minimum value in the array subscripts [ (n +1) K, (n +2) K), and exchanging the two values if the maximum value in [ nK, (n +1) K) is larger;
s505, repeating the steps S503 and S504, respectively sequencing [0, K ], [2K, 3K ], [4K, 5K) …, grouping the 2K arrays of [0, K) and [2K, 3K ]), finding out the first K minimum Hamming distances, moving the Hamming distances to [0, K ], and so on until the first K Hamming distances of the data packets on all M GPUs are stored at the positions of [0, K);
s506, copying the calculation results of the M GPUs to a memory of the host end, and traversing the M data by the maximum heap sorting for all the M data by the maximum heap sorting to obtain the minimum K Hamming distances in the M groups of data.
The goal of the algorithm is to find, among N hash features, the K with the smallest Hamming distance to the query hash feature. All N hash features are first divided into M parts according to the number of available GPUs M, each part containing at most (N + M - 1)/M features, and each part is copied from the host to its device. On each GPU, the block and grid sizes used for computation are set, the parallel computing capability of the GPU is used to quickly compute the Hamming distances between the query hash code and all hash codes in its partition, and the K smallest Hamming distances are selected and returned. Finally, the host merges the results of the M GPUs, in the spirit of merge sort, to obtain the K hash features closest to the query. The detailed flow is as described in steps S501 to S506 above.
The whole algorithm achieves fast hash computation and retrieval on the GPU using the ideas of merge sort and bitonic sort. In this GPU-accelerated hash indexing method, each GPU maintains an index structure parallel to its Hamming distance list; whenever a Hamming distance is moved, the corresponding index position is moved with it, so the final positions of the Top-K Hamming distances are the positions of the corresponding indexes.
The main video memory overhead of the algorithm is storing all Hamming distances, giving a space complexity of O(N). The time overhead consists mainly of the Hamming distance computation and the bitonic merge sorting; the former has time complexity O(N) and the latter O(N log N), so the overall time complexity is O(N log N).
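As a reference for the Bitonic Sort building block named above, here is a minimal CPU sketch (not taken from the patent); on a GPU, each compare-and-swap pass of `bitonic_merge` runs with one thread per element pair, which is what makes the pattern data-parallel:

```python
def bitonic_merge(a, up):
    # Merge a bitonic sequence into ascending (up=True) or descending order.
    if len(a) <= 1:
        return a
    half = len(a) // 2
    for i in range(half):
        # Compare-and-swap: on a GPU this loop is one thread per pair.
        if (a[i] > a[i + half]) == up:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], up) + bitonic_merge(a[half:], up)

def bitonic_sort(a, up=True):
    # Classic bitonic sort; len(a) must be a power of two.
    if len(a) <= 1:
        return a
    half = len(a) // 2
    left = bitonic_sort(a[:half], True)    # ascending half
    right = bitonic_sort(a[half:], False)  # descending half -> bitonic sequence
    return bitonic_merge(left + right, up)

dists = [23, 7, 41, 2, 19, 11, 37, 5]
assert bitonic_sort(dists) == sorted(dists)
```

The power-of-two length requirement is why the distances are grouped into fixed-size blocks of K before sorting.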
After candidate hash indexes are found out by using a GPU accelerated hash lookup algorithm, face feature vectors which are correspondingly stored in all the hash indexes are obtained from a face database according to the hash indexes, and the face feature vectors form a candidate set.
In step S6, the hash feature vectors obtained in step S5 are used as key names to query the corresponding key values in Redis, yielding the candidate feature vectors. According to the face database construction process of step S4, the sub-key name of each feature vector stored under a hash index contains the Id of the person it belongs to. All hash indexes obtained in step S5 are therefore queried in Redis in turn, and all the sub key-value pairs obtained (person Id plus corresponding real-valued feature vector) form a feature vector candidate set with a map structure, as follows:
Id                    feature vector
Id1_face_feature_0    real-valued feature of photo 1 of person Id1
Id2_face_feature_0    real-valued feature of photo 1 of person Id2
Id2_face_feature_1    real-valued feature of photo 2 of person Id2
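As an illustration of the step S6 lookup, the sketch below models the Redis key-value store with a plain dict (in the real system each top-level key is a hash index whose value is a Redis hash of sub-key to feature; all key names and feature values here are made-up examples):

```python
# A plain dict stands in for Redis; hash-index keys and features are illustrative.
face_db = {
    "01101001": {
        "Id1_face_feature_0": [0.12, 0.85, 0.33],
        "Id2_face_feature_0": [0.10, 0.80, 0.30],
    },
    "01101011": {
        "Id2_face_feature_1": [0.11, 0.79, 0.31],
    },
}

def build_candidate_set(hash_indexes):
    # Step S6: query every hash index in turn and merge all sub key-value
    # pairs into one map keyed by sub-key name (which embeds the person Id).
    candidates = {}
    for h in hash_indexes:
        candidates.update(face_db.get(h, {}))
    return candidates

cands = build_candidate_set(["01101001", "01101011"])
assert set(cands) == {"Id1_face_feature_0", "Id2_face_feature_0",
                      "Id2_face_feature_1"}
```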
Step S7, calculating the similarity distance between the real-valued feature vector of the photo to be queried and the feature vectors in the candidate set;
in this step, the distance between the real-valued feature vector of the face to be queried obtained in step S5 and each feature vector in the candidate set is calculated. Cosine distance is used as the similarity measure in the implementation of the present invention, but the invention is not limited to the Cosine metric; a Euclidean metric may also be used. The closer the Cosine distance of two vectors is to 1, the more similar the two vectors are. The key name and the distance computed for each feature vector are stored in a map structure for further processing.
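A minimal sketch of the Cosine similarity computation over the candidate map (vector values and key names are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two real-valued feature vectors;
    # the closer to 1, the more similar the vectors.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.1, 0.8, 0.3]  # real-valued feature of the photo to be queried
candidates = {
    "Id1_face_feature_0": [0.12, 0.85, 0.33],
    "Id2_face_feature_0": [0.9, 0.1, 0.2],
}
# Map structure holding key name -> computed distance, as in step S7.
distances = {key: cosine_similarity(query, vec)
             for key, vec in candidates.items()}
assert distances["Id1_face_feature_0"] > distances["Id2_face_feature_0"]
```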
And step S8, after subtracting a threshold value from the similarity distance between each vector in the candidate set and the vector to be queried, voting to obtain the score of each candidate Id, and taking the highest score as the face recognition result.
After the similarity scores between the face features to be queried and all people in the candidate set are obtained, since each person in the candidate set may have more than one picture, a voter needs to be designed to vote for the face ID, and the voter in this embodiment is designed as follows:
score(Id) = Σ (sim - threshold), the sum running over all candidate pictures of the person Id
where score(Id) is the final voting score of each face Id in the candidate set, sim is the distance, obtained in step S7, between the real-valued feature vector of the face to be queried and a feature vector in the candidate set, and threshold is a set threshold. When the cosine distance is greater than the threshold, the score of the person owning that picture increases; otherwise the person's score decreases. The Id with the largest voting score is the final recognition result.
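The voter described above can be sketched as follows, assuming the sub-key naming convention `Id_face_feature_k` from step S6 and an illustrative threshold; each photo's contribution of (sim - threshold) follows the description of step S8:

```python
def vote(similarities, threshold):
    # similarities: {sub_key_name: cosine distance}; the person Id is the
    # part of the sub-key name before "_face_feature_".
    scores = {}
    for key, sim in similarities.items():
        person_id = key.split("_face_feature_")[0]
        # Each photo adds (sim - threshold): positive above the threshold,
        # negative below it.
        scores[person_id] = scores.get(person_id, 0.0) + (sim - threshold)
    return max(scores, key=scores.get)

sims = {"Id1_face_feature_0": 0.9,
        "Id2_face_feature_0": 0.4,
        "Id2_face_feature_1": 0.6}
assert vote(sims, threshold=0.5) == "Id1"  # Id1: +0.4; Id2: -0.1 + 0.1 = 0.0
```

Summing over photos rather than taking a single best match means a person enrolled with several photos neither gains nor loses an unfair advantage from photos that fall below the threshold: each such photo actively subtracts from the score.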
The present invention is not limited to the above-described embodiments, and it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements are also considered to be within the scope of the present invention. Those not described in detail in this specification are within the skill of the art.

Claims (8)

1. A large-scale face recognition method based on GPU accelerated retrieval is characterized by comprising the following steps:
s1, inputting the picture to be detected into an MTCNN network, detecting the position of the face and the position of key points in the picture by adopting a face detection algorithm, and aligning the detected face;
s2, extracting real-valued feature vectors of the face picture and the picture mirror image processed in the step S1 by using a trained deep learning model, and then fusing the two real-valued feature vectors and reducing dimensions to obtain real-valued features of the face;
s3, converting the real-value features into hash features;
s4, repeating the steps S1-S3 to detect the faces to be detected one by one, using the hash features as indexes and real-value feature vectors as values, and establishing a key value type face database;
s5, using a multi-GPU accelerated hash search algorithm to obtain k hash features which are adjacent to the hash features of the picture to be detected;
s6, using the k Hash features obtained in S5 as indexes, searching in the face database to obtain a candidate set consisting of k real-value feature vectors;
s7, calculating the vector similarity measurement distance between the real-valued feature vector of the photo to be queried and the real-valued feature vectors in the candidate set;
s8, voting to obtain the score of each candidate Id according to the vector similarity measurement distance between the real-valued feature vectors in the candidate set and the real-valued feature vector of the photo to be queried, and taking the highest score as the face recognition result;
in step S5, obtaining the hash features of the k neighbors by using the multiple GPU accelerated hash lookup algorithm specifically includes the following steps:
s501, dividing a data set consisting of all N hash features into M parts according to the set GPU number M, wherein each part contains no more than SUBN = (N + M - 1)/M hash features, and copying all the hash features of each divided sub data set from the host to the corresponding device end;
s502, setting the number of computing threads for each GPU, and calculating in parallel the Hamming distances between the hash code to be queried and all hash codes in the data set;
s503, dividing all Hamming distances into SUBN/K groups of K adjacent data each, and, executing in parallel, bitonic merge sorting the Hamming distances of all SUBN/K groups so that each group is ordered internally;
s504, sequentially finding out the K data with the minimum Hamming distance among the array subscripts [nK, (n+1)K) and [(n+1)K, (n+2)K) and moving them to the positions [nK, (n+1)K), where n = 0, 2, 4, ... is a non-negative even number; each comparison takes the maximum value in the interval [nK, (n+1)K) and the minimum value in the interval [(n+1)K, (n+2)K), and exchanges the two data if the maximum value in [nK, (n+1)K) is larger;
s505, repeating the steps S503 and S504 on the surviving intervals [0, K), [2K, 3K), [4K, 5K), and so on: treating each pair such as [0, K) and [2K, 3K) as one group of 2K elements, finding its first K minimum Hamming distances and moving them to [0, K), and continuing in this way until the first K Hamming distances of the data partition on each of the M GPUs are stored at the positions [0, K);
s506, copying the calculation results of the M GPUs back to the memory of the host end, and traversing all M groups of K Hamming distances with a maximum heap to obtain the minimum K Hamming distances over the M groups of data.
2. The large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 1, wherein: the step S1 specifically includes the following steps:
s101, inputting a picture to be detected into an MTCNN (Multi-task Cascaded Convolutional Network), generating face candidate windows and their regression position coordinates with the first CNN, and merging highly overlapping face windows with a non-maximum suppression algorithm to generate a face window candidate set;
s102, sending the result obtained in the step S101 to a second CNN which is more complex than the first CNN, and performing re-filtering and fine adjustment on the position of the face window;
s103, sending the result obtained in the step S102 to a third CNN which is more complex than the second CNN for fine adjustment, and generating the position of a final face window in each picture to be detected and coordinates of five face key points;
s104, judging whether a final face window is generated in the step S103, and if not, finishing the identification; if so, extracting a final face window image, correcting and aligning the face to the center, and storing the corrected face window image as the specified resolution.
3. The large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 1, wherein:
extracting a face photo and a real-value feature vector of a photo mirror image by using a face feature extraction network;
the face feature extraction network is a 32-layer deep convolutional neural network and comprises a convolutional layer, a down-sampling layer, a PRelu activation layer, a full connection layer and a loss function layer.
4. The large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 3, wherein: the loss function layer comprises two loss functions, softmax-loss and center-loss; the softmax-loss function is used for increasing the inter-class distance of the samples in the feature space after network mapping, and the center-loss function is used for improving the intra-class compactness of the samples in that feature space.
5. The large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 1, wherein: in step S2, the real-valued eigenvector of the face photograph processed in step S1 and the real-valued eigenvector of the mirror image of the photograph are fused according to the following formula:
f_i = max(x_i, y_i), i = 1, 2, ..., n
wherein x_i and y_i are respectively the i-th dimensions of the vectors x and y to be fused, n is the dimension of the real-valued feature vectors, and f_i is the i-th dimension of the fused real-valued feature vector.
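A worked instance of this element-wise max fusion in NumPy (the two vectors are made-up examples):

```python
import numpy as np

x = np.array([0.2, 0.7, 0.1])  # real-valued features of the face photo
y = np.array([0.5, 0.6, 0.3])  # real-valued features of its mirror image
f = np.maximum(x, y)           # f_i = max(x_i, y_i), dimension by dimension
assert f.tolist() == [0.5, 0.7, 0.3]
```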
6. The large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 1, wherein: in the step S3, the face real-valued features obtained in the step S2 are converted into hash features according to the following formula:
f(x)=0.5×(sign(x)+1)
wherein sign(x) takes the value 1 when x is greater than or equal to 0, and -1 otherwise.
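A sketch of this binarization in NumPy, assuming the sign(0) = 1 convention so that the output is strictly binary:

```python
import numpy as np

def real_to_hash(x):
    # f(x) = 0.5 * (sign(x) + 1): maps each non-negative component to 1
    # and each negative component to 0 (sign(0) = 1 assumed here).
    return (np.asarray(x) >= 0).astype(np.uint8)

bits = real_to_hash([0.3, -0.2, 0.0, -1.5])
assert bits.tolist() == [1, 0, 1, 0]
```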
7. the large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 1, wherein: in step S7, the distance between the real-valued feature vector of the face to be queried obtained in step S5 and each real-valued feature vector in the candidate set is calculated by using Cosine metric or euclidean metric.
8. The large-scale face recognition method based on GPU-accelerated retrieval as recited in claim 1, wherein: in step S8, the voter formula used is as follows:
score(Id) = Σ (sim - threshold), the sum running over all candidate pictures of the person Id;
where score(Id) is the final voting score of each face Id in the candidate set, sim is the distance, obtained in step S7, between the real-valued feature vector of the face to be queried and a real-valued feature vector in the candidate set, and threshold is a set threshold.
CN201710675398.2A 2017-08-09 2017-08-09 Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval Active CN107577990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710675398.2A CN107577990B (en) 2017-08-09 2017-08-09 Large-scale face recognition method based on GPU (graphics processing Unit) accelerated retrieval


Publications (2)

Publication Number Publication Date
CN107577990A CN107577990A (en) 2018-01-12
CN107577990B true CN107577990B (en) 2020-02-18

Family

ID=61034399


Country Status (1)

Country Link
CN (1) CN107577990B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390344B (en) * 2018-04-19 2021-10-26 华为技术有限公司 Alternative frame updating method and device
CN110263603B (en) * 2018-05-14 2021-08-06 桂林远望智能通信科技有限公司 Face recognition method and device based on central loss and residual error visual simulation network
US11443176B2 (en) * 2018-05-17 2022-09-13 International Business Machines Corporation Acceleration of convolutional neural networks on analog arrays
CN108921065A (en) * 2018-06-21 2018-11-30 北京陌上花科技有限公司 The method and apparatus for establishing property data base
CN109062942A (en) * 2018-06-21 2018-12-21 北京陌上花科技有限公司 Data query method and apparatus
CN108920720B (en) * 2018-07-30 2021-09-07 电子科技大学 Large-scale image retrieval method based on depth hash and GPU acceleration
CN109034119A (en) * 2018-08-27 2018-12-18 苏州广目信息技术有限公司 A kind of method for detecting human face of the full convolutional neural networks based on optimization
CN109241325B (en) * 2018-09-11 2020-12-08 武汉魅瞳科技有限公司 Large-scale face retrieval method and device based on depth features
CN111382287A (en) * 2018-12-30 2020-07-07 浙江宇视科技有限公司 Picture searching method and device, storage medium and electronic equipment
CN109783692B (en) * 2019-01-08 2021-12-31 深圳英飞拓科技股份有限公司 Target feature code comparison method and device combining fast data with slow data
CN110059644A (en) * 2019-04-23 2019-07-26 杭州智趣智能信息技术有限公司 A kind of biopsy method based on facial image, system and associated component
CN110110125A (en) * 2019-04-28 2019-08-09 重庆学析优科技有限公司 A kind of quick accurately picture searching matching process and system
CN110110113A (en) * 2019-05-20 2019-08-09 重庆紫光华山智安科技有限公司 Image search method, system and electronic device
CN110659290B (en) * 2019-09-20 2021-06-11 中科寒武纪科技股份有限公司 Data processing method and device and related product
CN110647722B (en) * 2019-09-20 2024-03-01 中科寒武纪科技股份有限公司 Data processing method and device and related products
CN110879984A (en) * 2019-11-18 2020-03-13 上海眼控科技股份有限公司 Face comparison method and device
CN111310732A (en) * 2020-03-19 2020-06-19 广东宜教通教育有限公司 High-precision face authentication method, system, computer equipment and storage medium
CN112000845B (en) * 2020-08-19 2021-07-20 东北大学 Hyperspatial hash indexing method based on GPU acceleration
CN112527855B (en) * 2020-09-23 2024-05-03 广东协城信息科技有限公司 Face vector quick comparison technology
CN112685580A (en) * 2020-12-25 2021-04-20 公安部第三研究所 Social network head portrait comparison distributed detection system, method and device based on deep learning, processor and storage medium thereof
CN112434678B (en) * 2021-01-27 2021-06-04 成都无糖信息技术有限公司 Face measurement feature space searching system and method based on artificial neural network
CN113516002A (en) * 2021-03-05 2021-10-19 武汉特斯联智能工程有限公司 Face recognition method and device based on face recognition model and applying smart community
CN112907810A (en) * 2021-04-02 2021-06-04 吉林大学 Face recognition temperature measurement campus access control system based on embedded GPU
CN113254686B (en) * 2021-04-02 2023-08-01 青岛以萨数据技术有限公司 Personnel behavior detection method, device and storage medium
CN113469350B (en) * 2021-07-07 2023-03-24 武汉魅瞳科技有限公司 Deep convolutional neural network acceleration method and system suitable for NPU
CN114064948A (en) * 2021-10-15 2022-02-18 西安深信科创信息技术有限公司 Hash image retrieval method and device based on generalized average pooling strategy
CN114048344A (en) * 2021-11-25 2022-02-15 天翼数字生活科技有限公司 Similar face searching method, device, equipment and readable storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
US6691126B1 (en) * 2000-06-14 2004-02-10 International Business Machines Corporation Method and apparatus for locating multi-region objects in an image or video database
CN102521366A (en) * 2011-12-16 2012-06-27 华中科技大学 Image retrieval method integrating classification with hash partitioning and image retrieval system utilizing same
CN106599830A (en) * 2016-12-09 2017-04-26 中国科学院自动化研究所 Method and apparatus for positioning face key points


Non-Patent Citations (3)

Title
"FaceHunter:A multi-task convolutional neural network based face detector";Dong Wang etc.;《Signal Processing:Image Communication》;20160419;论文第1-2,3.1节,图1 *
"图像相似性计算及其GPU加速的若干研究";陈伟平;《中国优秀硕士学位论文全文数据库 信息科技辑》;20131015(第10期);论文第1.1,5.3节 *
"基于哈希算法的图像检索***";倪康康;《中国优秀硕士学位论文全文数据库 信息科技辑》;20160315(第03期);论文第3.3.1-3.3.2节,表3.1 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant