CN111652260B - Face clustering sample number selection method and system - Google Patents

Face clustering sample number selection method and system

Info

Publication number
CN111652260B
CN111652260B (application CN201910363240.0A)
Authority
CN
China
Prior art keywords
face
clustering
training set
feature vector
cosine distance
Prior art date
Legal status
Active
Application number
CN201910363240.0A
Other languages
Chinese (zh)
Other versions
CN111652260A (en)
Inventor
薛圆圆 (Xue Yuanyuan)
Current Assignee
Shanghai Re Sr Information Technology Co ltd
Original Assignee
Shanghai Re Sr Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Re Sr Information Technology Co ltd
Priority to CN201910363240.0A
Publication of CN111652260A
Application granted
Publication of CN111652260B
Legal status: Active

Classifications

    • G06F18/211 — Pattern recognition; selection of the most significant subset of features
    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24137 — Classification techniques based on distances to training or reference patterns; distances to cluster centroids
    • G06V40/168 — Human faces; feature extraction; face representation
    • G06V40/172 — Human faces; classification, e.g. identification
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of face recognition and discloses a method for selecting the number of face clustering samples, comprising the following steps: constructing a face test set and a plurality of face training sets, each training set containing a different number of face images; clustering the face training sets to obtain the corresponding clustering centers; calculating the cosine distance between each clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to each clustering center; and obtaining the range for the number of face clustering samples from the mean value and root mean square value corresponding to each clustering center. Correspondingly, the invention also discloses a system for selecting the number of face clustering samples. The invention thus provides a way to choose the number of face clustering samples that ensures a good clustering effect.

Description

Face clustering sample number selection method and system
Technical Field
The invention relates to the technical field of face recognition, in particular to a method and a system for selecting the number of face clustering samples.
Background
Face recognition is a biometric technology that identifies a person from facial feature information: a camera collects images or video streams containing faces, the faces are automatically detected and tracked in the images, and the detected faces are then recognised. In a face recognition product, several face pictures must be added to register a face model. The registered face model is generally a feature vector, so one person ends up corresponding to several face feature vectors; the face feature vectors therefore need to be clustered so that each person corresponds to a unique feature vector. The goal of a face clustering algorithm is to find, by clustering, the cluster center of the feature vectors extracted from several photos of the same person, such that the sum of squared distances from the cluster center to the individual images is minimised. The patent application published as CN108875778A discloses a face clustering method comprising: determining a clustering mode based on the number of images to be clustered, where the clustering mode describes how many images are taken from the images to be clustered for each round of face clustering; and, based on the determined clustering mode, taking the corresponding number of images from the images to be clustered each time and performing face clustering until all images to be clustered have been processed.
In general, a clustering algorithm performs better when more images are available, but at face registration time a user provides only a small number of photos for clustering. The patent application above offers a technical solution for face clustering but does not provide a method for selecting the number of face clustering samples.
Therefore, how to select and evaluate the number of face clustering samples while ensuring a good clustering effect has become a technical problem in need of a solution.
Disclosure of Invention
The invention aims to provide a method and a system for selecting the number of face clustering samples that can ensure a good clustering effect.
In order to achieve the above object, the present invention provides a method for selecting the number of face clustering samples, the method comprising: constructing a face test set and a plurality of face training sets, each training set containing a different number of face images; clustering the face training sets to obtain the corresponding clustering centers; calculating the cosine distance between each clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to each clustering center; and obtaining the range for the number of face clustering samples from the mean value and root mean square value corresponding to each clustering center. In this scheme, the mean value and root mean square value of the cosine distances between a clustering center and the test set images serve as the indexes for evaluating the number of face clustering samples.
Preferably, the step S1 includes: constructing an original face image set containing face images of a plurality of people, and performing face detection and cropping on all face images in the original face image set; and selecting a preset number of face images of the same person from the original face image set, the selected face images forming the face test set.
Preferably, the step S1 includes: constructing a first face training set with N1 face images; constructing a second face training set with N2 face images; constructing a third face training set with N3 face images; and constructing a fourth face training set with N4 face images; wherein N1 > N2 > N3 > N4, N1 > 10, N2 ≤ 10, and N4 ≥ 3.
Preferably, the step S2 includes: performing convolution and feature extraction on each face image in the first face training set with a convolutional neural network model, generating a first feature vector group corresponding to the first face training set, and performing K-means clustering on the first feature vector group to obtain a first clustering center corresponding to the first face training set; performing convolution and feature extraction on each face image in the second face training set with the convolutional neural network model, generating a second feature vector group corresponding to the second face training set, and performing K-means clustering on the second feature vector group to obtain a second clustering center corresponding to the second face training set; performing convolution and feature extraction on each face image in the third face training set with the convolutional neural network model, generating a third feature vector group corresponding to the third face training set, and performing K-means clustering on the third feature vector group to obtain a third clustering center corresponding to the third face training set; and performing convolution and feature extraction on each face image in the fourth face training set with the convolutional neural network model, generating a fourth feature vector group corresponding to the fourth face training set, and performing K-means clustering on the fourth feature vector group to obtain a fourth clustering center corresponding to the fourth face training set.
Preferably, the step S3 includes: the calculation formula of the mean value of the cosine distance is as formula 1:
$$\mathrm{mean} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{(Equation 1)}$$
wherein mean is the mean value of the cosine distances, n is the number of face images in the face test set, and d_i is the cosine distance between the clustering center and the feature vector of the i-th face image in the face test set.
Preferably, the step S3 further includes: the root mean square value of the cosine distance is calculated as formula 2:
$$\mathrm{var} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}} \quad\text{(Equation 2)}$$
wherein var is the root mean square value of the cosine distances, n is the number of face images in the face test set, and d_i is the cosine distance between the clustering center and the feature vector of the i-th face image in the face test set.
Preferably, the step S3 further includes: calculating the cosine distance between the first clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the first clustering center according to Equations 1 and 2; calculating the cosine distance between the second clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the second clustering center according to Equations 1 and 2; calculating the cosine distance between the third clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the third clustering center according to Equations 1 and 2; and calculating the cosine distance between the fourth clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the fourth clustering center according to Equations 1 and 2.
Preferably, the step S3 further includes: calculating the cosine distance between the feature vector of each face image in the fourth face training set and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to each face image in the fourth face training set according to Equations 1 and 2.
Preferably, the step S4 includes: the number of face clustering samples ranges over [3, 10]. This range of sample numbers yields a good clustering effect and guides how many face photos a user should provide at face registration.
In order to achieve the above object, the present invention also provides a system for selecting the number of face clustering samples, the system comprising: a training set module, configured to construct a face test set and a plurality of face training sets, each training set containing a different number of face images; a clustering module, configured to cluster the face training sets to obtain the corresponding clustering centers; a calculation module, configured to calculate the cosine distance between each clustering center and the feature vector of each face image in the face test set, and to obtain the mean value and root mean square value of the cosine distances corresponding to each clustering center; and an evaluation module, configured to obtain the range for the number of face clustering samples from the mean value and root mean square value corresponding to each clustering center. In this scheme, the mean value and root mean square value of the cosine distances between a clustering center and the test set images serve as the indexes for evaluating the number of face clustering samples.
Compared with the prior art, the method and system for selecting the number of face clustering samples have the following beneficial effects: the mean value and root mean square value of the cosine distances between a clustering center and the test set images are used as the index for evaluating the number of face clustering samples; a range for the number of face clustering samples is provided that achieves a good clustering effect and guides how many face photos a user should provide at face registration; and the scheme fits the scenario in which a user provides only a small number of images for face clustering at registration, giving it strong practicability and improving the user experience.
Drawings
Fig. 1 is a flow chart of a method for selecting a number of face clustering samples according to an embodiment of the present invention.
Fig. 2 is a block diagram showing the components of a system for selecting the number of face clustering samples in one embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the specific embodiments shown in the drawings. In the drawings, identical structural elements are denoted by the same reference numerals, and components with similar structure or function are denoted by similar reference numerals. The dimensions and thicknesses of the components shown in the drawings are arbitrary, and the present invention is not limited to them; in places, thicknesses are exaggerated for clarity of illustration.
In one embodiment of the present invention as shown in fig. 1, the present invention provides a method for selecting a number of face clustering samples, the method comprising:
s1, constructing a face test set and constructing a plurality of face training sets, wherein the number of face images of each face training set is different;
s2, clustering the face training sets to obtain a plurality of corresponding clustering centers;
s3, calculating cosine distances between each clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and the root mean square value of the cosine distances corresponding to each clustering center;
s4, obtaining the number range of the face clustering samples according to the cosine distance mean value and the root mean square value corresponding to each clustering center.
The step S1 is as follows: construct a face test set and a plurality of face training sets, each training set containing a different number of face images. Specifically, an original face image set containing face images of a plurality of people is constructed, and face detection and cropping are performed on all face images in the original face image set to produce face images of a consistent standard size. A preset number of face images of the same person are then selected from the original face image set to form the face test set. In a specific embodiment of the present invention, the number of samples per face in the original face image set is greater than 112, and 96 face images of the same person are selected from the original face image set as the face test set.
According to an embodiment of the present invention, the step S1 further includes: constructing a first face training set with N1 face images; constructing a second face training set with N2 face images; constructing a third face training set with N3 face images; and constructing a fourth face training set with N4 face images; wherein N1 > N2 > N3 > N4, N1 > 10, N2 ≤ 10, and N4 ≥ 3. According to a preferred embodiment of the present invention, N1 is set to 16, N2 to 10, N3 to 5, and N4 to 3. Sixteen face images belonging to the same person are selected from the original face image set to form the first face training set. Similarly, 10 face images of the same person are selected to form the second face training set, 5 face images of the same person to form the third face training set, and 3 face images of the same person to form the fourth face training set.
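The training-set construction described above can be sketched as follows. This is an illustrative sketch, with `person_images` standing in for the per-person images of the original face image set; the function name and `seed` parameter are assumptions, not from the patent:

```python
import random

def build_training_sets(person_images, sizes=(16, 10, 5, 3), seed=0):
    """Sample one training set of each requested size (N1..N4 in the
    preferred embodiment) from the images of a single person."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {n: rng.sample(person_images, n) for n in sizes}
```

With more than 112 images per person available, the four sizes 16, 10, 5, and 3 satisfy N1 > 10, N2 ≤ 10, and N4 ≥ 3.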
The step S2 is as follows: cluster the face training sets to obtain the corresponding clustering centers. Face features are extracted from each face training set, and the extracted features are clustered to generate a clustering center. According to an embodiment of the present invention, the step S2 includes: performing convolution and feature extraction on each face image in the first face training set with a convolutional neural network model, obtaining a feature vector for each face image, and generating a first feature vector group corresponding to the first face training set; then performing K-means clustering on the first feature vector group to obtain a first clustering center corresponding to the first face training set. Performing convolution and feature extraction on each face image in the second face training set with the convolutional neural network model, obtaining a feature vector for each face image, and generating a second feature vector group corresponding to the second face training set; then performing K-means clustering on the second feature vector group to obtain a second clustering center corresponding to the second face training set. Performing convolution and feature extraction on each face image in the third face training set with the convolutional neural network model, obtaining a feature vector for each face image, and generating a third feature vector group corresponding to the third face training set; then performing K-means clustering on the third feature vector group to obtain a third clustering center corresponding to the third face training set.
Carrying out convolution and feature extraction on each face image in the fourth face training set according to the convolution neural network model, obtaining a feature vector corresponding to each face image, and generating a fourth feature vector group corresponding to the fourth face training set; and carrying out K-means clustering on the fourth feature vector group to obtain a fourth clustering center corresponding to the fourth face training set.
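A minimal sketch of this step, assuming the convolutional neural network is available as a callable `model` that maps an image to a 1-D feature vector (the patent does not specify the network). Since each training set contains a single person, K-means with k = 1 converges to the arithmetic mean of the feature vectors, so the clustering collapses to a mean:

```python
import numpy as np

def extract_features(images, model):
    # model: hypothetical CNN face-embedding network, image -> 1-D feature vector
    return np.stack([model(img) for img in images])

def cluster_center(features):
    # K-means with a single cluster reduces to the mean of the vectors,
    # which minimises the sum of squared distances to the center.
    return features.mean(axis=0)
```

In practice an off-the-shelf K-means implementation with `k=1` gives the same center.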
The step S3 is as follows: and calculating the cosine distance between each clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and the root mean square value of the cosine distance corresponding to each clustering center. According to an embodiment of the present invention, the step S3 includes: the calculation formula of the mean value of the cosine distance is as formula 1:
$$\mathrm{mean} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{(Equation 1)}$$
wherein mean is the mean value of the cosine distances, n is the number of face images in the face test set, and d_i is the cosine distance between the clustering center and the feature vector of the i-th face image in the face test set;
according to an embodiment of the present invention, the step S3 further includes: the root mean square value of the cosine distance is calculated as formula 2:
$$\mathrm{var} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}} \quad\text{(Equation 2)}$$
wherein var is the root mean square value of the cosine distances, n is the number of face images in the face test set, and d_i is the cosine distance between the clustering center and the feature vector of the i-th face image in the face test set.
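Equations 1 and 2 can be sketched directly. The patent does not spell out the cosine-distance convention; the code below assumes the usual one, cosine distance = 1 − cosine similarity:

```python
import numpy as np

def cosine_distance(u, v):
    # assumed convention: 1 - cosine similarity
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def distance_stats(center, test_features):
    # d_i: cosine distance from the clustering center to the i-th test image
    d = np.array([cosine_distance(center, f) for f in test_features])
    mean = d.mean()                 # Equation 1
    rms = np.sqrt(np.mean(d ** 2))  # Equation 2
    return mean, rms
```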
According to an embodiment of the present invention, the step S3 further includes: calculating the cosine distance between the first clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the first clustering center according to Equations 1 and 2. Calculating the cosine distance between the second clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the second clustering center according to Equations 1 and 2. Calculating the cosine distance between the third clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the third clustering center according to Equations 1 and 2. And calculating the cosine distance between the fourth clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to the fourth clustering center according to Equations 1 and 2.
According to an embodiment of the present invention, the step S3 further includes: calculating the cosine distance between the feature vector of each face image in the fourth face training set and the feature vector of each face image in the face test set, and obtaining the mean value and root mean square value of the cosine distances corresponding to each face image in the fourth face training set according to Equations 1 and 2.
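The unclustered baseline described above — scoring each training image's own feature vector against the test set instead of a cluster center — can be sketched as follows, again assuming cosine distance = 1 − cosine similarity:

```python
import numpy as np

def _cos_dist(u, v):
    # assumed convention: 1 - cosine similarity
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def per_image_stats(train_features, test_features):
    """Mean and root mean square cosine distance (Equations 1 and 2)
    for each training image treated as its own 'center'."""
    stats = []
    for f in train_features:
        d = np.array([_cos_dist(f, t) for t in test_features])
        stats.append((d.mean(), np.sqrt(np.mean(d ** 2))))
    return stats
```

Comparing these per-image statistics with the clustered statistics is what populates Table 2.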
The step S4 is as follows: and acquiring the number range of the face clustering samples according to the cosine distance mean value and the root mean square value corresponding to each clustering center.
According to an embodiment of the present invention, Table 1 lists the maximum cosine distance, minimum cosine distance, mean value of the cosine distances, and root mean square value of the cosine distances corresponding to each clustering center, with N1 set to 16, N2 to 10, N3 to 5, and N4 to 3. The first clustering center corresponds to the first face training set, whose number of face images is 16. The second clustering center corresponds to the second face training set, whose number of face images is 10. The third clustering center corresponds to the third face training set, whose number of face images is 5. The fourth clustering center corresponds to the fourth face training set, whose number of face images is 3.
[Table 1, reproduced as an image in the original: maximum cosine distance, minimum cosine distance, mean value, and root mean square value of the cosine distances for each of the four clustering centers]
As can be seen from Table 1, the larger the number of clustering samples, the smaller the mean value and root mean square value of the corresponding cosine distances, and the better the clustering effect. The mean value and root mean square value of the cosine distances between a clustering center and the face images in the face test set thus serve as evaluation indexes for the number of face clustering samples: the more clustering samples, the smaller the mean value and root mean square value of the cosine distances, and the better the clustering effect. The table also shows that once the number of face clustering samples exceeds 10, the rate at which the mean value and root mean square value of the cosine distances decrease drops markedly; therefore, the upper limit on the number of samples a user provides at face registration is set to 10, that is, the upper limit of the range for the number of face clustering samples is 10.
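This "diminishing returns" rule for fixing the upper limit can be sketched as follows. The sample counts and mean distances in the test are illustrative placeholders, and the `min_gain` threshold is a hypothetical parameter, not a value from the patent:

```python
def select_upper_bound(counts, means, min_gain=0.005):
    """counts: ascending sample counts; means: mean cosine distance per count.
    Return the last count whose marginal reduction in mean distance
    (per extra sample) still meets min_gain."""
    for i in range(1, len(counts)):
        gain = (means[i - 1] - means[i]) / (counts[i] - counts[i - 1])
        if gain < min_gain:
            return counts[i - 1]  # improvement has flattened out
    return counts[-1]
```

With curves shaped like Table 1 (clear gains up to 10 samples, little beyond), this returns 10.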
According to an embodiment of the present invention, Table 2 lists the maximum cosine distance, minimum cosine distance, mean value of the cosine distances, and root mean square value of the cosine distances corresponding to each face image in the fourth face training set.
[Table 2, reproduced as an image in the original: maximum cosine distance, minimum cosine distance, mean value, and root mean square value of the cosine distances for each face image in the fourth face training set]
As can be seen from Table 2, even a small number of face clustering samples gives a better clustering effect than performing no clustering at all. With only 3 face clustering samples the clustering effect is already clearly better than without clustering, so the lower limit of the range for the number of face clustering samples is 3. The number of face clustering samples therefore ranges over [3, 10], and within this range, the larger the number of face clustering samples, the better the clustering.
In the above technical scheme, the mean value and root mean square value of the cosine distances between a clustering center and the test set images are used as the index for evaluating the number of face clustering samples, and a range for the number of face clustering samples is provided that achieves a good clustering effect and guides how many face photos a user should provide at face registration. The scheme fits the scenario in which a user provides only a small number of images for face clustering at registration, giving it strong practicability and improving the user experience.
In another embodiment, as shown in fig. 2, the present invention further provides a system for selecting a number of face clustering samples, where the system includes:
the training set module 20 is configured to construct a face test set, and construct a plurality of face training sets, each of which has a different number of face images;
a clustering module 21, configured to cluster the plurality of face training sets to obtain a plurality of corresponding clustering centers;
the calculating module 22 is configured to calculate cosine distances between each cluster center and feature vectors of each face image in the face test set, and obtain an average value and a root mean square value of the cosine distances corresponding to each cluster center;
and the evaluation module 23 is configured to obtain a number range of face clustering samples according to the cosine distance average value and the root mean square value corresponding to each clustering center.
The training set module 20 is configured to construct a face test set and a plurality of face training sets, each containing a different number of face images. According to a specific embodiment of the invention, the training set module constructs a first face training set with N1 face images, a second face training set with N2 face images, a third face training set with N3 face images, and a fourth face training set with N4 face images, wherein N1 > N2 > N3 > N4, N1 > 10, N2 ≤ 10, and N4 ≥ 3.
The clustering module 21 is configured to cluster the face training sets to obtain a plurality of corresponding clustering centers: face features are extracted from each face training set, and the extracted features are clustered to generate a clustering center. According to a specific embodiment of the present invention, the clustering module clusters the four face training sets respectively to generate the corresponding clustering centers, namely a first clustering center corresponding to the first face training set, a second clustering center corresponding to the second face training set, a third clustering center corresponding to the third face training set, and a fourth clustering center corresponding to the fourth face training set.
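A minimal sketch of the clustering step, assuming the CNN face embeddings are already available as a NumPy array (the feature extractor itself is not shown, and `kmeans_center` is an illustrative name). Since each training set contains a single identity, K-means with k = 1 reduces to the mean of the L2-normalized feature vectors:

```python
import numpy as np

def kmeans_center(features, k=1, iters=10, seed=0):
    """Plain K-means on L2-normalized feature vectors; returns the centers.

    features stands in for the CNN embeddings the clustering module
    receives.  Empty clusters are not handled, which is fine for k = 1.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest center (squared Euclidean)
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([feats[labels == j].mean(axis=0) for j in range(k)])
    return centers

rng = np.random.default_rng(1)
train = rng.normal(size=(10, 128))   # stand-in for 10 face embeddings
center = kmeans_center(train)[0]     # with k=1: mean of normalized features
```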
The calculating module 22 is configured to calculate the cosine distance between each clustering center and the feature vector of each face image in the face test set, and to obtain the mean value and root mean square value of the cosine distances corresponding to each clustering center. According to a specific embodiment of the present invention, the calculating module calculates the cosine distance between each clustering center and the feature vector of each face image in the face test set, and obtains the mean value and root mean square value of the cosine distances corresponding to each clustering center according to formulas 1 and 2 of the above method embodiment. According to a specific embodiment of the present invention, the calculating module also calculates the cosine distance between the feature vector of each face image in the fourth face training set and the feature vector of each face image in the face test set, and obtains the mean value and root mean square value of the cosine distances corresponding to each face image in the fourth face training set according to formulas 1 and 2.
The evaluation module 23 is configured to obtain a number range of face clustering samples according to the cosine distance mean value and root mean square value corresponding to each clustering center. The mean and root mean square values of the cosine distances between a clustering center and the face images of the face test set serve as the evaluation indexes of the number of face clustering samples: the more clustering samples, the smaller the mean and root mean square values of the cosine distances, and the better the clustering effect. The number of face clustering samples lies in the range [3, 10]; within this range, more samples yield better clustering.
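The evaluation logic can be sketched end to end on synthetic embeddings. Everything here is illustrative, not from the patent: the toy `faces` generator, the function names, and the use of 1 − cosine similarity as the cosine distance (the patent does not spell out the exact definition):

```python
import numpy as np

def cosine_distance(center, feats):
    """Cosine distance (1 - cosine similarity) from one center to each row."""
    c = center / np.linalg.norm(center)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return 1.0 - f @ c

def evaluate(center, test_feats):
    """Mean (formula 1) and root mean square (formula 2) of the distances."""
    d = cosine_distance(center, test_feats)
    return d.mean(), np.sqrt((d ** 2).mean())

# toy data: one "identity" direction plus per-image noise
rng = np.random.default_rng(0)
identity = rng.normal(size=64)
def faces(n, noise): return identity + noise * rng.normal(size=(n, 64))

test = faces(50, 0.3)
for n in (3, 5, 10, 20):                 # candidate sample counts
    center = faces(n, 0.3).mean(axis=0)  # k=1 K-means center of the set
    m, r = evaluate(center, test)
    print(f"n={n:2d}  mean={m:.3f}  rms={r:.3f}")
```

Tabulating the two indexes over candidate sample counts, as in this loop, is how a number range such as [3, 10] would be read off.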
In this technical scheme, the mean and root mean square values of the cosine distance between a clustering center and the test-set images serve as indexes for evaluating the number of face clustering samples; a range for the number of face clustering samples is provided with which a good clustering effect can be obtained, and the user is guided as to how many face photographs to provide during face registration.
While the invention has been described in detail in the foregoing drawings and embodiments, such illustration and description are to be considered illustrative or exemplary and not restrictive. The invention is not limited to the disclosed embodiments. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality; "a plurality" should be understood as at least two. Any reference signs in the claims shall not be construed as limiting the scope. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art from a study of the drawings, the description, and the appended claims, without departing from the scope of the invention as defined in the claims.

Claims (7)

1. A method for selecting a number of face clustering samples, the method comprising the steps of:
s1, constructing a face test set and constructing a plurality of face training sets, wherein the number of face images of each face training set is different;
s2, clustering the face training sets to obtain a plurality of corresponding clustering centers;
s3, calculating cosine distances between each clustering center and the feature vector of each face image in the face test set, and obtaining the mean value and the root mean square value of the cosine distances corresponding to each clustering center;
s4, acquiring the number range of face clustering samples according to the cosine distance mean value and the root mean square value corresponding to each clustering center;
wherein, the step S1 includes:
constructing an original face image set with face images of a plurality of people, and carrying out face detection and clipping on all face images in the original face image set;
selecting a preset number of face images of the same person from the original face image set, wherein the selected face images form the face test set;
constructing a first face training set, wherein the number of face images of the first face training set is N1;
constructing a second face training set, wherein the number of face images of the second face training set is N2;
constructing a third face training set, wherein the number of face images of the third face training set is N3;
constructing a fourth face training set, wherein the number of face images of the fourth face training set is N4;
wherein N1 is greater than N2, N2 is greater than N3, N3 is greater than N4, N1 is greater than 10, N2 is less than or equal to 10, and N4 is greater than or equal to 3;
the step S2 includes:
carrying out convolution and feature extraction on each face image in the first face training set according to a convolutional neural network model, generating a first feature vector group corresponding to the first face training set, and carrying out K-means clustering on the first feature vector group to obtain a first clustering center corresponding to the first face training set;
carrying out convolution and feature extraction on each face image in the second face training set according to the convolutional neural network model, generating a second feature vector group corresponding to the second face training set, and carrying out K-means clustering on the second feature vector group to obtain a second clustering center corresponding to the second face training set;
carrying out convolution and feature extraction on each face image in the third face training set according to the convolutional neural network model, generating a third feature vector group corresponding to the third face training set, and carrying out K-means clustering on the third feature vector group to obtain a third clustering center corresponding to the third face training set;
and carrying out convolution and feature extraction on each face image in the fourth face training set according to the convolutional neural network model, generating a fourth feature vector group corresponding to the fourth face training set, and carrying out K-means clustering on the fourth feature vector group to obtain a fourth clustering center corresponding to the fourth face training set.
2. The method for selecting the number of face clustering samples according to claim 1, wherein the step S3 includes:
the calculation formula of the mean value of the cosine distance is as formula 1:

mean = (1/n) * Σ_{i=1}^{n} d_i    (formula 1)

wherein mean is the mean value of the cosine distances, n is the number of face images in the face test set, and d_i is the cosine distance between the clustering center and the feature vector of the i-th face image in the face test set.
3. The method for selecting the number of face clustering samples according to claim 2, wherein the step S3 further comprises:
the root mean square value of the cosine distance is calculated as formula 2:

var = sqrt( (1/n) * Σ_{i=1}^{n} d_i² )    (formula 2)

wherein var is the root mean square value of the cosine distances, n is the number of face images in the face test set, and d_i is the cosine distance between the clustering center and the feature vector of the i-th face image in the face test set.
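Read literally, formulas 1 and 2 are the arithmetic mean and the root mean square of the cosine distances d_i. A direct, illustrative translation into code (the function names are not from the patent), with a small worked example:

```python
import math

def mean_cosine_distance(d):
    """Formula 1: arithmetic mean of the cosine distances d_i."""
    return sum(d) / len(d)

def rms_cosine_distance(d):
    """Formula 2: root mean square of the cosine distances d_i."""
    return math.sqrt(sum(x * x for x in d) / len(d))

d = [0.1, 0.2, 0.3]
print(mean_cosine_distance(d))          # → 0.2
print(round(rms_cosine_distance(d), 4)) # → 0.216
```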
4. The method for selecting a number of face clustering samples according to claim 3, wherein the step S3 further comprises:
calculating the cosine distance between the first clustering center and the feature vector of each face image in the face test set, and acquiring the mean value and the root mean square value of the cosine distance corresponding to the first clustering center according to the formulas 1 and 2;
calculating the cosine distance between the second clustering center and the feature vector of each face image in the face test set, and acquiring the mean value and the root mean square value of the cosine distance corresponding to the second clustering center according to the formulas 1 and 2;
calculating the cosine distance between the third clustering center and the feature vector of each face image in the face test set, and acquiring the mean value and the root mean square value of the cosine distance corresponding to the third clustering center according to the formulas 1 and 2;
and calculating the cosine distance between the fourth clustering center and the feature vector of each face image in the face test set, and acquiring the mean value and the root mean square value of the cosine distance corresponding to the fourth clustering center according to the formulas 1 and 2.
5. The method for selecting the number of face clustering samples according to claim 4, wherein the step S3 further comprises:
and respectively calculating the cosine distance between the feature vector of each face image in the fourth face training set and the feature vector of each face image in the face test set, and acquiring the mean value and the root mean square value of the cosine distance corresponding to each face image in the fourth face training set according to the formulas 1 and 2.
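The per-image computation of claim 5 vectorizes naturally. This sketch assumes L2-normalizable feature vectors and uses 1 − cosine similarity as the cosine distance (an assumption, since the claim does not fix the exact definition); all names are illustrative:

```python
import numpy as np

def per_image_metrics(train_feats, test_feats):
    """For each training-set vector, the mean and RMS cosine distance
    to every test-set vector (formulas 1 and 2, applied row-wise)."""
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    s = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    d = 1.0 - t @ s.T                 # (n_train, n_test) cosine distances
    return d.mean(axis=1), np.sqrt((d ** 2).mean(axis=1))

rng = np.random.default_rng(2)
train4 = rng.normal(size=(3, 32))     # N4 = 3 images of the fourth set
test = rng.normal(size=(8, 32))       # face test set embeddings
means, rmss = per_image_metrics(train4, test)
```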
6. The method for selecting the number of face clustering samples according to claim 5, wherein the step S4 includes:
the number of face cluster samples ranges from [3,10 ].
7. A system for selecting a number of face clustering samples, wherein the system performs the method for selecting a number of face clustering samples according to any one of claims 1 to 6, the system comprising:
the training set module is used for constructing a face test set and a plurality of face training sets, and the number of face images of each face training set is different;
the clustering module is used for clustering the face training sets to obtain a plurality of corresponding clustering centers;
the computing module is used for computing the cosine distance between each clustering center and the feature vector of each face image in the face test set, and acquiring the mean value and the root mean square value of the cosine distance corresponding to each clustering center;
and an evaluation module, configured to obtain the number range of the face clustering samples according to the cosine distance mean value and the root mean square value corresponding to each clustering center.
CN201910363240.0A 2019-04-30 2019-04-30 Face clustering sample number selection method and system Active CN111652260B (en)
