CN111291678B - Face image clustering method and device based on multi-feature fusion


Info

Publication number
CN111291678B
CN111291678B (application CN202010081619.5A)
Authority
CN
China
Prior art keywords
face image
probability
face
cluster
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010081619.5A
Other languages
Chinese (zh)
Other versions
CN111291678A (en)
Inventor
周峰
杜康
呼延康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aibee Technology Co Ltd
Original Assignee
Beijing Aibee Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aibee Technology Co Ltd filed Critical Beijing Aibee Technology Co Ltd
Priority to CN202010081619.5A priority Critical patent/CN111291678B/en
Publication of CN111291678A publication Critical patent/CN111291678A/en
Application granted granted Critical
Publication of CN111291678B publication Critical patent/CN111291678B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a face image clustering method based on multi-feature fusion, which comprises the following steps: acquiring a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises the feature vectors extracted by each feature extractor; calculating a similarity vector for each face image pair in the face image set to be clustered according to the face feature sets; determining an initial cluster of each face image according to the similarity vectors; and merging the initial clusters having a connected relationship to obtain the target clusters. In this clustering method, a similarity vector is calculated for each image pair in the face image set to be clustered from the face feature sets, clustering is performed according to the similarity vectors, and the clustering results are merged to obtain the target clusters. Because multiple feature vectors are used in the calculation, the clustering accuracy is improved compared with clustering based on a single face feature.

Description

Face image clustering method and device based on multi-feature fusion
Technical Field
The invention relates to the technical field of faces, in particular to a face image clustering method and device based on multi-feature fusion.
Background
In the face recognition process, face images need to be clustered and archived according to the corresponding identity information (ID), so that multiple face images of the same person have the same ID.
However, most existing face image clustering methods use only one face feature for clustering. Because the description capability of a single face feature is limited, the problems of one person being split into multiple archives and multiple people being merged into one archive occur in the clustering result, and the clustering accuracy is low.
Disclosure of Invention
In view of the above, the invention provides a face image clustering method and device based on multi-feature fusion, which are used for solving the problems that existing face image clustering methods mainly use only one face feature for clustering and, because the description capability of a single face feature is limited, one person is split into multiple archives and multiple people are merged into one archive in the clustering result, so that the clustering accuracy is low. The specific scheme is as follows:
a face image clustering method based on multi-feature fusion is characterized by comprising the following steps:
acquiring a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises feature vectors extracted by each feature extractor;
calculating the similarity vector of each face image pair in the face image set to be clustered according to each face feature set;
determining an initial cluster of each face image according to each similarity vector;
and merging the initial clusters having a connected relationship among the initial clusters to obtain a target cluster.
In the above method, optionally, determining the initial cluster of each face image according to each similarity vector includes:
determining the distance of each image pair in the face image set to be clustered according to each similarity vector;
and classifying the face image pairs with the distance smaller than or equal to a preset distance threshold into the same cluster to obtain each initial cluster.
According to the above method, optionally, determining the distance between each image pair in the face image set to be clustered according to each similarity vector includes:
transmitting each similarity vector to a preset neural network model for calculation to obtain a first probability and a second probability;
calculating the target similarity of each face image pair according to each first probability and each second probability;
and determining the distance of each image pair according to each target similarity.
In the above method, optionally, merging the initial clusters having a connected relationship among the initial clusters to obtain the target cluster includes:
combining the initial clusters with intersections in each initial cluster to obtain each combined cluster;
and continuing to merge the merged clusters with intersections in each merged cluster until no intersections exist among the merged clusters, thereby obtaining the target cluster.
The method, optionally, the feature extractors include: a face feature extractor and a body feature extractor.
A face image clustering device based on multi-feature fusion comprises:
the acquisition module is used for acquiring a face feature set of each face image in the face image set to be clustered, wherein the face feature set comprises feature vectors extracted by each feature extractor;
the computing module is used for computing the similarity vector of each face image pair in the face image set to be clustered according to each face feature set;
the determining module is used for determining the initial cluster of each face image according to each similarity vector;
and the merging module is used for merging the initial clusters having a connected relationship among the initial clusters to obtain target clusters.
The above apparatus, optionally, the determining module includes:
the determining unit is used for determining the distance between each image pair in the face image set to be clustered according to each similarity vector;
and the dividing unit is used for classifying the face image pairs with the distance smaller than or equal to a preset distance threshold value into the same cluster to obtain each initial cluster.
The above apparatus, optionally, the determining unit includes:
the first calculation subunit is used for transmitting each similarity vector to a preset neural network model for calculation to obtain a first probability and a second probability;
the second calculating subunit is used for calculating the target similarity of each face image pair according to each first probability and each second probability;
and the determining subunit is used for determining the distance of each image pair according to each target similarity.
The above apparatus, optionally, the combining module includes:
the first merging unit is used for merging the initial clusters with intersections in each initial cluster to obtain each merged cluster;
and the second merging unit is used for continuing to merge the merged clusters with intersections in each merged cluster until no intersections exist among the merged clusters, so as to obtain a target cluster.
The above apparatus, optionally, the feature extractors in the clustering apparatus include: a face feature extractor and a body feature extractor.
Compared with the prior art, the invention has the following advantages:
the invention discloses a face image clustering method based on multi-feature fusion, which comprises the following steps: acquiring a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises feature vectors extracted by each feature extractor; calculating the similarity vector of each face image pair in the face image set to be clustered according to each face feature set; determining an initial cluster of each face image according to each similarity vector; and merging the initial clusters with the communication relation in each initial cluster to obtain a target cluster. In the clustering method, the similarity vector of each image pair in the face image set to be clustered is calculated for each face image face feature set, clustering is carried out according to the similarity vector, clustering results are combined to obtain target clustering, a plurality of feature vectors are adopted for calculation, and compared with clustering carried out by single face features, clustering accuracy is improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a face image clustering method based on multi-feature fusion, which is disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of a face image clustering method based on multi-feature fusion according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a base neural network model according to an embodiment of the present disclosure;
fig. 4 is a structural block diagram of a face image clustering device based on multi-feature fusion disclosed in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a face image clustering method and device based on multi-feature fusion, which are applied to face image clustering in the face recognition process. In the prior art, only one face feature is used for clustering in face image cluster analysis; because the description capability of a single face feature is limited, the problems of one person being split into multiple archives and multiple people being merged into one archive occur in the clustering result, and the clustering accuracy is low. The invention therefore provides a face image clustering method based on multi-feature fusion to solve these problems. The execution flow of the clustering method is shown in fig. 1 and comprises the following steps:
s101, acquiring a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises feature vectors extracted by each feature extractor;
in the embodiment of the invention, the face images to be clustered collectively comprise a plurality of face images, and the face images are clustered according to different people when the face images are the same person. The feature extractors can be Convolutional Neural Network (CNN) models, the CNN models are trained face recognition or pedestrian re-recognition CNN models, the input of the models is an image, and through a series of operations such as convolution, activation, pooling and the like of the CNN models, the output of the CNN models is a feature vector with fixed dimension, and the feature vector can be used for describing the input image. The feature extraction by using the CNN model is to take a face image as input, and finally obtain the feature vector describing the image.
The generation process of the face feature set is as follows:
A plurality of different convolutional neural network (CNN) models G_i are used to extract the face image features:

f_ij = G_i(I_j), i = 1, 2, 3, ..., m, j = 1, 2, 3, ..., n (1)

where G_i denotes the i-th CNN model, with m CNN models used in total; I_j denotes the j-th image, with n images in total in the face image set to be clustered; and f_ij denotes the feature vector extracted from the j-th image by the i-th CNN model. Finally, for each face image I_j, a corresponding face feature set is obtained:

F_j = {f_1j, f_2j, f_3j, ..., f_mj} (2)
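As an illustration only, the construction of the face feature sets in formulas (1) and (2) can be sketched in Python; the extractors `g1` and `g2` below are toy stand-ins for the trained CNN models G_i and are not part of the patent:

```python
import numpy as np

def build_feature_sets(images, extractors):
    """For each image I_j, collect the feature vector f_ij produced by every
    extractor G_i, giving the face feature set F_j = {f_1j, ..., f_mj}."""
    return [[g(img) for g in extractors] for img in images]

# Toy stand-ins for trained CNN extractors G_i: each maps an image to a
# fixed-dimension vector (real extractors would be face/pedestrian CNNs).
g1 = lambda img: np.asarray(img, float).ravel()[:4]
g2 = lambda img: np.asarray(img, float).ravel()[::-1][:4]

images = [np.arange(16).reshape(4, 4), np.ones((4, 4))]
feature_sets = build_feature_sets(images, [g1, g2])  # n = 2 images, m = 2 models
```

Each entry `feature_sets[j]` then plays the role of F_j, a list of m fixed-dimension vectors for image I_j.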
further, as the face image of the same person may have illumination changes, occlusion and posture changes, the face features of the same person may be greatly different due to such changes. However, in some cases, the dress of the person is not changed too much, so that although the images cannot be clustered through the face features, the images can be clustered through the body features of the person, and therefore, the feature extractor in the embodiment of the invention can be all the face feature extractor, and the complementary relation existing between various features can be mined in a mode of mixing the face feature extractor and the body feature extractor, so that the purpose of improving the face clustering precision is achieved.
S102, calculating a similarity vector of each face image pair in the face image set to be clustered according to each face feature set;
in the embodiment of the invention, for each image I j And obtaining a face characteristic image set, dividing each face image in the face image set to be clustered into each face image pair according to a preset classification rule in advance, wherein each image in the face image set to be clustered needs to form an image pair with all other images except the image pair and calculate a similarity vector, and if the similarity vector of the image a and the image b is already calculated, the similarity vector of the image b and the image a can be directly used without calculation, and the preset classification rule is not limited in the embodiment of the invention.
The similarity vector of each face image pair in the face image set to be clustered is calculated according to the face feature sets, and the calculation process is the same for every face image pair. Suppose the similarity of image I_a and image I_b in a face image pair is to be calculated, with face feature set F_a = {f_1a, f_2a, f_3a, ..., f_ma} for image I_a and face feature set F_b = {f_1b, f_2b, f_3b, ..., f_mb} for image I_b. Calculating the similarity of image I_a and image I_b on each feature means calculating the similarity of each pair of corresponding feature vectors.

The similarity calculation method is shown in formula (3); the specific calculation process is:

...

Thus, for each image pair, a similarity vector S_ab formed by the similarities on the different features is obtained.
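For illustration, a per-feature similarity vector can be sketched as follows; since formula (3) is elided in the text, cosine similarity is assumed here as one possible choice, not as the patent's actual formula:

```python
import numpy as np

def similarity_vector(F_a, F_b):
    """Similarity of image pair (I_a, I_b) on each of the m features.
    Formula (3) is elided in the source, so cosine similarity is
    assumed here purely for illustration."""
    return np.array([
        float(np.dot(fa, fb) / (np.linalg.norm(fa) * np.linalg.norm(fb)))
        for fa, fb in zip(F_a, F_b)
    ])

# Two features per image: the first pair points the same way (similarity 1),
# the second pair is orthogonal (similarity 0).
F_a = [np.array([1.0, 0.0]), np.array([0.0, 3.0])]
F_b = [np.array([2.0, 0.0]), np.array([4.0, 0.0])]
S_ab = similarity_vector(F_a, F_b)
```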
S103, determining an initial cluster of each face image according to each similarity vector;
in the embodiment of the invention, firstly, the distance between each image pair in the face image set to be clustered is determined according to each similarity vector, the face image pairs with the distance smaller than or equal to a preset distance threshold are classified into the same cluster, and each initial cluster is obtained, wherein the preset distance threshold can be set according to experience or specific conditions, and the specific value of the preset distance threshold in the embodiment of the invention is not takenLimiting. In image I a And image I b For example, assume image I a And image I b Distance d of (2) ab The preset distance threshold is t, when d ab When t is less than or equal to t, judging the image I a And image I b For the same cluster, when d ab At > t, image I is determined a And image I b Not the same cluster, so that each image forms an initial cluster c j ={I j ,I a ,I b ,I c ,...},j=1,2,3,...n
S104, merging the initial clusters with the communication relation in the initial clusters to obtain a target cluster.
In the embodiment of the invention, for the initial clusters c_i: if c_a = {I_a, I_b, I_c, I_d, ...} and c_e = {I_e, I_a, I_b, I_f, ...} satisfy c_a ∩ c_e ≠ ∅, cluster c_a and cluster c_e are considered connected, and the two are merged into a merged cluster c = {I_a, I_b, I_c, I_d, I_e, I_f, ...}. The subclasses c_i to be merged are merged in turn according to the connected-subgraph rule to obtain a plurality of merged clusters; merged clusters that still have intersections are merged repeatedly until no intersection exists between any merged clusters, giving the target clusters. Finally a target clustering C with k target clusters C_h, h = 1, 2, 3, ..., k, is formed, and the clustering is completed.
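The connected-subgraph merging can be sketched as repeated union of intersecting clusters; this is an illustrative implementation, not the patent's own code:

```python
def merge_connected(clusters):
    """Merge clusters with a non-empty intersection, repeating until all
    remaining clusters are pairwise disjoint (connected-subgraph rule)."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if merged[i] & merged[j]:       # connected: share an image
                    merged[i] |= merged.pop(j)  # union and drop the duplicate
                    changed = True
                    break
            if changed:
                break
    return merged

# Initial clusters from the running example: {0, 1} and {1, 2} intersect.
target = merge_connected([{0, 1}, {0, 1}, {1, 2}, {3}])  # [{0, 1, 2}, {3}]
```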
The invention discloses a face image clustering method based on multi-feature fusion, which comprises the following steps: acquiring a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises the feature vectors extracted by each feature extractor; calculating a similarity vector for each face image pair in the face image set to be clustered according to the face feature sets; determining an initial cluster of each face image according to the similarity vectors; and merging the initial clusters having a connected relationship to obtain the target clusters. In this clustering method, a similarity vector is calculated for each image pair from the face feature sets, clustering is performed according to the similarity vectors, and the clustering results are merged to obtain the target clusters. Because multiple feature vectors are used in the calculation, the clustering accuracy is improved compared with clustering based on a single face feature.
In the embodiment of the present invention, determining the distance between each image pair in the face image set to be clustered according to each similarity vector includes:
s201, transmitting each similarity vector to a preset neural network model for calculation to obtain a first probability and a second probability;
in the embodiment of the invention, a preset neural network model is pre-built, wherein the input of the preset neural network model is each vector in similarity vectors, and the output is a first probability P 1 And a second probability P 2 Wherein the first probability P 1 Representing the probability that two images are not of the same class, the first probability P 1 The larger indicates that the two images are less likely to be of the same class; second probability P 2 Representing the probability that two images are of the same class, the second probability P 2 The larger the two images are, the more likely they are to be of the same category; p (P) 1 ,P 2 =H(S ab ) Representing a preset neural network model H to S ab For input, two similarity values P of output are obtained 1 And P 2
The preset neural network model is shown in fig. 3, and the implementation process is as follows:
Assume the input of the preset neural network model is the vector S_ab formed by the similarities of image pair I_a and image I_b on the different features, and the expected network outputs are the first probability P_1 and the second probability P_2. The neural network parameters learned through training are the layer-1 weights W^(1) and bias term B^(1), and the layer-2 weights W^(2) and bias term B^(2).

The activation function is relu(x) = max(0, x). Let Z = [z_1, z_2, z_3, ..., z_m]^T and A = [a_1, a_2, a_3, ..., a_m]^T. Then:

Z = W^(1) S_ab + B^(1) (4)

A = [a_1, a_2, a_3, ..., a_m]^T = relu(Z) = [relu(z_1), relu(z_2), relu(z_3), ..., relu(z_m)]^T (5)

[P_1, P_2]^T = W^(2) A + B^(2) (6)

Thus the first probability P_1 and the second probability P_2 output by the preset neural network model can be calculated.
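Formulas (4)-(6) can be sketched directly with NumPy; the weights below are illustrative placeholders, not the trained parameters of the patent:

```python
import numpy as np

def forward(S_ab, W1, B1, W2, B2):
    """Forward pass of the two-layer network, formulas (4)-(6):
    Z = W1 S_ab + B1,  A = relu(Z),  [P1, P2]^T = W2 A + B2."""
    Z = W1 @ S_ab + B1       # formula (4)
    A = np.maximum(0.0, Z)   # formula (5): element-wise relu
    return W2 @ A + B2       # formula (6): [P1, P2]

# Illustrative (untrained) parameters for m = 3 features: the first output
# row votes "different class", the second votes "same class".
W1, B1 = np.eye(3), np.zeros(3)
W2 = np.array([[-1.0, -1.0, -1.0],
               [ 1.0,  1.0,  1.0]])
B2 = np.zeros(2)
P1, P2 = forward(np.array([0.9, 0.8, 0.7]), W1, B1, W2, B2)
```

With these placeholder weights, a high-similarity vector yields P2 > P1, i.e. the pair looks like the same class.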
S202, calculating the target similarity of each face image pair according to each first probability and each second probability;
in the embodiment of the present invention, the first probability P is output due to the preset neural network model 1 And a second probability P 2 Is not strictly in [0,1 ]]In the method, for each first probability and the corresponding second probability, the similarity of the corresponding face image pair is calculated, and a sigmoid function is adopted to calculate the first probability P 1 And a second probability P 2 All transition into the (0, 1) interval and remain monotonically increasing, the transition proceeds as follows:
purpose of equation (7)Is to make the first probability P 1 And a second probability P 2 The unified value is convenient for determining a preset distance threshold value when clustering.
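Since formula (7) itself is elided in the text, the sketch below only illustrates the behavior it describes: a sigmoid maps each raw network output into (0, 1) while preserving order:

```python
import math

def sigmoid(x):
    """Monotonically increasing map of any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Raw network outputs (from the earlier placeholder forward pass) can fall
# outside [0, 1]; after the sigmoid both lie in (0, 1) with order preserved.
p1_raw, p2_raw = -2.4, 2.4
p1, p2 = sigmoid(p1_raw), sigmoid(p2_raw)
```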
S203, determining the distance of each image pair according to the similarity of each target.
In the embodiment of the invention, the distance between two face images is calculated by the following formula:
in the embodiment of the invention, the clustering method utilizes the neural network to dig out the association of various face features on the similarity level, so that the complementarity between different features can be effectively utilized, and the defect of insufficient face description capability of a single feature can be overcome. Meanwhile, the clustering precision is further improved by fusion of various features.
Based on the above-mentioned face image clustering method based on multi-feature fusion, the embodiment of the invention also provides a face image clustering device based on multi-feature fusion, and the structural block diagram of the clustering device is shown in fig. 4, and the clustering device comprises:
an acquisition module 301, a calculation module 302, a determination module 303 and a combination module 304.
Wherein,
the acquiring module 301 is configured to acquire a face feature set of each face image in the set of face images to be clustered, where the face feature set includes feature vectors extracted by each feature extractor;
the computing module 302 is configured to compute a similarity vector of each face image pair in the face image set to be clustered according to each face feature set;
the determining module 303 is configured to determine an initial cluster of each face image according to each similarity vector;
the merging module 304 is configured to merge the initial clusters having a connected relationship in each initial cluster to obtain a target cluster.
The invention discloses a face image clustering device based on multi-feature fusion, which acquires a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises the feature vectors extracted by each feature extractor; calculates a similarity vector for each face image pair in the face image set to be clustered according to the face feature sets; determines an initial cluster of each face image according to the similarity vectors; and merges the initial clusters having a connected relationship to obtain the target clusters. In this clustering device, a similarity vector is calculated for each image pair from the face feature sets, clustering is performed according to the similarity vectors, and the clustering results are merged to obtain the target clusters. Because multiple feature vectors are used in the calculation, the clustering accuracy is improved compared with clustering based on a single face feature.
In the embodiment of the present invention, the determining module 303 includes:
a determining unit 305 and a dividing unit 306.
Wherein,
the determining unit 305 is configured to determine, according to the respective similarity vectors, a distance between each image pair in the face image set to be clustered;
the dividing unit 306 is configured to divide the face image pairs with the distance less than or equal to a preset distance threshold into the same clusters, and obtain each initial cluster.
In the embodiment of the present invention, the determining unit 305 includes:
a first calculation subunit 307, a second calculation subunit 308, and a determination subunit 309.
Wherein,
the first calculating subunit 307 is configured to transmit each similarity vector to a preset neural network model for calculation, so as to obtain a first probability and a second probability;
the second calculating subunit 308 is configured to calculate the target similarity of each face image pair according to each first probability and each second probability;
the determining subunit 309 is configured to determine the distance of each image pair according to each target similarity.
In the embodiment of the present invention, the merging module 304 includes:
a first merging unit 310 and a second merging unit 311.
Wherein,
the first merging unit 310 is configured to merge initial clusters with intersections in each initial cluster to obtain each merged cluster;
the second merging unit 311 is configured to continue merging the merged clusters with intersections in the merged clusters until no intersection exists among the merged clusters, so as to obtain a target cluster.
In an embodiment of the present invention, each feature extractor in the clustering device includes: a face feature extractor and a body feature extractor.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described as different from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other. For the apparatus class embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference is made to the description of the method embodiments for relevant points.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices have been described in terms of functionally divided units. Of course, when implementing the present invention, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, or the part thereof contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the embodiments, or in some parts of the embodiments, of the present invention.
The face image clustering method and device based on multi-feature fusion have been described above in detail, and specific examples have been used herein to illustrate the principles and implementation of the invention; the above description of the embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, since those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the idea of the present invention, the contents of this specification should not be construed as limiting the present invention.

Claims (4)

1. A face image clustering method based on multi-feature fusion is characterized by comprising the following steps:
acquiring a face feature set of each face image in a face image set to be clustered, wherein the face feature set comprises feature vectors extracted by each feature extractor, and each feature extractor comprises: a face feature extractor and a body feature extractor;
calculating the similarity vector of each face image pair in the face image set to be clustered according to each face feature set;
determining an initial cluster of each face image according to each similarity vector;
wherein determining the initial cluster of each face image according to each similarity vector comprises: inputting the similarity vector of each face image pair into a preset neural network model for calculation to obtain a first probability and a second probability; calculating the target similarity of each face image pair according to each first probability and each second probability; determining the distance of each face image pair according to each target similarity; and classifying the face image pairs with the distance smaller than or equal to a preset distance threshold into the same cluster to obtain the initial cluster of each face image;
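The thresholding step just described can be sketched as follows (illustrative only; the union-find representation is an assumption, and `pair_distances` is a hypothetical input mapping image-index pairs to the distances derived from the target similarities):

```python
def initial_clusters(n_images, pair_distances, threshold):
    """Place images whose pairwise distance is <= threshold into the
    same initial cluster, using union-find over image indices.
    pair_distances: {(i, j): distance} for each face image pair."""
    parent = list(range(n_images))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union every pair that passes the distance threshold.
    for (i, j), d in pair_distances.items():
        if d <= threshold:
            parent[find(i)] = find(j)

    # Collect members of each root into a cluster.
    clusters = {}
    for i in range(n_images):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())
```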
the calculation process of the first probability and the second probability comprises the following steps: obtaining a first intermediate parameter according to a first layer network weight, a first layer bias item and the similarity vector in parameters of the neural network model; substituting the first intermediate parameter into an activation function to obtain a second intermediate parameter; obtaining the first probability and the second probability according to a second-layer network weight, a second-layer bias term and the second intermediate parameter in parameters of the neural network model;
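A minimal sketch of this two-layer computation follows. The ReLU activation and the softmax normalization of the two outputs are assumptions made for illustration; the claim only names an activation function and does not specify how the two probabilities are normalized:

```python
import numpy as np

def pair_probabilities(sim_vec, W1, b1, W2, b2):
    # First intermediate parameter: first-layer weights and bias
    # applied to the similarity vector.
    z1 = W1 @ sim_vec + b1
    # Second intermediate parameter: activation of the first layer.
    h = np.maximum(z1, 0.0)            # ReLU assumed
    # Second layer yields two logits; softmax turns them into the
    # first and second probability.
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # shift for numerical stability
    p = e / e.sum()
    return p[0], p[1]                  # (first probability, second probability)
```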
the calculating process for calculating the target similarity of each face image pair comprises the following steps: substituting each first probability and each second probability into a formula to obtain the target similarity of each face image pair, wherein the formula is as follows:
wherein the value given by the formula is the target similarity, p₁ is the first probability, and p₂ is the second probability;
and merging the initial clusters having a connectivity relation among the initial clusters to obtain target clusters.
2. The method of claim 1, wherein merging the initial clusters having a connectivity relation among the initial clusters to obtain the target clusters comprises:
combining the initial clusters with intersections in each initial cluster to obtain each combined cluster;
and continuing to merge the merged clusters that still have intersections until no intersections remain among the merged clusters, thereby obtaining the target clusters.
3. A face image clustering device based on multi-feature fusion, characterized by comprising:
the device comprises an acquisition module, a clustering module and a clustering module, wherein the acquisition module is used for acquiring a face feature set of each face image in a face image set to be clustered, the face feature set comprises feature vectors extracted by each feature extractor, and each feature extractor comprises: a face feature extractor and a body feature extractor;
the computing module is used for computing the similarity vector of each face image pair in the face image set to be clustered according to each face feature set;
the determining module is used for determining the initial cluster of each face image according to each similarity vector;
wherein the determining module comprises: the determining unit is used for determining the distance between each image pair in the face image set to be clustered according to each similarity vector; the dividing unit is used for classifying the face image pairs with the distance smaller than or equal to a preset distance threshold value into the same cluster to obtain each initial cluster; wherein the determining unit includes: the first calculation subunit is used for transmitting each similarity vector to a preset neural network model for calculation to obtain a first probability and a second probability; a second calculating subunit, configured to calculate, according to each first probability and each second probability, a target similarity of each face image pair; a determining subunit, configured to determine a distance between each image pair according to each target similarity;
wherein the first computing subunit is specifically configured to: obtaining a first intermediate parameter according to a first layer network weight, a first layer bias item and the similarity vector in parameters of the neural network model; substituting the first intermediate parameter into an activation function to obtain a second intermediate parameter; obtaining the first probability and the second probability according to a second-layer network weight, a second-layer bias term and the second intermediate parameter in parameters of the neural network model; the second computing subunit is specifically configured to: substituting each first probability and each second probability into a formula to obtain the target similarity of each face image pair, wherein the formula is as follows:
wherein the value given by the formula is the target similarity, p₁ is the first probability, and p₂ is the second probability;
and the merging module is used for merging the initial clusters having a connectivity relation among the initial clusters to obtain target clusters.
4. The apparatus of claim 3, wherein the combining module comprises:
the first merging unit is used for merging the initial clusters with intersections in each initial cluster to obtain each merged cluster;
and the second merging unit is used for continuing to merge the merged clusters that still have intersections until no intersections remain among the merged clusters, so as to obtain the target clusters.
CN202010081619.5A 2020-02-06 2020-02-06 Face image clustering method and device based on multi-feature fusion Active CN111291678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010081619.5A CN111291678B (en) 2020-02-06 2020-02-06 Face image clustering method and device based on multi-feature fusion


Publications (2)

Publication Number Publication Date
CN111291678A CN111291678A (en) 2020-06-16
CN111291678B true CN111291678B (en) 2024-01-12

Family

ID=71022342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010081619.5A Active CN111291678B (en) 2020-02-06 2020-02-06 Face image clustering method and device based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN111291678B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667403B (en) * 2020-07-02 2023-04-18 北京爱笔科技有限公司 Method and device for generating human face image with shielding
CN111985336A (en) * 2020-07-22 2020-11-24 深圳供电局有限公司 Face image clustering method and device, computer equipment and storage medium
CN112070144A (en) * 2020-09-03 2020-12-11 Oppo广东移动通信有限公司 Image clustering method and device, electronic equipment and storage medium
CN112307938B (en) * 2020-10-28 2022-11-11 深圳市商汤科技有限公司 Data clustering method and device, electronic equipment and storage medium
CN112949710B (en) * 2021-02-26 2023-06-13 北京百度网讯科技有限公司 Image clustering method and device
CN113190701A (en) * 2021-05-07 2021-07-30 北京百度网讯科技有限公司 Image retrieval method, device, equipment, storage medium and computer program product
CN113344095B (en) * 2021-06-21 2024-03-29 北京惠朗时代科技有限公司 K-means worker fatigue simple discrimination method based on multi-feature operator
CN113344124B (en) * 2021-06-29 2023-11-28 北京市商汤科技开发有限公司 Track analysis method and device, storage medium and system
CN113762376A (en) * 2021-08-31 2021-12-07 阿里巴巴新加坡控股有限公司 Image clustering method and device, electronic equipment and storage medium
CN114333039B (en) * 2022-03-03 2022-07-08 济南博观智能科技有限公司 Method, device and medium for clustering human images
CN115909434B (en) * 2022-09-07 2023-07-04 以萨技术股份有限公司 Data processing system for acquiring facial image characteristics
CN115439676A (en) * 2022-11-04 2022-12-06 浙江莲荷科技有限公司 Image clustering method and device and electronic equipment
CN115953650B (en) * 2023-03-01 2023-06-27 杭州海康威视数字技术股份有限公司 Training method and device for feature fusion model

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107909104A (en) * 2017-11-13 2018-04-13 腾讯数码(天津)有限公司 A picture face clustering method, apparatus and storage medium
CN108229419A (en) * 2018-01-22 2018-06-29 百度在线网络技术(北京)有限公司 For clustering the method and apparatus of image
CN110516586A (en) * 2019-08-23 2019-11-29 深圳力维智联技术有限公司 A face image clustering method, system, product and medium
CN110730433A (en) * 2019-10-16 2020-01-24 北京爱笔科技有限公司 Indoor positioning method, device and system based on iBeacon

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP5178611B2 (en) * 2009-04-15 2013-04-10 株式会社東芝 Image processing apparatus, image processing method, and program


Non-Patent Citations (1)

Title
Yang Zhiguang; Ai Haizhou. Clustering-based face image retrieval and relevance feedback. Acta Automatica Sinica, 2008, (No. 09), full text. *

Also Published As

Publication number Publication date
CN111291678A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN111291678B (en) Face image clustering method and device based on multi-feature fusion
CN106228185B (en) A neural-network-based general image classification and recognition system and method
US7362892B2 (en) Self-optimizing classifier
CN104850633B (en) A 3D model retrieval system and method based on sketch component segmentation
JP7329430B2 (en) Rapid Video Queries Using Ensemble of Deep Neural Networks
CN108288051B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
WO2019200782A1 (en) Sample data classification method, model training method, electronic device and storage medium
CN107885778B (en) Personalized recommendation method based on dynamic near point spectral clustering
US9323886B2 (en) Performance predicting apparatus, performance predicting method, and program
CN110163258A (en) A zero-shot learning method and system based on a semantic-attribute attention reassignment mechanism
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN109165309B (en) Negative example training sample acquisition method and device and model training method and device
WO2020114108A1 (en) Clustering result interpretation method and device
CN109635140B (en) Image retrieval method based on deep learning and density peak clustering
CN103824051A (en) Local region matching-based face search method
CN110414550B (en) Training method, device and system of face recognition model and computer readable medium
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
WO2022062419A1 (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN113705596A (en) Image recognition method and device, computer equipment and storage medium
CN101216886B (en) A shot clustering method based on spectral segmentation theory
CN110163130B (en) Feature pre-alignment random forest classification system and method for gesture recognition
CN108520205B (en) motion-KNN-based human body motion recognition method
CN116883746A (en) Graph node classification method based on partition pooling hypergraph neural network
Yu et al. Research on face recognition method based on deep learning
CN111160077A (en) Large-scale dynamic face clustering method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant