CN111626091B - Face image labeling method and device and computer readable storage medium - Google Patents


Info

Publication number
CN111626091B
CN111626091B (application CN202010155962.XA)
Authority
CN
China
Prior art keywords
face
face feature
feature vector
image
feature vectors
Prior art date
Legal status
Active
Application number
CN202010155962.XA
Other languages
Chinese (zh)
Other versions
CN111626091A (en)
Inventor
程星星
Current Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN202010155962.XA
Publication of CN111626091A
Application granted
Publication of CN111626091B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application relates to the field of computer machine learning, and discloses a face image labeling method, a device and a computer-readable storage medium, wherein the face image labeling method comprises the following steps: acquiring a plurality of face area images of an original image of a person; extracting features of a plurality of face area images to obtain a plurality of face feature vectors used for representing the identities of people, wherein one face area image corresponds to one face feature vector; performing feature clustering on the plurality of face feature vectors to obtain a category of each face feature vector in the plurality of face feature vectors, wherein the category comprises a positive category and a negative category; and labeling the face region image corresponding to the face feature vector belonging to the positive class. The face image labeling method, the face image labeling device and the computer readable storage medium can improve the labeling efficiency of the image, ensure the labeling accuracy and reduce the labor cost of image labeling.

Description

Face image labeling method and device and computer readable storage medium
Technical Field
The embodiment of the application relates to the field of computer machine learning, in particular to a face image labeling method, a face image labeling device and a computer readable storage medium.
Background
In large-scale face recognition applications, ensuring high recognition accuracy, i.e. accurately recognizing face images of the same person across various ages, angles, illumination conditions and contrasts, requires a large amount of data cleaning and labeling work during the application development stage, and hundreds of standard face images (112 x 112) need to be prepared for each person. For the collection of standard face data, the existing solution mainly consists of crawling a large number of public pictures from the Internet with a crawler tool, cutting out all detected face images in batches using a face detection algorithm, and then having a professional data labeling team or a data labeling crowdsourcing platform complete the image screening work. Taking 200 crawled images of one person as an example, and assuming 5 people appear in each image, 1000 face images of size 112 x 112 are cut out in the detection stage. Of these 1000 images, at least 800 are invalid and need to be deleted through manual labeling.
The inventor has found that the prior art has at least the following problems: deleting invalid images through manual annotation incurs high labor cost and low labeling efficiency, cannot effectively guarantee labeling quality, and is insufficient to support the rapid deployment of large-scale face recognition applications.
Disclosure of Invention
The embodiment of the application aims to provide a face image labeling method, a face image labeling device and a computer readable storage medium, which can improve the labeling efficiency of images, ensure the labeling accuracy and reduce the labor cost of image labeling.
In order to solve the above technical problems, an embodiment of the present application provides a face image labeling method, including:
acquiring a plurality of face area images of an original image of a person; extracting features of the face region images to obtain a plurality of face feature vectors used for representing the identities of the people, wherein one face region image corresponds to one face feature vector; performing feature clustering on the face feature vectors to obtain a category of each face feature vector in the face feature vectors, wherein the category comprises a positive category used for representing that the person identity corresponding to the face feature vector is a target person and a negative category used for representing that the person identity corresponding to the face feature vector is a non-target person; and labeling the face region image corresponding to the face feature vector belonging to the positive class.
The embodiment of the application also provides a face image labeling device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the face image labeling method described above.
The embodiment of the application also provides a computer readable storage medium which stores a computer program, and the computer program realizes the face image labeling method when being executed by a processor.
Compared with the prior art, the embodiment of the application obtains a plurality of face feature vectors used for representing the identity of the person by extracting the features of the face region images, namely, the face region images which are difficult to be applied to calculation are digitized, so that the subsequent steps can be smoothly carried out; by carrying out feature clustering on the face feature vectors to obtain the category of each face feature vector in the face feature vectors, whether the person identity corresponding to the face feature vector is a target person or not can be judged according to a clustering result, and therefore noise data and effective data in a face image can be rapidly and accurately identified; finally, labeling face region images corresponding to the face feature vectors belonging to the positive class is completed, so that labeling efficiency is improved, labeling accuracy can be effectively guaranteed, time cost of manual labeling is reduced, labor cost is reduced, and support is provided for rapid construction of large-scale face recognition application.
In addition, before the labeling of the face region image corresponding to the face feature vector belonging to the positive class, the method further comprises: deleting the face feature vectors belonging to the negative class, performing the feature clustering again on the face feature vectors belonging to the positive class, and judging whether any face feature vector belonging to the negative class exists among the re-clustered face feature vectors; if so, repeating the above steps until no face feature vector belonging to the negative class exists among the re-clustered face feature vectors.
In addition, the feature clustering of the face feature vectors specifically includes: taking each face feature vector of N face feature vectors in turn as a clustering center and, when the ith face feature vector is taken as the clustering center, calculating the measurement distances from the other N-1 face feature vectors to the clustering center, where N is an integer greater than 1 and i is an integer less than or equal to N; judging whether any of the measurement distances is smaller than a preset threshold; if not, judging that the ith face feature vector belongs to the negative class; if so, judging whether the number of measurement distances smaller than the preset threshold is larger than a preset number: if larger, judging that the ith face feature vector belongs to the positive class; if smaller, judging that the ith face feature vector belongs to the negative class.
In addition, before each face feature vector of the N face feature vectors is used as a cluster center, the method further includes: setting the size of a sliding window and the sliding step length; each face feature vector in the N face feature vectors is used as a clustering center, and the method specifically includes: establishing a plurality of sliding windows according to the size of the sliding window, the sliding step length and the N face feature vectors, wherein the number of the face feature vectors in each sliding window is equal to the size of the sliding window; and taking each face feature vector in each sliding window as the clustering center in turn.
In addition, before establishing a plurality of sliding windows according to the sliding window size, the sliding step length and the N face feature vectors, the method further includes: and carrying out randomization processing on the N face feature vectors.
In addition, the feature extraction is performed on the face region images to obtain a plurality of face feature vectors for representing the identity of the person, which specifically includes: and sequentially inputting the face region images into a preset neural network model to obtain the face feature vector.
In addition, the preset neural network model comprises a first-level neural network and a second-level neural network; the face feature vector is calculated by the following method: inputting the face region image into the first-stage neural network to obtain an initial vector; and inputting the initial vector into the second-stage neural network, and training the initial vector through a weight vector and preset characteristic parameters in the second-stage neural network to obtain the face characteristic vector, wherein the characteristic parameters are constants larger than 0.
In addition, before extracting the features of the face area images, the method further comprises: preprocessing the face image data of the face area images to obtain face area images with resolution meeting preset requirements; the feature extraction of the face region images specifically includes: and extracting the characteristics of the facial area image with the resolution meeting the preset requirement.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals indicate similar elements; the figures are not drawn to scale unless otherwise indicated.
Fig. 1 is a flowchart of a face image labeling method according to a first embodiment of the present application;
fig. 2 is a flowchart of MTCNN face detection provided according to a first embodiment of the present application;
fig. 3 is a flowchart of face region image feature extraction provided according to a first embodiment of the present application;
fig. 4 is a schematic diagram of face identification according to a first embodiment of the present application;
fig. 5 is a flowchart of a face image labeling method according to a second embodiment of the present application;
fig. 6 is a flowchart of a face image labeling method according to a third embodiment of the present application;
FIG. 7 is a clustering schematic diagram of a K-nearest neighbor algorithm provided in accordance with a third embodiment of the present application;
fig. 8 is a schematic structural diagram of a facial image labeling apparatus according to a fourth embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is the meaning of "including but not limited to".
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
The first embodiment of the application relates to a face image labeling method, and the specific flow is shown in fig. 1, comprising the following steps:
s101: a plurality of face area images of an original image of a person are acquired.
Specifically, in this embodiment, a crawler tool crawls original images of the person (such as an actor's stills, portraits, and images from works). The original images of the person contain face images; an original image may contain one or more faces, and the faces in an original image may belong to the same identity or to different identities.
It should be noted that, as shown in fig. 2, this embodiment uses MTCNN (Multi-Task Cascaded Convolutional Networks), a cascaded face detection algorithm based on a multi-stage neural network, to detect face region images in the original image of the person. For ease of understanding, MTCNN is described in detail below:
The P-Net network predicts bounding boxes of candidate face regions in the original image of the person; the image within each bounding box region is cropped, scaled to 24 x 24, and input to the R-Net network, which generates corrected bounding boxes. The image within each bounding box region generated by R-Net is cropped, scaled to 48 x 48, and input to the O-Net network, which generates the corrected bounding box coordinates, the facial feature position coordinates, and the probability that the bounding box region contains a face. The main steps of MTCNN face detection are as follows:
(1) Judging whether a face area exists in an original image of a person:
Whether a face exists in a region is a binary classification problem, evaluated with the logistic regression (cross-entropy) loss function:

$L_i^{det} = -\left( y_i^{det} \log p_i + (1 - y_i^{det}) \log(1 - p_i) \right)$

where $y_i^{det} \in \{0, 1\}$ is the real face label of a model training sample, and $p_i \in [0, 1]$ is the face probability predicted by the model; the greater the offset between $y_i^{det}$ and $p_i$, the larger $L_i^{det}$.
(2) Judging whether the face region position is accurate.
Specifically, whether the face region position is accurate is judged by the following formula:

$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2$

where $y_i^{box}$ are the real face region coordinates of a model training sample and $\hat{y}_i^{box}$ are the face region coordinates predicted by the model, each region being defined by the coordinates of its starting vertex together with the width and height of the region; the squared Euclidean distance measures the degree of offset between the real coordinates $y_i^{box}$ and the predicted coordinates $\hat{y}_i^{box}$.
(3) Judging whether the facial feature coordinate positions are accurate.
Specifically, whether the facial feature coordinate positions are accurate is judged by the following formula:

$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2$

where $y_i^{landmark}$ are the facial five-feature-point coordinates of a model training sample and $\hat{y}_i^{landmark}$ are the facial five-feature-point coordinates predicted by the model; the squared Euclidean distance measures the degree of offset between the real coordinates $y_i^{landmark}$ and the predicted coordinates $\hat{y}_i^{landmark}$. The position of the face image within the original image and the positions of the facial features are extracted during this processing.
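For illustration, the cascaded detection described above is available in off-the-shelf MTCNN implementations. The following is a minimal sketch assuming the third-party facenet-pytorch package, which is not named in this text; the confidence threshold and file name are illustrative:

```python
# Sketch only: the patent does not name a library; facenet-pytorch's MTCNN
# is one common cascaded P-Net/R-Net/O-Net implementation.
from PIL import Image
from facenet_pytorch import MTCNN

detector = MTCNN(keep_all=True)  # keep every detected face, not just the largest

img = Image.open("person_original.jpg")
# boxes: (n, 4) bounding boxes; probs: face probabilities; points: 5 landmarks per face
boxes, probs, points = detector.detect(img, landmarks=True)

face_crops = []
if boxes is not None:
    for box, prob in zip(boxes, probs):
        if prob < 0.9:                      # discard low-confidence detections
            continue
        x1, y1, x2, y2 = (int(v) for v in box)
        face_crops.append(img.crop((x1, y1, x2, y2)).resize((112, 112)))
```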
Preferably, after the face region is detected, face correction may also be performed on the detected face region in this embodiment. Face correction is also called face alignment, i.e. uniformly rotating the face images to a horizontal position. Based on the face region detected in the previous step and the coordinate positions of the five facial feature points (left eye, right eye, nose, left mouth corner, right mouth corner), an affine transformation is applied to the face image; the transformed face is horizontal, i.e. the line connecting the two eyes is kept horizontal, and the corrected image is scaled to 112 x 112.
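A minimal sketch of this eye-based alignment using OpenCV is given below; the rotation angle is computed so that the inter-ocular line becomes horizontal. Function and parameter names are illustrative, not taken from this text:

```python
import math
import cv2
import numpy as np

def align_face(img: np.ndarray, left_eye, right_eye, size: int = 112) -> np.ndarray:
    """Rotate the face so the line between the eyes is horizontal, then resize."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))        # tilt of the inter-ocular line
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)   # rotate about the eye midpoint
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    aligned = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))
    return cv2.resize(aligned, (size, size))
```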
More preferably, since the minimum face size detected by MTCNN is 12 pixels, scaling a 12 x 12 image up to 112 x 112 produces serious distortion, and distorted images harm face recognition accuracy. Therefore, this embodiment further performs face image data preprocessing on the face area images to obtain face area images whose resolution meets a preset requirement. Specifically, in this embodiment, the images may be filtered according to a byte size of 4 x 1024 (i.e. 4 kB), removing low-resolution images (i.e. images smaller than 4 kB). The criterion is not limited to 4 kB: removing images smaller than 5 kB or 6 kB can achieve the same technical effect. In this way, distorted images are removed before feature extraction of the face region images, which reduces the workload of the subsequent steps and further improves the efficiency of the face image labeling method.
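As a concrete illustration of this byte-size filter, a minimal sketch (the 4 kB threshold is one of the acceptable values noted above):

```python
import os

def filter_low_resolution(image_paths, min_bytes: int = 4 * 1024):
    """Drop face crops whose file size suggests a low-resolution, distorted image."""
    return [p for p in image_paths if os.path.getsize(p) >= min_bytes]
```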
S102: and extracting features of the face region images to obtain face feature vectors for representing the identities of the people.
Specifically, in this embodiment, the face feature vector may be obtained as follows: inputting the face region image into the first-stage neural network to obtain an initial vector; and inputting the initial vector into the second-stage neural network, and training the initial vector through a weight vector and preset characteristic parameters in the second-stage neural network to obtain the face characteristic vector, wherein the characteristic parameters are constants larger than 0.
For ease of understanding, the following describes in detail how the second level neural network trains the initial vector:
assuming that the initial vector is a 512-dimensional vector, x is as shown in FIG. 3 i Is a 512-dimensional eigenvector output by using a convolutional neural network, w j Is a weight vector. Training w by iteration j And x i Reduce w j And x i The included angle theta between the vectors is increased, so that the cosine value cos theta is increased to increase w j x i The purpose of the vector product is to make the weight vector w j The represented person identities obtain higher prediction probability, and a parameter m is additionally added in the training process, so that the discrimination of the algorithm on different person identities is increased.
Specifically, w j Is a randomly generated set of vector values that are used to determine the identity of the person inputting the face feature vector of the ArcFace algorithm. For example, feature vectors x are input separately 1 And x 2 If the two vectors are both equal to w in the weight vector group 1 Infinitely close, x can be determined 1 And x 2 Is the identity of the same person; if a larger separation boundary occurs, it may be determined that the person belongs to a different person identity.
As shown in fig. 4, this further illustrates that the face features extracted by the ArcFace algorithm have the characteristics of high cohesion (among features belonging to the same identity) and large separation boundaries (between features belonging to different identities). Vector $x$ represents the feature vector of a face image, $w_1$ and $w_2$ are weight vectors after ArcFace training, the angle between $x$ and $w_1$ is $\theta_1$, the angle between $x$ and $w_2$ is $\theta_2$, and $\theta_1 < \theta_2$. The person identity probability for feature vector $x$ is calculated as follows:

$w_1 x = \|w_1\| \|x\| \cos\theta_1; \quad w_2 x = \|w_2\| \|x\| \cos\theta_2;$

$\|w_1\| \|x\| \cos(\theta_1 + m) > \|w_2\| \|x\| \cos\theta_2 \;\Rightarrow\; \|w_1\| \|x\| \cos\theta_1 > \|w_2\| \|x\| \cos\theta_2.$

Here $w_1 x$ and $w_2 x$ represent the probabilities that the feature vector $x$ belongs to the two person identities, and the blank area in the figure represents the improvement in identity discrimination contributed by the additionally added parameter $m$. The algorithm exploits the monotonically decreasing property of the cosine function on the interval $[0, \pi]$: a non-negative parameter $m$ is added during training, which enlarges the separation boundary between faces of different identities. The feature vectors extracted by the ArcFace algorithm therefore have higher cohesion among features of the same identity and larger separation boundaries between features of different identities.
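To make the margin mechanism concrete, the following NumPy sketch computes ArcFace-style logits as described above. The training loop and softmax loss are omitted, and the scale s and margin m defaults are the common values from the ArcFace paper, not values stated in this text:

```python
import numpy as np

def arcface_logits(x, W, label, m=0.5, s=64.0):
    """Additive angular margin: replace cos(theta_y) with cos(theta_y + m)
    for the ground-truth identity y, then scale by s."""
    x = x / np.linalg.norm(x)                          # normalize feature vector
    W = W / np.linalg.norm(W, axis=0, keepdims=True)   # normalize each weight column
    cos_theta = W.T @ x                                # cosine to every identity center
    theta_y = np.arccos(np.clip(cos_theta[label], -1.0, 1.0))
    logits = cos_theta.copy()
    logits[label] = np.cos(theta_y + m)                # widen boundary for the true class
    return s * logits                                  # fed into softmax cross-entropy
```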
S103: and carrying out feature clustering on the plurality of face feature vectors to obtain the category of each face feature vector in the plurality of face feature vectors.
Specifically, the categories include a positive category for representing that the person identity corresponding to the face feature vector is a target person, and a negative category for representing that the person identity corresponding to the face feature vector is a non-target person. In the embodiment, a feature clustering method based on a sliding window can be adopted, the average intra-class distance of the face feature vector is gradually reduced by adjusting the sliding step length and the window size, and the accuracy of a clustering result is gradually improved.
S104: and labeling the face region image corresponding to the face feature vector belonging to the positive class.
Specifically, this embodiment uses a deep neural network to extract face feature vectors and, following the principle that feature vectors of faces with the same identity have higher similarity, uses a statistical learning method to perform feature clustering within a sliding window; the clustering result identifies face images that do not belong to the same identity, so face images belonging to a given identity can be quickly and accurately screened out of a large number of face images while the other noise data is removed.
Compared with the prior art, the embodiment of the application obtains a plurality of face feature vectors used for representing the identity of the person by extracting the features of the face region images, namely, the face region images which are difficult to be applied to calculation are digitized, so that the subsequent steps can be smoothly carried out; by carrying out feature clustering on the face feature vectors to obtain the category of each face feature vector in the face feature vectors, whether the person identity corresponding to the face feature vector is a target person or not can be judged according to a clustering result, and therefore noise data and effective data in a face image can be rapidly and accurately identified; finally, labeling face region images corresponding to the face feature vectors belonging to the positive class is completed, so that labeling efficiency is improved, labeling accuracy can be effectively guaranteed, time cost of manual labeling is reduced, labor cost is reduced, and support is provided for rapid construction of large-scale face recognition application.
The second embodiment of the application relates to a face image labeling method, which is further improved based on the first embodiment, and is specifically improved in that: in the second embodiment, the face feature vector belonging to the negative class is deleted, and whether the face feature vector belonging to the negative class exists in the face feature vector is judged for multiple times until the face feature vector belonging to the negative class does not exist in the finally obtained face feature vector, so that the accuracy of the standard can be further improved, and the labeling quality is ensured.
The specific flow of this embodiment is shown in fig. 5, and includes:
s201: a plurality of face area images of an original image of a person are acquired.
S202: and extracting features of the face region images to obtain face feature vectors for representing the identities of the people.
S203: and carrying out feature clustering on the plurality of face feature vectors to obtain the category of each face feature vector in the plurality of face feature vectors.
S204: and deleting the face feature vectors belonging to the negative class, and carrying out feature clustering on the face feature vectors belonging to the positive class again.
S205: judging whether the face feature vectors belonging to the negative class exist in the face feature vectors subjected to the feature clustering again, if so, executing step S204; if not, step S206 is performed.
Specifically, in this embodiment, after face feature vectors are first judged to belong to the negative class, feature clustering may be performed again on the remaining face feature vectors and the judgment repeated; this process is repeated until no face feature vector in the result is judged to belong to the negative class. In this way, the accuracy of the face image labeling method can be further improved.
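Expressed as a loop, this iterative cleaning can be sketched as follows; cluster_features stands in for the feature clustering of step S203 and is an assumed helper, not defined in this text:

```python
def iterative_clean(vectors, cluster_features):
    """Repeat S203/S204: re-cluster the positive set until no negatives remain."""
    while True:
        labels = cluster_features(vectors)        # returns "pos"/"neg" per vector
        positives = [v for v, lab in zip(vectors, labels) if lab == "pos"]
        if len(positives) == len(vectors):        # no negative vectors left
            return positives
        vectors = positives                       # delete negatives, cluster again
```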
S206: and labeling the face region image corresponding to the face feature vector belonging to the positive class.
Compared with the prior art, the embodiment of the application obtains a plurality of face feature vectors used for representing the identity of the person by extracting the features of the face region images, namely, the face region images which are difficult to be applied to calculation are digitized, so that the subsequent steps can be smoothly carried out; by carrying out feature clustering on the face feature vectors to obtain the category of each face feature vector in the face feature vectors, whether the person identity corresponding to the face feature vector is a target person or not can be judged according to a clustering result, and therefore noise data and effective data in a face image can be rapidly and accurately identified; finally, labeling face region images corresponding to the face feature vectors belonging to the positive class is completed, so that labeling efficiency is improved, labeling accuracy can be effectively guaranteed, time cost of manual labeling is reduced, labor cost is reduced, and support is provided for rapid construction of large-scale face recognition application.
A third embodiment of the present application relates to a face image labeling method. This embodiment illustrates the first embodiment, specifically explaining how, in the first embodiment, feature clustering is performed on the face feature vectors to obtain the category of each face feature vector.
Specifically, as shown in fig. 6, the present embodiment includes steps S301 to S310, wherein steps S301 to S302 are substantially the same as steps S101 to S102 in the first embodiment, and are not described herein. The differences are mainly described below:
step S301 to step S302 are performed.
S303: and respectively taking the ith face feature vector in the N face feature vectors as a clustering center.
S304: and calculating the measurement distance from the other N-1 face feature vectors in the N face feature vectors to the clustering center.
S305: judging whether the measurement distance is smaller than a preset threshold value or not, and if so, executing step S306; if the face feature vector does not exist, the ith face feature vector is judged to belong to the negative class.
S306: judging whether the number of measurement distances smaller than a preset threshold is larger than the preset number, if so, judging that the ith face feature vector belongs to a positive class; if not, judging that the ith face feature vector belongs to the negative class; judging whether i is smaller than N, if i is smaller than N, making i=i+1, executing step S303; otherwise, the flow ends.
It should be noted that, because the feature clustering is directly performed on the N face feature vectors, there may be a situation that the face feature vectors corresponding to a plurality of continuous non-target person images affect the clustering result, so that the accuracy of the face image labeling method is not high. According to the feature clustering method based on the sliding window, the average intra-class distance of the face feature vector is gradually reduced by adjusting the sliding step length and the window size, and the accuracy of the clustering result is gradually improved.
That is, before the step of using each of the N face feature vectors as a cluster center, the method further includes: setting the size of a sliding window and the sliding step length; each face feature vector in the N face feature vectors is used as a clustering center, and the method specifically includes: establishing a plurality of sliding windows according to the size of the sliding window, the sliding step length and the N face feature vectors, wherein the number of the face feature vectors in each sliding window is equal to the size of the sliding window; and taking each face feature vector in each sliding window as the clustering center in turn.
For easy understanding, the following describes face feature clustering based on sliding window in this embodiment in detail:
First, the principle of feature clustering in this embodiment is briefly described: the clustering process uses a sliding window to perform local clustering within the window. Within the sliding window, each face feature $f_i \in \{f_1, f_2, \dots, f_k\}$ is in turn taken as the cluster center, the metric distances from the other feature vectors $f_j$ in the window to the cluster center $f_i$ are calculated, and the category (positive or negative) to which the cluster center $f_i$ belongs is judged according to the threshold K.
The clustering principle of the K-nearest-neighbor algorithm is shown in fig. 7: each triangle and square in the figure represents a feature vector, with the shape denoting the category to which the vector belongs; given the metric distance and the threshold K, the circle feature has more triangle neighbors than square neighbors, so it is classified into the triangle category.
The three core elements of the K-nearest-neighbor algorithm adopted in this embodiment, namely the distance metric, the K value and the classification decision rule, are set as follows:
(1) Distance measurement
In this scheme the distance measurement adopts the $L_2$ norm, i.e. the Euclidean distance; the Euclidean distance between two 512-dimensional feature vectors $f_i$ and $f_j$ is expressed as follows:

$d(f_i, f_j) = \sqrt{\sum_{k=1}^{512} (f_{i,k} - f_{j,k})^2}$
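In code, this metric distance is simply (a sketch):

```python
import numpy as np

def metric_distance(f_i: np.ndarray, f_j: np.ndarray) -> float:
    """L2 (Euclidean) distance between two face feature vectors."""
    return float(np.linalg.norm(f_i - f_j))
```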
(2) Selection of K values
The corresponding relation between the threshold K and the sliding window size and the sliding step length is as follows:
TABLE 1
(3) Classification decision rule
For the clustering result of a cluster center $f_i \in \{f_1, f_2, \dots, f_k\}$ under the set sliding window size, sliding step length and threshold K: if the number of features similar to $f_i$ is smaller than the threshold K, the cluster center $f_i$ is marked as negative, i.e. noise data; if the number of features similar to $f_i$ is greater than the threshold K, the cluster center $f_i$ is marked as positive, i.e. valid data.
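A sketch of this decision rule for a single cluster center, reusing metric_distance from the sketch above. Here dist_threshold corresponds to the 0.95 threshold used in the clustering steps below, and the boundary case (count exactly equal to K) is treated as positive, an assumption since the text leaves it unspecified:

```python
def classify_center(center_idx, features, K, dist_threshold=0.95):
    """Count window features within dist_threshold of the center; mark the center
    positive (valid data) if at least K are similar, otherwise negative (noise)."""
    center = features[center_idx]
    similar = sum(
        1
        for j, f in enumerate(features)
        if j != center_idx and metric_distance(center, f) < dist_threshold
    )
    return "pos" if similar >= K else "neg"
```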
Based on the above principle, the face feature clustering steps of the sliding window in this embodiment can be obtained as follows:
1. all face image lists belonging to the same identity are randomized, and influence of continuously appearing noise images on a clustering result is reduced.
2. Sliding window calculation is performed according to the sliding step length and the window size.
3. The feature clustering in the sliding window can be divided into the following sub-steps:
step A: each image in the vectorization window is a 512-dimensional feature vector.
Step B: feature clustering is performed with the K-nearest-neighbor algorithm based on the minimum-distance principle: the metric distance between each feature vector and the remaining feature vectors is calculated in a loop, and the top N distances, sorted from small to large, are taken as the calculation result. The clustering calculation process is as follows:
1) Randomly select a feature vector as the cluster center, denoted $f_i \in \{f_1, f_2, \dots, f_k\}$, and initially set $f_i$ as belonging to the P category.
2) Calculate the metric distance dist from the next feature vector $f_j$ to $f_i$; if dist < 0.95, $f_j$ falls into the P category, otherwise $f_i$ remains in the P category and $f_j$ falls into the N category.
3) Take each feature vector $f_i \in \{f_1, f_2, \dots, f_k\}$ in turn as the cluster center and calculate the metric distance from each remaining feature vector $f_j$ to each center $f_i$; repeat this step until all cluster centers have been traversed.
4) For the clustering result of each cluster center $f_i \in \{f_1, f_2, \dots, f_k\}$, according to the clustering classification rule: if the size of the P-category set is smaller than the threshold K, the cluster center $f_i$ is judged to be negative; otherwise the cluster center $f_i$ is judged to be positive. The positive class is valid data and the negative class is noise data.
Step C: and C, calculating a round of sliding window to represent one round of feature clustering iteration, repeating the step A and the step B until the iteration is finished, wherein the iteration convergence condition is that a set which is judged to be negative is an empty set in the clustering results of 3 rounds of iteration.
4. Different sliding step lengths, sliding window sizes and clustering thresholds K are selected in turn, and feature clustering is performed according to steps 1, 2 and 3 above; based on the classification decision results of step 3, all noise data identified as negative is deleted, and the valid data identified as positive is retained.
The clustering process pseudo-code based on sliding windows is described as follows:
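As the original listing is absent here, the following Python sketch reconstructs the sliding-window clustering from steps 1 to 4 above, reusing classify_center from the earlier sketch; all names are illustrative:

```python
import random

def sliding_window_clustering(features, window_size, step, K, rounds=3):
    """Cluster face features window by window; delete centers judged negative.
    Converges when `rounds` consecutive passes find no negatives (step C above)."""
    random.shuffle(features)                          # step 1: randomize the image list
    clean_passes = 0
    while clean_passes < rounds:
        negatives = set()
        last_start = max(len(features) - window_size, 0)
        for start in range(0, last_start + 1, step):  # step 2: slide the window
            window = features[start:start + window_size]
            for i in range(len(window)):              # step 3: each vector is a center
                if classify_center(i, window, K) == "neg":
                    negatives.add(start + i)
        if negatives:                                 # step 4: drop noise, re-run
            features = [f for idx, f in enumerate(features) if idx not in negatives]
            clean_passes = 0
        else:
            clean_passes += 1
    return features                                   # retained positives (valid data)
```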
compared with the prior art, the embodiment of the application obtains a plurality of face feature vectors used for representing the identity of the person by extracting the features of the face region images, namely, the face region images which are difficult to be applied to calculation are digitized, so that the subsequent steps can be smoothly carried out; by carrying out feature clustering on the face feature vectors to obtain the category of each face feature vector in the face feature vectors, whether the person identity corresponding to the face feature vector is a target person or not can be judged according to a clustering result, and therefore noise data and effective data in a face image can be rapidly and accurately identified; finally, labeling face region images corresponding to the face feature vectors belonging to the positive class is completed, so that labeling efficiency is improved, labeling accuracy can be effectively guaranteed, time cost of manual labeling is reduced, labor cost is reduced, and support is provided for rapid construction of large-scale face recognition application.
A fourth embodiment of the present application relates to a facial image labeling apparatus, as shown in fig. 8, including:
at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401, so that the at least one processor 401 can execute the face image labeling method.
Here, the memory 402 and the processor 401 are connected by a bus, and the bus may comprise any number of interconnected buses and bridges that link together the various circuits of the one or more processors 401 and the memory 402. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a unit for communicating with various other apparatus over a transmission medium. Data processed by the processor 401 is transmitted over a wireless medium through an antenna, and the antenna also receives data and transmits it to the processor 401.
The processor 401 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 402 may be used to store data used by processor 401 in performing operations.
A fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps of the methods in the embodiments described above may be implemented by a program stored in a storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (8)

1. The face image labeling method is characterized by comprising the following steps of:
acquiring a plurality of face area images of an original image of a person;
extracting features of the face region images to obtain a plurality of face feature vectors used for representing the identities of the people, wherein one face region image corresponds to one face feature vector;
performing feature clustering on the face feature vectors to obtain a category of each face feature vector in the face feature vectors, wherein the category comprises a positive category used for representing that the person identity corresponding to the face feature vector is a target person and a negative category used for representing that the person identity corresponding to the face feature vector is a non-target person;
labeling a face region image corresponding to the face feature vector belonging to the positive class;
the feature extraction is performed on the face region images to obtain face feature vectors for representing the identity of the person, and the feature extraction specifically comprises the following steps:
sequentially inputting a plurality of face area images into a preset neural network model to obtain the face feature vector;
the preset neural network model comprises a first-level neural network and a second-level neural network; the face feature vector is calculated by the following method:
inputting the face region image into the first-stage neural network to obtain an initial vector;
inputting the initial vector into the second-stage neural network, training the initial vector through a weight vector and preset characteristic parameters in the second-stage neural network to obtain the face characteristic vector, wherein the characteristic parameters are constants larger than 0, the first-stage neural network and the second-stage neural network are convolutional neural networks, and the second-stage neural network extracts the face characteristic vector through an ArcFace algorithm.
2. The face image labeling method according to claim 1, further comprising, before the labeling of the face region image corresponding to the face feature vector belonging to the positive class:
deleting the face feature vectors belonging to the negative class, carrying out the feature clustering on the face feature vectors belonging to the positive class again, and judging whether the face feature vectors belonging to the negative class exist in the face feature vectors subjected to the feature clustering again;
if the face feature vector exists, repeating the steps until the face feature vector subjected to the feature clustering again does not exist the face feature vector belonging to the negative class.
3. The method for labeling a face image according to claim 1 or 2, wherein the feature clustering is performed on a plurality of face feature vectors, and specifically includes:
taking each face feature vector in N face feature vectors as a clustering center, and calculating the measurement distance from the other N-1 face feature vectors in the N face feature vectors to the clustering center when the ith face feature vector is taken as the clustering center, wherein N is an integer greater than 1, and i is an integer less than or equal to N;
judging whether a measurement distance smaller than a preset threshold exists in the measurement distances;
if the face feature vector does not exist, judging that the ith face feature vector belongs to the negative class;
if yes, judging whether the number of the measurement distances smaller than a preset threshold is larger than or equal to the preset number, and if yes, judging that the ith face feature vector belongs to the positive class; if not, judging that the ith face feature vector belongs to the negative class.
4. A face image labeling method according to claim 3, further comprising, before said respectively taking each of the N face feature vectors as a cluster center:
setting the size of a sliding window and the sliding step length;
each face feature vector in the N face feature vectors is used as a clustering center, and the method specifically includes:
establishing a plurality of sliding windows according to the size of the sliding window, the sliding step length and the N face feature vectors, wherein the number of the face feature vectors in each sliding window is equal to the size of the sliding window;
and taking each face feature vector in each sliding window as the clustering center in turn.
5. The face image labeling method of claim 4, further comprising, prior to establishing a plurality of sliding windows based on the sliding window size, the sliding step size, and the N face feature vectors:
and carrying out randomization processing on the N face feature vectors.
6. The face image labeling method according to claim 1, further comprising, before feature extraction is performed on a plurality of the face region images:
preprocessing the face image data of the face area images to obtain face area images with resolution meeting preset requirements;
the feature extraction of the face region images specifically includes:
and extracting the characteristics of the facial area image with the resolution meeting the preset requirement.
7. A facial image labeling device, comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the face image labeling method of any of claims 1-6.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the face image labeling method of any one of claims 1 to 6.
CN202010155962.XA 2020-03-09 2020-03-09 Face image labeling method and device and computer readable storage medium Active CN111626091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010155962.XA CN111626091B (en) 2020-03-09 2020-03-09 Face image labeling method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010155962.XA CN111626091B (en) 2020-03-09 2020-03-09 Face image labeling method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111626091A (en) 2020-09-04
CN111626091B (en) 2023-09-22

Family

ID=72271816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010155962.XA Active CN111626091B (en) 2020-03-09 2020-03-09 Face image labeling method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111626091B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279768A (en) * 2013-05-31 2013-09-04 北京航空航天大学 Method for identifying faces in videos based on incremental learning of face partitioning visual representations
US9176987B1 (en) * 2014-08-26 2015-11-03 TCL Research America Inc. Automatic face annotation method and system
CN108491786A (en) * 2018-03-20 2018-09-04 南京邮电大学 A kind of method for detecting human face based on hierarchical network and Cluster merging
GB201902067D0 (en) * 2019-02-14 2019-04-03 Facesoft Ltd 3D Face reconstruction system and method
CN109886186A (en) * 2019-02-18 2019-06-14 上海骏聿数码科技有限公司 A kind of face identification method and device
CN109919093A (en) * 2019-03-07 2019-06-21 苏州科达科技股份有限公司 A kind of face identification method, device, equipment and readable storage medium storing program for executing
CN109993125A (en) * 2019-04-03 2019-07-09 腾讯科技(深圳)有限公司 Model training method, face identification method, device, equipment and storage medium
CN110728234A (en) * 2019-10-12 2020-01-24 爱驰汽车有限公司 Driver face recognition method, system, device and medium
CN110766645A (en) * 2019-10-24 2020-02-07 西安电子科技大学 Target person reproduction graph generation method based on person identification and segmentation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ArcFace: Additive Angular Margin Loss for Deep Face Recognition; Jiankang Deng et al.; arXiv; 2019-02-09; 1-17 *
Application of distributed training of convolutional neural networks to expression recognition (卷积神经网络的分布式训练在表情识别中的应用); Dong Feiyan; Software (《软件》); 2020-01-15; Vol. 41, No. 01; 160-164 *
Research and implementation of key technologies of deep-learning-based face detection and recognition (基于深度学习的人脸检测和识别关键技术研究与实现); Wang Rongsheng; China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》); 2019-09-15; No. 09 (2019); I138-953 *
Research on face recognition technology for specific foreign personnel (特定外籍人员人脸识别技术研究); Li Huiyong et al.; Modern Information Technology (《现代信息科技》); 2019-12-10; Vol. 03, No. 23; 71-73 *

Also Published As

Publication number Publication date
CN111626091A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111898547B (en) Training method, device, equipment and storage medium of face recognition model
WO2020177432A1 (en) Multi-tag object detection method and system based on target detection network, and apparatuses
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
CN111369016B (en) Image recognition-based on-site operation and maintenance method and system
CN110472489B (en) Face beauty grade prediction method, device and storage medium
CN109376604B (en) Age identification method and device based on human body posture
CN110909618B (en) Method and device for identifying identity of pet
US11804071B2 (en) Method for selecting images in video of faces in the wild
CN113449704B (en) Face recognition model training method and device, electronic equipment and storage medium
CN107679469B (en) Non-maximum suppression method based on deep learning
US8948517B2 (en) Landmark localization via visual search
WO2019238104A1 (en) Computer apparatus and method for implementing classification detection of pulmonary nodule images
US20230215125A1 (en) Data identification method and apparatus
CN109815823B (en) Data processing method and related product
CN113095333A (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN112132208A (en) Image conversion model generation method and device, electronic equipment and storage medium
CN110879985B (en) Anti-noise data face recognition model training method
JP2009093490A (en) Age estimation device and program
CN114926858A (en) Pig face recognition method based on deep learning of feature point information
JP2011181016A (en) Discriminator creation device, method and program
CN114445691A (en) Model training method and device, electronic equipment and storage medium
CN114170686A (en) Elbow bending behavior detection method based on human body key points
CN116664585B (en) Scalp health condition detection method and related device based on deep learning
Zhou et al. Real-time gender recognition based on eigen-features selection from facial images
CN106980845B (en) Face key point positioning method based on structured modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant