CN113947140A - Training method of face feature extraction model and face feature extraction method

Training method of face feature extraction model and face feature extraction method

Info

Publication number
CN113947140A
Authority
CN
China
Prior art keywords
feature extraction
face feature
loss function
extraction model
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111193958.3A
Other languages
Chinese (zh)
Inventor
彭楠
李弼
希滕
张刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111193958.3A priority Critical patent/CN113947140A/en
Publication of CN113947140A publication Critical patent/CN113947140A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a training method and apparatus for a face feature extraction model, an electronic device and a storage medium, relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and can be applied to scenarios such as face image processing and face recognition. The specific implementation scheme is as follows: a sample image including a human face is acquired and input into the face feature extraction model; a covariance matrix corresponding to the extracted face features is acquired and a first loss function is obtained from the covariance matrix; classification recognition is performed on the face features and a second loss function is obtained from the classification recognition result; the model parameters of the face feature extraction model are then adjusted according to the first loss function and the second loss function. This avoids the under-fitting and over-fitting risks that arise when model training is performed on sample images collected in different scenarios, and improves the training effect of the face feature extraction model.

Description

Training method of face feature extraction model and face feature extraction method
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of computer vision and deep learning, and more specifically to a training method and apparatus for a face feature extraction model, an electronic device, and a storage medium.
Background
In recent years, artificial intelligence technology has been widely applied in scenarios such as machine vision, biometric recognition, automatic planning, intelligent control, robotics, and natural language and image understanding, improving users' experience of informatization and intelligence.
At present, in the field of artificial intelligence, some problems still need continuous improvement, such as how to improve the accuracy of biometric recognition, especially the accuracy of face recognition, and how to optimize the corresponding model training process or recognition scheme.
However, for application scenarios such as biometric recognition, the effect of related model training or recognition methods is often not ideal.
Disclosure of Invention
The disclosure provides a training method of a human face feature extraction model and a human face feature extraction method.
According to a first aspect, a training method for a face feature extraction model is provided, which includes: the method comprises the steps of obtaining a sample image including a human face, inputting the sample image into a human face feature extraction model, extracting human face features of the sample image through a backbone network in the human face feature extraction model, obtaining a covariance matrix corresponding to the human face features, obtaining a first loss function according to the covariance matrix, carrying out classification recognition according to the human face features, obtaining a second loss function according to a classification recognition result, adjusting model parameters of the human face feature extraction model according to the first loss function and the second loss function until a training end condition is met, and obtaining a target human face feature extraction model.
According to a second aspect, a face feature extraction method is provided, including: the method comprises the steps of obtaining an image to be extracted, inputting the image to be extracted into a target face feature extraction model, and outputting the target face feature of the image to be extracted by the target face feature extraction model, wherein the target face feature extraction model is a model trained by adopting the training method of the first aspect of the disclosure.
According to a third aspect, there is provided a training apparatus for a face feature extraction model, comprising: a first acquisition module, configured to acquire a sample image including a human face; an extraction module, configured to input the sample image into the face feature extraction model and extract the face features of the sample image through a backbone network in the face feature extraction model; a second acquisition module, configured to acquire a covariance matrix corresponding to the face features and acquire a first loss function according to the covariance matrix; a third acquisition module, configured to perform classification recognition according to the face features and acquire a second loss function according to the classification recognition result; and a generating module, configured to adjust the model parameters of the face feature extraction model according to the first loss function and the second loss function until a training end condition is met, so as to obtain a target face feature extraction model.
According to a fourth aspect, there is provided a face feature extraction device, comprising: the acquisition module is used for acquiring an image to be extracted; and the output module is used for inputting the image to be extracted into the target face feature extraction model and outputting the target face feature of the image to be extracted by the target face feature extraction model, wherein the target face feature extraction model is a model trained by adopting the training device according to the third aspect of the disclosure.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of training a face feature extraction model according to the first aspect of the present disclosure or the method of extracting a face feature according to the second aspect of the present disclosure.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the training method of the face feature extraction model according to the first aspect of the present disclosure or the face feature extraction method according to the second aspect of the present disclosure.
According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of training a face feature extraction model according to the first aspect of the disclosure or the method of face feature extraction according to the second aspect of the disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of a training method of a face feature extraction model according to a first embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a training method of a face feature extraction model according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a covariance matrix and a corresponding identity matrix;
FIG. 4 is a flowchart illustrating a training method of a face feature extraction model according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training method for a face feature extraction model;
fig. 6 is a schematic flow chart of a face feature extraction method according to a fourth embodiment of the present disclosure;
fig. 7 is a schematic flow chart of a face feature extraction method according to a fifth embodiment of the present disclosure;
FIG. 8 is a block diagram of a training apparatus for a face feature extraction model for implementing the training method of the face feature extraction model of the embodiments of the present disclosure;
fig. 9 is a block diagram of a face feature extraction device for implementing the face feature extraction method of the embodiment of the present disclosure;
fig. 10 is a block diagram of an electronic device for implementing a training method of a face feature extraction model or a face feature extraction method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human Intelligence. At present, the AI technology has the advantages of high automation degree, high accuracy and low cost, and is widely applied.
Computer Vision (CV) is a science that studies how to make machines "see": using cameras and computers instead of human eyes to identify, track and measure targets, and further processing the resulting images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can acquire "information" from images or multidimensional data.
Deep Learning (DL) is a research direction in the field of Machine Learning (ML). It learns the intrinsic rules and representation levels of sample data, and the information obtained in the learning process helps interpret data such as text, images and sound. Its ultimate goal is to enable machines to have human-like analysis and learning capabilities and to recognize data such as text, images and sound. In terms of specific research content, it mainly includes neural network systems based on convolution operations, namely convolutional neural networks; self-encoding neural networks based on multiple layers of neurons; and deep belief networks that are pre-trained in the manner of multilayer self-encoding neural networks and then further optimize the network weights with discriminative information. Deep learning has achieved many results in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization, and other related fields. Deep learning enables machines to imitate human activities such as seeing, hearing and thinking, solves many complex pattern recognition problems, and has brought great progress to artificial-intelligence-related technologies.
The following describes a training method of a face feature extraction model and a face feature extraction method according to an embodiment of the present disclosure with reference to the drawings.
Fig. 1 is a schematic flow chart of a training method of a face feature extraction model according to a first embodiment of the present disclosure.
As shown in fig. 1, the training method of the face feature extraction model according to the embodiment of the present disclosure may specifically include the following steps:
s101, obtaining a sample image comprising a human face.
Specifically, the execution subject of the training method for a facial feature extraction model in the embodiment of the present disclosure may be the training device for a facial feature extraction model provided in the embodiment of the present disclosure, and the training device for a facial feature extraction model may be a hardware device with data information processing capability and/or necessary software for driving the hardware device to work. Alternatively, the execution body may include a workstation, a server, a computer, a user terminal, and other devices. The user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, and the like.
It should be noted that face recognition technology is widely applied in many scenarios, so a large amount of data from different scenarios can be accumulated, and this data can be used to train various models, in particular the face feature extraction model. For example, in scenarios such as airports and railway stations, a large amount of training data can be collected during the person-ID verification process; in private-area access management scenarios (such as residential compounds), a large amount of training data can be collected during face-recognition-based access control checks; and in financial or qualification-review scenarios, a large amount of training data can be collected during liveness (real-person) verification.
In general, data collected in different deployment scenarios have different distribution characteristics, and data from different scenarios often differ greatly along multiple dimensions, such as data volume, ethnicity, age, ambient lighting, camera viewing angle and picture resolution.
Thus, if a model such as the face feature extraction model is trained directly on data collected in different scenarios, it may end up fitting the scene-specific information of scenarios with large amounts of data, which can harm face recognition accuracy in scenarios with little data. In other words, a face feature extraction model trained in this way cannot obtain accuracy gains across all scenarios.
In the related art, the extraction effect of the face feature extraction model is usually optimized in one of two ways: one is to obtain sample images by balanced sampling and perform model training based on them, and the other is to obtain sample images with balanced loss weighting and perform model training based on them.
In balanced sampling, during training, scenarios with more data are sampled with lower probability and scenarios with less data are sampled with higher probability.
In balanced weighting, when computing the loss during training, samples from scenarios with more data are multiplied by a smaller weight factor and samples from scenarios with less data are multiplied by a larger weight factor.
Both schemes aim to balance, to some extent, the extraction accuracy of the trained face feature extraction model across scenarios by balancing the data (sample image) distribution of each scenario during the training stage. However, a face feature extraction model trained with sample images obtained under either scheme still carries under-fitting and over-fitting risks: the under-fitting risk typically exists for scenarios with a large amount of data, and the over-fitting risk for scenarios with a small amount of data.
In view of this, the embodiment of the present disclosure provides a training method for a face feature extraction model that applies identity-matrix supervision to the covariance matrix of feature dimensions during training, so that the face features extracted by the model are decoupled from scene-specific information of the collected data. This improves face recognition accuracy across all scenarios and alleviates over-fitting and under-fitting under different data volumes.
In the embodiment of the present disclosure, a sample image including a human face is obtained. The sample image, i.e., any face picture used for training, may be obtained in a variety of ways in a variety of scenarios, which is not limited in the embodiment of the present disclosure. For example, it may be an image acquired during person-ID verification at an airport or railway station, an image acquired during access control checks in a private-area management scenario, or an image collected during liveness verification in a financial or qualification-review scenario.
S102, inputting the sample image into a face feature extraction model, and extracting the face features of the sample image through a backbone network in the face feature extraction model.
Specifically, the sample image including the face acquired in step S101 is input into the face feature extraction model, and the face features of the sample image are extracted through the backbone network in the face feature extraction model. For example, the face sample image is used as the input of the backbone network (Backbone), and the output is the corresponding face feature f_i, where the face feature is a d-dimensional vector (d is the face feature dimension).
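To make the step concrete, the following PyTorch sketch shows a batch of sample images being mapped to d-dimensional face features by a backbone network. The tiny convolutional backbone, the feature dimension d = 512 and the 112×112 input size are illustrative assumptions, not the disclosure's actual architecture.

```python
# Illustrative sketch only: a stand-in backbone mapping face images to d-dimensional
# features. The layer sizes, d = 512 and the 112x112 input are assumptions.
import torch
import torch.nn as nn

class FaceFeatureExtractor(nn.Module):
    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (b, 3, H, W) -> face features f_i: (b, d)
        return self.backbone(images)

model = FaceFeatureExtractor()
sample_images = torch.randn(8, 3, 112, 112)   # a batch of face sample images
features = model(sample_images)               # shape (8, 512)
```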
S103, obtaining a covariance matrix corresponding to the face features, and obtaining a first loss function according to the covariance matrix.
Specifically, according to the face features of the sample image obtained in step S102, a corresponding covariance matrix is obtained, and a first loss function is obtained according to the covariance matrix.
The covariance matrix refers to a matrix formed by covariance between any characteristic dimension of the face features and other characteristic dimensions.
For example, the face feature f_i can be regarded as an n-dimensional random vector, with each dimension treated as a random variable; the n × n matrix formed by the covariances between these n random variables is the covariance matrix.
It should be noted that, in the present disclosure, a specific manner of obtaining the covariance matrix corresponding to the face features and obtaining the first loss function according to the covariance matrix is not limited, and the first loss function may be obtained according to an actual situation.
As a possible implementation manner, the face features output by the backbone network may be input to a first network in the face feature extraction model to obtain a covariance matrix corresponding to the face features, and a first loss function may be obtained according to the covariance matrix.
And S104, carrying out classification and identification according to the human face features, and acquiring a second loss function according to a classification and identification result.
Specifically, the face features of the sample image obtained in step S102 are classified and identified to obtain a classification and identification result, and a second loss function is obtained through the classification and identification result.
It should be noted that, in the present disclosure, a specific manner of performing classification and identification according to the human face features and acquiring the second loss function according to the classification and identification result is not limited, and the second loss function may be acquired according to an actual situation.
As a possible implementation manner, the face features output by the backbone network may be input to a second network in the face feature extraction model to perform classification and recognition according to the face features, and a second loss function may be obtained according to a classification and recognition result.
As another possible implementation manner, the face features output by the backbone network may be input to a third network (such as a classifier) in the face feature extraction model, so as to perform classification and recognition according to the face features and obtain a classification result. Further, the classification result output by the third network may be input to a fourth network in the face feature extraction model, and a second loss function may be obtained according to the classification recognition result.
And S105, adjusting model parameters of the face feature extraction model according to the first loss function and the second loss function until a training end condition is met to obtain a target face feature extraction model.
Specifically, the model parameters of the face feature extraction model are adjusted according to the first loss function obtained in step S103 and the second loss function obtained in step S104 until the training end condition is met, so as to obtain the target face feature extraction model.
It should be noted that, in the present disclosure, a specific manner of adjusting the model parameters of the face feature extraction model according to the first loss function and the second loss function is not limited, and may be obtained according to an actual situation.
As a possible implementation manner, the first loss function and the second loss function may be added together; a preset mapping between the summed loss and model parameter adjustment strategies may then be queried to determine the adjustment strategy for the current training round, and the model parameters adjusted according to that strategy.
As another possible implementation manner, the first loss function and the second loss function may be weighted; a preset mapping between the weighted result and model parameter adjustment strategies may then be queried to determine the adjustment strategy for the current training round, and the model parameters adjusted according to that strategy.
In the present disclosure, the specific setting of the training end condition is not limited, and may be selected according to actual situations.
As a possible implementation manner, the training end condition may be set as the similarity between the model output result and the feature extraction result of the labeled sample image reaching a preset similarity threshold; as another example, the training end condition may be set as the number of times the parameters of the face feature extraction model have been adjusted reaching a preset number threshold.
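A minimal sketch of how such end conditions could be checked; the two threshold values below are illustrative placeholders rather than values from the disclosure.

```python
# Hypothetical stopping check; the thresholds are illustrative placeholders.
max_updates = 100_000          # preset threshold on the number of parameter adjustments
similarity_threshold = 0.98    # preset similarity threshold on labeled sample images

def training_finished(num_updates: int, eval_similarity: float) -> bool:
    """True when either training end condition described above is met."""
    return num_updates >= max_updates or eval_similarity >= similarity_threshold

print(training_finished(50_000, 0.95))   # False: keep training
```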
In summary, according to the training method for the face feature extraction model in the embodiment of the present disclosure, the first loss function is obtained by performing covariance matrix calculation on the extracted face features, the second loss function is obtained by performing classification recognition on the extracted face features, and the model parameters are then adjusted according to the first loss function and the second loss function until the model converges. This avoids the under-fitting and over-fitting risks caused when model training is performed on sample images collected in different scenarios, so that gains in face recognition accuracy can be obtained across those scenarios, and the training effect of the face feature extraction model is improved.
Fig. 2 is a flowchart illustrating a training method of a face feature extraction model according to a second embodiment of the present disclosure.
As shown in fig. 2, on the basis of the embodiment shown in fig. 1, the training method for a face feature extraction model according to the embodiment of the present disclosure may specifically include the following steps:
s201, a sample image comprising a human face is obtained.
S202, inputting the sample image into a face feature extraction model, and extracting the face features of the sample image through a backbone network in the face feature extraction model.
Specifically, steps S201 to S202 in this embodiment are the same as steps S101 to S102 in the above embodiment, and are not described again here.
The step S103 "acquiring a covariance matrix corresponding to the face features, and acquiring the first loss function according to the covariance matrix" in the foregoing embodiment may specifically include the following steps S203 to S205.
S203, acquiring covariance between any feature dimension and other feature dimensions of the human face features.
Specifically, according to the face features obtained in step S202, the covariance between any feature dimension of the face features and each of the other feature dimensions is calculated.
For example, if the face features have three feature dimensions, namely feature dimensions 1 to 3, the covariance between feature dimension 1 and feature dimension 2, the covariance between feature dimension 1 and feature dimension 3, and the covariance between feature dimension 2 and feature dimension 3 may be obtained.
And S204, generating a covariance matrix and an identity matrix corresponding to the covariance matrix according to all the covariance.
Specifically, from all the covariances acquired in step S203, a covariance matrix can be generated.
For example, if the face features form a matrix F of dimension (b, d), where b is the number of samples and d the feature dimension, a covariance matrix X of dimension (d, d) can be generated by the following formula (the standard sample covariance of feature dimensions i and j over the batch):
X_ij = (1/b) Σ_{k=1..b} (F_ki - μ_i)(F_kj - μ_j), where μ_i = (1/b) Σ_{k=1..b} F_ki
Further, a corresponding identity matrix may be generated according to the covariance matrix.
For example, for the covariance matrix shown in fig. 3(a), the corresponding identity matrix is shown in fig. 3 (b).
S205, a first loss function is obtained according to the covariance matrix and the identity matrix.
As a possible implementation, the difference between the covariance matrix and the identity matrix may be obtained, and the square of the difference may be obtained as the first loss function. The first loss function is also called a loss function based on a covariance matrix.
For example, after obtaining the covariance matrix X_ij and the identity matrix I_ij, the first loss function Loss1 may be obtained by the following equation:
Loss1 = Σ_{i=1..d} Σ_{j=1..d} (X_ij - I_ij)^2
where the covariance matrix X_ij and the identity matrix I_ij both have matrix dimension (d, d).
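The first-loss computation can be sketched in a few lines of PyTorch, following the formulas as reconstructed above; the batch size and feature dimension are arbitrary example values.

```python
# Sketch of the first loss, following the formulas as reconstructed above.
# b and d are arbitrary example values.
import torch

b, d = 8, 4
F = torch.randn(b, d)                 # face features output by the backbone, shape (b, d)

mu = F.mean(dim=0, keepdim=True)      # per-dimension means mu_i, shape (1, d)
Fc = F - mu                           # centered features
X = Fc.t() @ Fc / b                   # covariance matrix X_ij, shape (d, d)

I = torch.eye(d)                      # identity matrix I_ij of the same dimension
loss1 = ((X - I) ** 2).sum()          # Loss1: squared difference to the identity matrix
```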
The step S104 "performing classification and recognition according to the facial features, and acquiring the second loss function according to the classification and recognition result" in the foregoing embodiment may specifically include the following steps S206 and S207.
And S206, inputting the human face features into a classification network in the human face feature extraction model, and performing classification and identification on the human face features by the classification network to obtain a classification and identification result of the human face features.
Specifically, the face features obtained in step S202 may be input into the classification network in the face feature extraction model, and the classification network performs classification recognition on the face features to obtain the corresponding classification recognition result.
In the embodiment of the disclosure, for the classification network, the face features may be classified and identified by the classifier, and a classification and identification result is obtained. The classifier is used for judging the category to which a new observation sample (human face feature) belongs on the basis of the training data with the labeled category.
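A minimal sketch of such a classification network: a linear classifier over the d-dimensional face features that outputs one score per candidate category. The feature dimension and the number of categories are assumed example values.

```python
# Minimal sketch of a classification network: a linear classifier over the face
# features. feature_dim and num_classes are assumed example values.
import torch
import torch.nn as nn

feature_dim, num_classes = 512, 1000
classifier = nn.Linear(feature_dim, num_classes)

features = torch.randn(8, feature_dim)     # face features from the backbone
logits = classifier(features)              # one score per candidate category, shape (8, 1000)
predicted = logits.argmax(dim=1)           # classification recognition result per sample
```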
And S207, acquiring a second loss function according to the classification recognition result and the mark type of the face feature of the sample image.
As a possible implementation manner, as shown in fig. 4, on the basis of the foregoing embodiment, a specific process of obtaining the second loss function according to the classification recognition result and the label category of the face feature of the sample image in the foregoing step S207 includes the following steps:
s401, obtaining the similarity between the face feature category and each candidate face feature category.
For example, for three candidate face feature categories 1 to 3, the similarities between the face feature and candidate face feature categories 1, 2 and 3 may be obtained as 95%, 35% and 51%, respectively.
And S402, obtaining a classification recognition result of the human face features according to the similarity.
As a possible implementation, the probability that the face feature belongs to each candidate face feature category can be obtained based on the softmax function; optionally, the probability p_ij that the face feature f_i belongs to category j is obtained by the following formula:
p_ij = exp(s_ij) / Σ_k exp(s_ik)
where s_ij is the similarity between the face feature f_i and the center f_j of candidate face feature category j, and the sum runs over all candidate categories k.
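A small sketch of this softmax step, turning the similarities s_ij between a face feature and the candidate category centers into probabilities p_ij; the similarity values reuse the 95%/35%/51% example above.

```python
# Sketch of the softmax step; the similarities reuse the 95%/35%/51% example.
import torch

s = torch.tensor([[0.95, 0.35, 0.51]])   # similarities s_ij to 3 candidate category centers
p = torch.softmax(s, dim=1)              # p_ij = exp(s_ij) / sum_k exp(s_ik)
print(p)                                 # probabilities over the candidate categories (sum to 1)
```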
And S403, acquiring a cross entropy loss function as a second loss function according to the mark type and the classification recognition result.
Wherein the second loss function is also called cross entropy loss function.
As a possible implementation, the second loss function l_i may be obtained by the following formula:
l_i = -Σ_j y_ij log(p_ij)
where y_ij indicates whether the face feature f_i belongs to category j: if the face feature f_i belongs to category j, y_ij takes the value 1; if the face feature f_i does not belong to category j, y_ij takes the value 0.
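A sketch of the cross-entropy loss l_i using the one-hot indicator y_ij as reconstructed above; the probability values are illustrative. Equivalently, torch.nn.functional.cross_entropy could be applied directly to classification logits.

```python
# Sketch of the cross-entropy loss l_i with the one-hot indicator y_ij; the
# probabilities are illustrative example values.
import torch

p = torch.tensor([[0.70, 0.10, 0.20]])     # p_ij for 3 candidate categories
y = torch.tensor([[1.0, 0.0, 0.0]])        # y_ij: 1 for the labeled category, else 0
loss2 = -(y * torch.log(p)).sum(dim=1)     # l_i = -sum_j y_ij * log(p_ij)
print(loss2)                               # ~0.357 (= -log 0.70)
```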
The step S105 of adjusting the model parameters of the face feature extraction model according to the first loss function and the second loss function in the above embodiment may specifically include the following steps S208 and S209.
And S208, acquiring the sum of the first loss function and the second loss function as a target loss function.
In the embodiment of the present disclosure, the first loss function and the second loss function may be directly added, and the sum of the two may be used as the target loss function.
Further, in order to further improve the training effect of the face feature extraction model, the first loss function and the second loss function may be weighted, and the weighted result may be used as the target loss function. The weight can be set and adjusted according to actual conditions.
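A short sketch of the weighted combination; the example loss values and the weights are arbitrary and would be set and adjusted according to the actual situation.

```python
# Sketch of the weighted target loss; the weights and loss values are arbitrary
# examples and would be set and adjusted according to actual conditions.
import torch

loss1 = torch.tensor(0.8)    # covariance-matrix-based loss (example value)
loss2 = torch.tensor(2.5)    # cross-entropy loss (example value)

w1, w2 = 1.0, 0.1
target_loss = w1 * loss1 + w2 * loss2
```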
And S209, adjusting the model parameters of the face feature extraction model according to the target loss function.
In the embodiment of the present disclosure, after the target loss function is obtained, a mapping relationship between a preset target loss function and a model parameter adjustment policy may be queried according to the target loss function, so as to obtain the model parameter adjustment policy. Furthermore, the model parameters of the face feature extraction model can be adjusted according to the matched model parameter adjustment strategy.
The following explains the overall process of the training method of the face feature extraction model as an example.
As shown in fig. 5, in the training method of the face feature extraction model provided by the present disclosure, the acquired multi-domain face images (sample images) are input into the face feature extraction model to be trained, and the face feature f_i of each sample image is extracted by the backbone network in the face feature extraction model. Further, covariance matrix calculation is performed on the extracted face features f_i to obtain the covariance-matrix-based loss (the first loss function), and classification recognition is performed on the extracted face features f_i to obtain the cross-entropy loss (the second loss function); the model parameters are then adjusted according to the first loss function and the second loss function until the model converges.
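Putting the pieces of fig. 5 together, a single training step might look like the sketch below. The stand-in backbone, linear classifier, SGD optimizer and all sizes are assumptions for illustration; only the structure (covariance-matrix loss plus cross-entropy loss driving one parameter update) reflects the method described here.

```python
# Sketch of one training step in the spirit of fig. 5. The stand-in backbone,
# linear classifier, SGD optimizer and all sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, num_classes = 128, 100
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, d))   # stand-in backbone
classifier = nn.Linear(d, num_classes)                                # classification network
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=0.01)

images = torch.randn(16, 3, 112, 112)           # multi-domain face sample images
labels = torch.randint(0, num_classes, (16,))   # identity labels

feats = backbone(images)                        # face features f_i, shape (b, d)

# First loss: supervise the covariance matrix of feature dimensions with the identity matrix.
centered = feats - feats.mean(dim=0, keepdim=True)
cov = centered.t() @ centered / feats.shape[0]
loss1 = ((cov - torch.eye(d)) ** 2).sum()

# Second loss: cross-entropy over the classification recognition result.
logits = classifier(feats)
loss2 = F.cross_entropy(logits, labels)

loss = loss1 + loss2                            # target loss (plain sum here)
optimizer.zero_grad()
loss.backward()
optimizer.step()                                # adjust the model parameters
```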
In summary, the training method for the face feature extraction model according to the embodiment of the present disclosure uses the covariance matrix as a basis: the covariance-matrix-based first loss function and the cross-entropy-based second loss function are obtained from the face features and used to train the face feature extraction model, thereby further improving the training effect of the face feature extraction model.
Fig. 6 is a flowchart illustrating a face feature extraction method according to a fourth embodiment of the present disclosure.
As shown in fig. 6, the method for extracting a face feature according to the embodiment of the present disclosure may specifically include the following steps:
s601, acquiring an image to be extracted.
Specifically, the execution subject of the facial feature extraction method according to the embodiment of the present disclosure may be the facial feature extraction device provided in the embodiment of the present disclosure, and the facial feature extraction device may be a hardware device with data information processing capability and/or necessary software for driving the hardware device to work. Alternatively, the execution body may include a workstation, a server, a computer, a user terminal, and other devices. The user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, and the like.
The image to be extracted may be any image used for face recognition. Optionally, an image to be extracted including a human face may be directly acquired; optionally, any image may be acquired and screened to acquire an image to be extracted including a human face.
S602, inputting the image to be extracted into the target human face feature extraction model, and outputting the target human face feature of the image to be extracted by the target human face feature extraction model.
As a possible implementation mode, the facial feature extraction can be carried out on the image to be extracted based on the trained target facial feature extraction model. Optionally, the image to be extracted may be input into the trained target face feature extraction model, and the target face feature of the image to be extracted may be output.
In summary, the face feature extraction method according to the embodiment of the disclosure may perform face feature extraction on an image to be extracted based on a trained target face feature extraction model. Optionally, the image to be extracted may be acquired, the image to be extracted is input to the target face feature extraction model, and the target face feature of the image to be extracted is output by the target face feature extraction model.
Fig. 7 is a flowchart illustrating a face feature extraction method according to a fifth embodiment of the present disclosure.
As shown in fig. 7, on the basis of the embodiment shown in fig. 6, the method for extracting a face feature according to the embodiment of the present disclosure may specifically include the following steps:
and S701, acquiring an image to be extracted.
Specifically, step S701 in this embodiment is the same as step S601 in the above embodiment, and is not repeated here.
The step S602 "inputting the image to be extracted into the target face feature extraction model, and outputting the target face feature of the image to be extracted by the target face feature extraction model" in the foregoing embodiment may specifically include the step S702.
And S702, extracting the target face features of the image to be extracted by a backbone network in the target face feature extraction model.
Specifically, an image to be extracted is input into a target face feature extraction model, and face feature extraction is performed on the image to be extracted through a backbone network in the target face feature extraction model, so that target face features of the image to be extracted are output.
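A minimal inference sketch, assuming a trained target model object; the stand-in module, input tensor and preprocessing are placeholders so the snippet runs on its own.

```python
# Minimal inference sketch; the stand-in module and input tensor are placeholders.
# In practice the trained target model would be loaded from a checkpoint and the
# image preprocessed from a real file.
import torch
import torch.nn as nn

target_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 512))
target_model.eval()

image_to_extract = torch.randn(1, 3, 112, 112)            # preprocessed face image tensor
with torch.no_grad():                                     # no gradients needed at inference
    target_face_feature = target_model(image_to_extract)  # shape (1, 512)
```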
Further, after the target face features are obtained, face recognition can be performed on the face in the image to be extracted according to the target face features. As a possible implementation manner, the following steps S703 to S705 are specifically included.
S703, at least one candidate image and the candidate face feature of each candidate image are obtained.
The candidate images and the candidate face features of the candidate images may be obtained in various ways, for example, the corresponding candidate images and the candidate face features of the candidate images may be selected from a storage area in a face recognition database.
S704, obtaining the similarity between the target face features and all the candidate face features.
It should be noted that, in the present disclosure, the specific manner of obtaining the similarity between the target face feature and all the candidate face features is not limited, and may be selected according to the actual situation. For example, the cosine similarity between the target face feature and each candidate face feature may be obtained.
S705, in response to the fact that the similarity reaches a preset similarity threshold, it is determined that the face corresponding to the image to be recognized is consistent with the face corresponding to the candidate image.
The preset similarity threshold may be set according to actual conditions, for example, the preset similarity threshold may be set to 90%, 95%, 98%, and the like.
In the embodiment of the disclosure, after the similarity is obtained, the similarity may be compared with a preset similarity threshold, and in response to the similarity reaching the preset similarity threshold, it is determined that the face corresponding to the image to be recognized is consistent with the face corresponding to the candidate image; and in response to the fact that the similarity does not reach the preset similarity threshold, determining that the face corresponding to the image to be recognized is inconsistent with the face corresponding to the candidate image.
Further, if it is determined that the face corresponding to the image to be recognized is not consistent with the face corresponding to the candidate image, corresponding prompt information may be generated, such as alarm information or a prompt to perform face recognition again.
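A sketch of steps S703 to S705 using cosine similarity; the 0.9 threshold, the random feature values and the candidate count are illustrative assumptions.

```python
# Sketch of steps S703 to S705 with cosine similarity; the 0.9 threshold, the
# random features and the candidate count are illustrative assumptions.
import torch
import torch.nn.functional as F

target_feature = torch.randn(1, 512)        # target face feature of the image to be extracted
candidate_features = torch.randn(5, 512)    # candidate face features of 5 candidate images

similarities = F.cosine_similarity(target_feature, candidate_features, dim=1)  # shape (5,)

threshold = 0.9                             # preset similarity threshold
best = similarities.argmax()
if similarities[best] >= threshold:
    print(f"face matches candidate image {best.item()}")
else:
    print("no match: generate prompt information, e.g. ask to perform face recognition again")
```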
In summary, the face feature extraction method of the embodiment of the disclosure can perform face feature extraction on an image to be extracted based on a trained target face feature extraction model, and then perform face recognition according to a more accurate face feature extraction result, so as to realize high-accuracy face recognition, and further lay a safety foundation for various application scenarios that rely on face recognition for security control.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the related user all conform to the regulations of the relevant laws and regulations, and do not violate the good custom of the public order.
Fig. 8 is a schematic structural diagram of a training apparatus for a face feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 for a face feature extraction model includes: a first obtaining module 810, an extracting module 820, a second obtaining module 830, a third obtaining module 840 and a generating module 850. Wherein:
a first obtaining module 810, configured to obtain a sample image including a human face;
an extraction module 820, configured to input the sample image into a face feature extraction model, and extract a face feature of the sample image through a backbone network in the face feature extraction model;
the second obtaining module 830 is configured to obtain a covariance matrix corresponding to the face feature, and obtain a first loss function according to the covariance matrix;
the third obtaining module 840 is configured to perform classification and identification according to the human face features, and obtain a second loss function according to a classification and identification result;
and the generating module 850 is configured to adjust model parameters of the face feature extraction model according to the first loss function and the second loss function until a training end condition is met to obtain a target face feature extraction model.
The second obtaining module 830 is further configured to:
acquiring covariance between any feature dimension and other feature dimensions of the human face features;
generating a covariance matrix and an identity matrix corresponding to the covariance matrix according to all the covariance;
and acquiring a first loss function according to the covariance matrix and the identity matrix.
The second obtaining module 830 is further configured to:
a difference between the covariance matrix and the identity matrix is obtained, and a square of the difference is obtained as a first loss function.
The third obtaining module 840 is further configured to:
inputting the human face features into a classification network in a human face feature extraction model, and performing classification and identification on the human face features by the classification network to obtain a classification and identification result of the human face features;
and acquiring a second loss function according to the classification recognition result and the mark type of the face feature of the sample image.
The third obtaining module 840 is further configured to:
acquiring the similarity between the face feature category and each candidate face feature category;
according to the similarity, obtaining a classification recognition result of the human face features;
and acquiring a cross entropy loss function as a second loss function according to the mark category and the classification recognition result.
Wherein the generating module 850 is further configured to:
acquiring the sum of the first loss function and the second loss function as a target loss function;
and adjusting the model parameters of the face feature extraction model according to the target loss function.
It should be noted that the explanation of the above embodiment of the training method for a face feature extraction model is also applicable to the training device for a face feature extraction model in the embodiment of the present disclosure, and the specific process is not described herein again.
In summary, the training device for the face feature extraction model according to the embodiment of the present disclosure obtains the first loss function by performing covariance matrix calculation on the extracted face features, obtains the second loss function by performing classification and identification on the extracted face features, and then adjusts model parameters according to the first loss function and the second loss function until the model converges, thereby avoiding under-fitting risks and over-fitting risks caused when model training is performed by acquiring different sample images in different scenes, so that when model training is performed by acquiring different sample images in different scenes, gains of face recognition accuracy can be obtained, and the training effect of the face feature extraction model is improved.
Fig. 9 is a schematic structural diagram of a face feature extraction device according to an embodiment of the present disclosure.
As shown in fig. 9, the face feature extraction apparatus 900 includes: an acquisition module 910 and an output module 920. Wherein:
an obtaining module 910, configured to obtain an image to be extracted;
the output module 920 is configured to input the image to be extracted into the target face feature extraction model, and output the target face feature of the image to be extracted by the target face feature extraction model, where the target face feature extraction model is a model trained by using the training apparatus according to the first aspect of the disclosure.
Wherein, the output module 920 is further configured to:
and extracting the target face features of the image to be extracted by a backbone network in the target face feature extraction model.
Wherein, the output module 920 is further configured to:
acquiring at least one candidate image and candidate face characteristics of each candidate image;
acquiring the similarity between the target face features and all the candidate face features;
and in response to the similarity reaching a preset similarity threshold, determining that the face corresponding to the image to be recognized is consistent with the face corresponding to the candidate image.
It should be noted that the above explanation of the embodiment of the face feature extraction method is also applicable to the face feature extraction device in the embodiment of the present disclosure, and the specific process is not described herein again.
In summary, the facial feature extraction device according to the embodiment of the present disclosure may perform facial feature extraction on an image to be extracted based on a trained target facial feature extraction model. Optionally, the image to be extracted may be acquired, the image to be extracted is input to the target face feature extraction model, and the target face feature of the image to be extracted is output by the target face feature extraction model.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller or microcontroller. The computing unit 1001 executes the respective methods and processes described above, such as the training method of the face feature extraction model or the face feature extraction method. For example, in some embodiments, the training method of the face feature extraction model or the face feature extraction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the training method of the face feature extraction model or the face feature extraction method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the training method of the face feature extraction model or the face feature extraction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server that incorporates a blockchain.
The present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of training a face feature extraction model or a method of face feature extraction as described above.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A training method of a face feature extraction model comprises the following steps:
acquiring a sample image comprising a human face;
inputting the sample image into a face feature extraction model, and extracting the face features of the sample image through a backbone network in the face feature extraction model;
acquiring a covariance matrix corresponding to the face features, and acquiring a first loss function according to the covariance matrix;
carrying out classification recognition according to the human face features, and acquiring a second loss function according to a classification recognition result;
and adjusting the model parameters of the face feature extraction model according to the first loss function and the second loss function until the training end condition is met to obtain a target face feature extraction model.
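For illustration only, the following Python sketch mirrors the first two steps of claim 1: a batch of sample images containing faces is fed to a backbone network that outputs one face feature vector per image. The stand-in convolutional backbone, the 112x112 input size, and the 256-dimensional feature size are assumptions made for this example, not details fixed by the disclosure.

```python
import torch
import torch.nn as nn

# Stand-in backbone: any network that maps a face image to a fixed-length feature vector.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 256),                        # 256-dimensional face features (assumed size)
)

sample_images = torch.randn(8, 3, 112, 112)    # a batch of 8 sample images containing faces
features = backbone(sample_images)             # face features, shape (8, 256)
print(features.shape)
```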
2. The training method according to claim 1, wherein the acquiring a covariance matrix corresponding to the face features and acquiring a first loss function according to the covariance matrix comprises:
acquiring covariance between any feature dimension and other feature dimensions of the face features;
generating the covariance matrix and an identity matrix corresponding to the covariance matrix according to all the covariances;
and acquiring the first loss function according to the covariance matrix and the identity matrix.
3. The training method of claim 2, wherein said obtaining the first loss function from the covariance matrix and the identity matrix comprises:
obtaining a difference between the covariance matrix and the identity matrix, and obtaining a square of the difference as the first loss function.
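A minimal sketch of the first loss function of claims 2 and 3, under two stated assumptions: the covariance matrix is normalised by (N - 1), and the "square of the difference" between the covariance matrix and the identity matrix is read as the element-wise square summed over the matrix (the squared Frobenius norm).

```python
import torch

def covariance_loss(features: torch.Tensor) -> torch.Tensor:
    """First loss: penalise the gap between the feature covariance matrix and the identity matrix."""
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (features.shape[0] - 1)   # covariance matrix, shape (dim, dim)
    identity = torch.eye(cov.shape[0], device=cov.device)   # identity matrix of the same size
    return ((cov - identity) ** 2).sum()                    # squared difference, summed

print(covariance_loss(torch.randn(32, 256)))
```

Driving the feature covariance toward the identity matrix encourages the feature dimensions to be decorrelated with unit variance, which is the regularising effect the first loss function contributes during training.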
4. The training method according to claim 1, wherein the performing classification recognition according to the facial features and obtaining a second loss function according to a classification recognition result comprises:
inputting the human face features into a classification network in the human face feature extraction model, and performing classification and identification on the human face features by the classification network to obtain the classification and identification result of the human face features;
and acquiring the second loss function according to the classification recognition result and the mark type of the face feature of the sample image.
5. The training method of claim 4, wherein the obtaining the second loss function according to the face feature category comprises:
acquiring the similarity between the face feature category and each candidate face feature category;
acquiring the classification recognition result of the face features according to the similarity;
and acquiring a cross entropy loss function as the second loss function according to the mark category and the classification recognition result.
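Claims 4 and 5 describe the second loss function: the classification network scores the face feature against each candidate face feature category by similarity, and a cross entropy loss is taken between that classification recognition result and the labelled (mark) category. The sketch below reads the similarity as cosine similarity against one learnable weight vector per category, a common choice in face recognition; the cosine reading and the scale factor are assumptions, not requirements of the claims.

```python
import torch
import torch.nn.functional as F

def second_loss(features: torch.Tensor, class_weights: torch.Tensor,
                labels: torch.Tensor, scale: float = 32.0) -> torch.Tensor:
    """Second loss: similarity-based classification followed by cross entropy."""
    # similarity between each face feature and every candidate face feature category
    logits = scale * F.normalize(features, dim=1) @ F.normalize(class_weights, dim=1).T
    # cross entropy between the classification recognition result and the mark category
    return F.cross_entropy(logits, labels)

features = torch.randn(8, 256)              # face features from the backbone network
class_weights = torch.randn(1000, 256)      # one weight vector per candidate category
labels = torch.randint(0, 1000, (8,))       # mark categories of the sample images
print(second_loss(features, class_weights, labels))
```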
6. The training method according to any one of claims 1-5, wherein the adjusting model parameters of the face feature extraction model according to the first loss function and the second loss function comprises:
acquiring the sum of the first loss function and the second loss function as a target loss function;
and adjusting the model parameters of the human face feature extraction model according to the target loss function.
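Claim 6 combines the two losses by simple summation and uses the result to adjust the model parameters. The sketch below shows one training step under that reading; it reuses the stand-in backbone and the covariance_loss and second_loss functions from the earlier snippets, and the SGD optimiser with a learning rate of 0.1 is an assumption.

```python
import torch
import torch.nn as nn

# Reuses `backbone`, `covariance_loss` and `second_loss` defined in the sketches above.
class_weights = nn.Parameter(torch.randn(1000, 256))   # classification-network weights
optimizer = torch.optim.SGD(list(backbone.parameters()) + [class_weights], lr=0.1)

def training_step(sample_images: torch.Tensor, labels: torch.Tensor) -> float:
    features = backbone(sample_images)                  # extract face features
    # target loss: sum of the first (covariance) loss and the second (classification) loss
    loss = covariance_loss(features) + second_loss(features, class_weights, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                    # adjust the model parameters
    return loss.item()

print(training_step(torch.randn(8, 3, 112, 112), torch.randint(0, 1000, (8,))))
```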
7. A face feature extraction method comprises the following steps:
acquiring an image to be extracted;
inputting the image to be extracted into a target face feature extraction model, and outputting the target face feature of the image to be extracted by the target face feature extraction model, wherein the target face feature extraction model is a model trained by the training method according to any one of claims 1 to 6.
8. The extraction method according to claim 7, wherein the outputting, by the target face feature extraction model, the target face feature of the image to be extracted comprises:
and extracting the target face features of the image to be extracted by a backbone network in the target face feature extraction model.
9. The extraction method according to claim 7 or 8, further comprising, after acquiring the target face feature:
acquiring at least one candidate image and candidate face features of each candidate image;
acquiring the similarity between the target face features and all the candidate face features;
and determining that the face corresponding to the image to be extracted is consistent with the face corresponding to the candidate image in response to the similarity reaching a preset similarity threshold.
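Claims 7 to 9 cover inference with the trained model: the target face feature of the image to be extracted is compared with the candidate face features, and the two faces are judged consistent when the similarity reaches a preset threshold. The sketch below uses cosine similarity and a threshold of 0.5 purely as illustrative assumptions; the claims do not fix the similarity measure or the threshold value.

```python
import torch
import torch.nn.functional as F

def faces_match(target_feature: torch.Tensor, candidate_features: torch.Tensor,
                threshold: float = 0.5) -> torch.Tensor:
    """Return, per candidate, whether the similarity reaches the preset threshold."""
    sims = F.cosine_similarity(target_feature.unsqueeze(0), candidate_features, dim=1)
    return sims >= threshold

target_feature = torch.randn(256)          # target face feature of the image to be extracted
candidate_features = torch.randn(5, 256)   # candidate face features of the candidate images
print(faces_match(target_feature, candidate_features))
```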
10. A training device for a face feature extraction model comprises:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample image comprising a human face;
the extraction module is used for inputting the sample image into the face feature extraction model and extracting the face features of the sample image through a backbone network in the face feature extraction model;
the second acquisition module is used for acquiring a covariance matrix corresponding to the face features and acquiring a first loss function according to the covariance matrix;
the third acquisition module is used for carrying out classification and identification according to the human face characteristics and acquiring a second loss function according to a classification and identification result;
and the generating module is used for adjusting the model parameters of the face feature extraction model according to the first loss function and the second loss function until the training end condition is met to obtain the target face feature extraction model.
11. The training device of the face feature extraction model according to claim 10, wherein the second obtaining module is further configured to:
acquiring covariance between any feature dimension and other feature dimensions of the human face features;
generating a covariance matrix and an identity matrix corresponding to the covariance matrix according to all the covariance;
and acquiring a first loss function according to the covariance matrix and the identity matrix.
12. The training device of the face feature extraction model according to claim 11, wherein the second obtaining module is further configured to:
acquiring a difference between the covariance matrix and the identity matrix, and acquiring a square of the difference as the first loss function.
13. The training device of the face feature extraction model according to claim 10, wherein the third obtaining module is further configured to:
inputting the human face features into a classification network in a human face feature extraction model, and performing classification and identification on the human face features by the classification network to obtain a classification and identification result of the human face features;
and acquiring a second loss function according to the classification recognition result and the mark type of the face feature of the sample image.
14. The training device of the face feature extraction model according to claim 13, wherein the third obtaining module is further configured to:
acquiring the similarity between the face feature category and each candidate face feature category;
according to the similarity, obtaining a classification recognition result of the human face features;
and acquiring a cross entropy loss function as a second loss function according to the mark category and the classification recognition result.
15. The training device of the face feature extraction model according to any one of claims 10-14, wherein the generating module is further configured to:
acquiring the sum of the first loss function and the second loss function as a target loss function;
and adjusting the model parameters of the face feature extraction model according to the target loss function.
16. A facial feature extraction apparatus comprising:
the acquisition module is used for acquiring an image to be extracted;
an output module, configured to input the image to be extracted into the target face feature extraction model, and output the target face feature of the image to be extracted by the target face feature extraction model, where the target face feature extraction model is a model trained by using the training apparatus according to any one of claims 10 to 15.
17. The facial feature extraction apparatus of claim 16, wherein the output module is further configured to:
and extracting the target face features of the image to be extracted by a backbone network in the target face feature extraction model.
18. The facial feature extraction apparatus according to claim 16 or 17, wherein the output module is further configured to:
acquiring at least one candidate image and candidate face characteristics of each candidate image;
acquiring the similarity between the target face features and all the candidate face features;
and in response to the similarity reaching a preset similarity threshold, determining that the face corresponding to the image to be extracted is consistent with the face corresponding to the candidate image.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6 or claims 7-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-6 or claims 7-9.
21. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of any of claims 1-6 or claims 7-9.
CN202111193958.3A 2021-10-13 2021-10-13 Training method of face feature extraction model and face feature extraction method Pending CN113947140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111193958.3A CN113947140A (en) 2021-10-13 2021-10-13 Training method of face feature extraction model and face feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111193958.3A CN113947140A (en) 2021-10-13 2021-10-13 Training method of face feature extraction model and face feature extraction method

Publications (1)

Publication Number Publication Date
CN113947140A true CN113947140A (en) 2022-01-18

Family

ID=79329569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111193958.3A Pending CN113947140A (en) 2021-10-13 2021-10-13 Training method of face feature extraction model and face feature extraction method

Country Status (1)

Country Link
CN (1) CN113947140A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
WO2020215697A1 (en) * 2019-08-09 2020-10-29 平安科技(深圳)有限公司 Tongue image extraction method and device, and a computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BARBU, T: "Eigenimage-based facial recognition technique using gradient covariance", Numerical Functional Analysis and Optimization, 31 December 2007 (2007-12-31) *
HU ZHENGPING; HE WEI; WANG MENG; SUN ZHE: "Multi-level deep network fusion face recognition algorithm", Pattern Recognition and Artificial Intelligence, no. 05, 15 May 2017 (2017-05-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549947A (en) * 2022-01-24 2022-05-27 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN114693995A (en) * 2022-04-14 2022-07-01 北京百度网讯科技有限公司 Model training method applied to image processing, image processing method and device

Similar Documents

Publication Publication Date Title
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN113657465A (en) Pre-training model generation method and device, electronic equipment and storage medium
KR20220107120A (en) Method and apparatus of training anti-spoofing model, method and apparatus of performing anti-spoofing using anti-spoofing model, electronic device, storage medium, and computer program
CN113656582A (en) Training method of neural network model, image retrieval method, device and medium
CN113947140A (en) Training method of face feature extraction model and face feature extraction method
CN115063875A (en) Model training method, image processing method, device and electronic equipment
CN113537192B (en) Image detection method, device, electronic equipment and storage medium
CN113361363A (en) Training method, device and equipment for face image recognition model and storage medium
CN113869449A (en) Model training method, image processing method, device, equipment and storage medium
CN113947188A (en) Training method of target detection network and vehicle detection method
CN113591918A (en) Training method of image processing model, image processing method, device and equipment
CN113705361A (en) Method and device for detecting model in living body and electronic equipment
CN113837308A (en) Knowledge distillation-based model training method and device and electronic equipment
CN113177449A (en) Face recognition method and device, computer equipment and storage medium
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113191261B (en) Image category identification method and device and electronic equipment
CN114495113A (en) Text classification method and training method and device of text classification model
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114581732A (en) Image processing and model training method, device, equipment and storage medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113887630A (en) Image classification method and device, electronic equipment and storage medium
CN109101984B (en) Image identification method and device based on convolutional neural network
CN113657248A (en) Training method and device for face recognition model and computer program product
CN116910571A (en) Open-domain adaptation method and system based on prototype comparison learning
CN115482436B (en) Training method and device for image screening model and image screening method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination