WO2023125654A1 - Training method and apparatus for a face recognition model, electronic device, and storage medium - Google Patents

Training method and apparatus for a face recognition model, electronic device, and storage medium

Info

Publication number
WO2023125654A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
face image
features
recognition model
source domain
Prior art date
Application number
PCT/CN2022/142777
Other languages
English (en)
French (fr)
Inventor
张烁
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2023125654A1 publication Critical patent/WO2023125654A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions

Definitions

  • the present application relates to the technical field of deep learning, in particular to a training method, device, electronic equipment and storage medium for a face recognition model.
  • the face recognition model based on deep learning needs to be trained with the full amount of face data.
  • the trained face recognition model can perform identity recognition based on facial feature information, but a face recognition model trained only with source-domain data performs poorly in the target domain.
  • conversely, the face recognition model will forget its recognition performance in the source domain after being trained with target-domain data alone.
  • Embodiments of the present application provide a face recognition model training method, device, electronic equipment, and storage medium, which can at least maintain the performance of the face recognition model in the source domain and improve its performance in the target domain.
  • the specific technical scheme is as follows:
  • the embodiment of the present application provides a method for training a face recognition model, the method comprising:
  • the step of obtaining face features in the source domain includes:
  • the full amount of face image samples is screened according to a preset screening strategy to obtain the screened full amount of face image samples, wherein the preset screening strategy ensures that, with the number of the screened full amount of face image samples unchanged, the number of identity information items corresponding to the screened full amount of face image samples is not less than a preset number;
  • the source domain facial features are determined.
  • the initialization recognition model includes a fixed parameter part and a part to be trained
  • the step of adjusting some model parameters of the initialization recognition model based on the target face image sample and the source domain face features includes:
  • before the step of inputting the target face image sample into the parameter-fixed part and the part to be trained to obtain the first predicted label, the method further includes:
  • the target face image samples are clustered, and a pseudo-label corresponding to each target face image sample is determined, wherein the pseudo-label is used to identify the identity of the person to which the target face image sample corresponds.
  • the step of adjusting model parameters of the part to be trained based on the first classification loss, the second classification loss and the constraint loss includes:
  • the loss function value L is calculated according to the following formula:
  • L_c1 is the first classification loss;
  • L_c2 is the second classification loss;
  • L_kd is the constraint loss;
  • λ is a preset parameter;
  • the step of determining a constraint loss based on the estimated features and the initial features includes:
  • the constraint loss L_kd is calculated according to the following formula:
  • n is the number of source-domain face features;
  • F_i is the initial feature corresponding to the i-th source-domain face feature;
  • the step of determining the source domain facial features based on the facial features output by the intermediate layer includes:
  • the method further includes:
  • the method also includes:
  • the face image to be recognized is recognized based on the face recognition model, and an identity corresponding to the face image to be recognized is determined.
  • the embodiment of the present application provides a training device for a face recognition model, the device comprising:
  • the initialization training module is used to obtain the source-domain face features and the initialization recognition model, wherein the initialization recognition model is obtained by training on the full amount of face image samples in the source domain, and the source-domain face features are the face features of the full amount of face image samples obtained through the initialization recognition model;
  • a target domain sample acquisition module configured to acquire a target face image sample in the target domain, wherein the identity label corresponding to the target face image sample is unknown;
  • an incremental training module configured to adjust some model parameters of the initialization recognition model based on the target face image sample and the source-domain face features, until the initialization recognition model converges, so as to obtain the face recognition model.
  • the initialization training module includes:
  • the sample screening unit is configured to screen the full amount of face image samples according to a preset screening strategy to obtain the screened full amount of face image samples, wherein the preset screening strategy ensures that, with the number of the screened full amount of face image samples unchanged, the number of identity information items corresponding to the screened full amount of face image samples is not less than a preset number;
  • a feature acquisition unit configured to input the screened full amount of face image samples into the initialization recognition model, and obtain the face features output by the middle layer of the initialization recognition model;
  • a feature determination unit configured to determine source domain face features based on the face features output by the intermediate layer.
  • the initialization recognition model includes a fixed parameter part and a part to be trained
  • the incremental training modules include:
  • a first input unit configured to input the target face image sample into the parameter fixing part and the part to be trained to obtain a first prediction label, and based on the first prediction label and the target face image sample The corresponding pseudo-label determines the first classification loss;
  • the second input unit is configured to input the source-domain face features into the part to be trained to obtain a second predicted label, and to determine the second classification loss based on the second predicted label and the identity label corresponding to the source-domain face features;
  • the third input unit is configured to input the source-domain face features into the part to be trained and into the initial part corresponding to the part to be trained to obtain estimated features and initial features, and to determine a constraint loss based on the estimated features and the initial features, wherein the initial part is the model part obtained when the model parameters of the part to be trained are fixed to the model parameters obtained after training on the full amount of face image samples;
  • a parameter adjustment unit configured to adjust model parameters of the part to be trained based on the first classification loss, the second classification loss, and the constraint loss.
  • the device also includes:
  • the target-domain sample clustering module is used to, before the step of inputting the target face image sample into the parameter-fixed part and the part to be trained to obtain the first predicted label, cluster the target face image samples and determine a pseudo-label corresponding to each target face image sample, wherein the pseudo-label is used to identify the identity of the person to whom the target face image sample belongs.
  • the parameter adjustment unit includes:
  • the loss function value calculation subunit is used to calculate the loss function value L according to the following formula based on the first classification loss, the second classification loss and the constraint loss:
  • L_c1 is the first classification loss;
  • L_c2 is the second classification loss;
  • L_kd is the constraint loss;
  • λ is a preset parameter;
  • the parameter adjustment subunit is configured to adjust the model parameters of the part to be trained based on the loss function value.
  • the third input unit includes:
  • the constraint loss calculation subunit is used to calculate the constraint loss L_kd according to the following formula based on the estimated features and the initial features:
  • n is the number of source-domain face features;
  • F_i is the initial feature corresponding to the i-th source-domain face feature;
  • the feature determination unit includes:
  • the feature dimensionality reduction subunit is used to perform dimensionality reduction processing on the facial features output by the intermediate layer, and obtain the dimensionality-reduced facial features as source domain facial features;
  • the device also includes:
  • a feature restoration module configured to, before the step of adjusting some model parameters of the initialization recognition model based on the target face image sample and the source-domain face features, perform dimension restoration processing on the source-domain face features to obtain the restored source-domain face features.
  • the device also includes:
  • a face image acquisition module to be recognized configured to obtain a face image to be recognized in the target domain
  • An identity determining module configured to identify the face image to be recognized based on the face recognition model, and determine the identity corresponding to the face image to be recognized.
  • an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
  • the processor is configured to implement the method steps described in any one of the above-mentioned first aspects when executing the program stored in the memory.
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps described in any one of the above-mentioned first aspects are implemented.
  • an embodiment of the present application provides a computer program product containing instructions, which, when the computer program product is run on a computer, cause the computer to execute the method steps described in any one of the above first aspects.
  • the electronic device can obtain the source-domain face features and the initialization recognition model, wherein the initialization recognition model is obtained by training on the full amount of face image samples in the source domain, and the source-domain face features are the face features of the full amount of face image samples obtained through the initialization recognition model; the electronic device then obtains target face image samples in the target domain, where the identity labels corresponding to the target face image samples are unknown; and, based on the target face image samples and the source-domain face features, adjusts some model parameters of the initialization recognition model until the initialization recognition model converges, obtaining the face recognition model for the source domain and the target domain.
  • After the initialization model is trained using the full amount of face image samples in the source domain, some source-domain face features are saved and some parameters of the initialization model are fixed. The initialization model is then further trained using the target face image samples in the target domain and the saved source-domain face features, yielding the face recognition model for the source domain and the target domain.
  • The face recognition model not only maintains the ability to recognize the full amount of face images in the source domain, but can also accurately recognize target face images in the target domain. Even when the model cannot be trained with the full amounts of source-domain and target-domain face data at the same time, face images can still be recognized accurately, which improves the recognition ability and accuracy of the face recognition model.
  • any product or method of the present application does not necessarily need to achieve all the above-mentioned advantages at the same time.
  • Fig. 1 is the flowchart of the training method of a kind of face recognition model provided by the embodiment of the present application;
  • Fig. 2 is a kind of specific flow chart of acquiring source domain face feature in step S101 in the embodiment shown in Fig. 1;
  • Fig. 3 is a kind of specific flow chart of the part model parameter of the adjustment initialization recognition model of step S103 in the embodiment shown in Fig. 1;
  • Fig. 4 is a schematic diagram of training an initialization recognition model using target face image samples and source domain face features based on the embodiment shown in Fig. 1;
  • Fig. 5 is a schematic diagram of feature extraction and feature dimensionality reduction processing based on the embodiment shown in Fig. 1;
  • Fig. 6 is a kind of specific flowchart of determining the identity of a face image based on the embodiment shown in Fig. 1;
  • FIG. 7 is a schematic structural diagram of a training device for a face recognition model provided in an embodiment of the present application.
  • Fig. 8 is a kind of specific structural schematic diagram of the incremental training module in the embodiment shown in Fig. 7;
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the embodiments of the present application provide a training method, device, electronic equipment, computer-readable storage medium and computer program product for a face recognition model. The training method for a face recognition model provided by the embodiments of the present application is introduced first below.
  • the face recognition model training method provided in the embodiments of the present application can be applied to any electronic device capable of training a face recognition model, for example, various computing devices used for model training, the server corresponding to gates at the entrances and exits of various parks, or the server of a face recognition device in a security system, which is not specifically limited here. For clarity of description, it is subsequently referred to as the electronic device.
  • the initialization recognition model is trained based on the full amount of face image samples in the source domain, and the source domain face features are the face features of the full amount of face image samples obtained through the initialization recognition model.
  • the identity label corresponding to the target face image sample is unknown.
  • the electronic device can obtain the source-domain face features and the initialization recognition model, wherein the initialization recognition model is obtained by training on the full amount of face image samples in the source domain, and the source-domain face features are the face features of the full amount of face image samples obtained through the initialization recognition model; the target face image samples in the target domain are obtained, and the identity labels corresponding to the target face image samples are unknown; based on the target face image samples and the source-domain face features, some model parameters of the initialization recognition model are adjusted until the initialization recognition model converges, and the face recognition model for the source domain and the target domain is obtained.
  • After the initialization model is trained using the full amount of face image samples in the source domain, some source-domain face features are saved and some parameters of the initialization model are fixed. The initialization model is then further trained using the target face image samples in the target domain and the saved source-domain face features, yielding the face recognition model for the source domain and the target domain.
  • The face recognition model not only maintains the ability to recognize the full amount of face images in the source domain, but can also accurately recognize target face images in the target domain. Even when the model cannot be trained with the full amounts of source-domain and target-domain face data at the same time, face images can still be recognized accurately, which improves the recognition ability and accuracy of the face recognition model.
  • the face recognition model based on deep learning has excellent face recognition performance in ordinary scenes, but its performance is relatively poor in some special scenes, such as child face recognition, low-quality face recognition, and face recognition with masks.
  • the convolutional neural network is a commonly used deep learning network for face recognition.
  • the training of convolutional neural networks also has the characteristic of catastrophic forgetting. Catastrophic forgetting means that a face recognition model that has acquired part of its face recognition ability through training forgets or loses the previously acquired ability to recognize face images when it learns to recognize new face images.
  • In order for the face recognition model to accurately recognize face images, the full amount of face data needs to be used to train the model. However, due to data privacy and other reasons, it may not be possible to obtain the full amount of face data, so the model can only be trained with incomplete face data.
  • When the model is trained with such incomplete face data, the face recognition performance of the resulting face recognition model will be very poor.
  • For example, if the face recognition model is based on a convolutional neural network and the available face data are face images with masks, then, due to catastrophic forgetting, a face recognition model obtained by conventional training using only these masked face images can hardly recognize face images without masks, and its recognition ability and accuracy are very poor.
  • the electronic device can train the face recognition model using a domain-adaptive incremental learning method.
  • the face data used to train the face recognition model can be divided into the full amount of face image samples in the source domain and the target face image samples in the target domain, wherein the full amount of face image samples are face image samples with known identity information, and the target face image samples in the target domain are face image samples of the persons to be identified, whose corresponding identity labels are unknown.
  • Domain adaptation is a transfer learning method in which the data distributions corresponding to the source domain and the target domain are different, but the tasks are the same.
  • the tasks are both used to train the face recognition model.
  • Incremental learning refers to a learning method that can continuously learn new knowledge from new samples and preserve most of the learned knowledge.
  • In this way, the recognition ability of the face recognition model for face images in the source domain is preserved, and its recognition ability for target face images in the target domain is also enhanced.
  • the electronic device may acquire face features in the source domain and initialize a recognition model, wherein the initialization recognition model is obtained through training based on a full amount of face image samples in the source domain.
  • the full face image samples may be CASIA, VGGFace2 or MS1MV2, etc., which are not limited here.
  • the identity information of the full amount of face image samples in the source domain is known, that is, the identity labels of the full amount of face image samples are known.
  • the initialization model can obtain the predicted labels of the full amount of face image samples, and based on the predicted labels and the identity labels of the full amount of face image samples, the electronic device can calculate the classification loss according to the following formula:
  • L_c is the cross-entropy loss used;
  • m represents the size of the margin;
  • s represents the size of the scale value;
  • θ represents the angle between the weight and the feature;
  • i represents the index of the input sample;
  • y_i represents the input sample with index i;
  • N represents the number of samples.
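  • For illustration only (not part of the original disclosure), the following Python sketch shows one common additive angular-margin softmax loss that matches the variables listed above (margin m, scale s, angle θ between weight and feature, sample index i, batch size N); the exact formula used in this application is not reproduced here, and all function and variable names are illustrative.

        import torch
        import torch.nn.functional as F

        def margin_softmax_loss(features, weights, labels, s=64.0, m=0.5):
            # features: (N, d) face features; weights: (C, d) classifier weights; labels: (N,)
            # cosine of the angle theta between each feature and each class weight
            cos_theta = F.linear(F.normalize(features), F.normalize(weights)).clamp(-1.0, 1.0)
            theta = torch.acos(cos_theta)
            # add the margin m only to the angle of the ground-truth class y_i
            one_hot = F.one_hot(labels, num_classes=weights.shape[0]).bool()
            logits = s * torch.where(one_hot, torch.cos(theta + m), cos_theta)
            # cross-entropy over the scaled, margin-adjusted logits
            return F.cross_entropy(logits, labels)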
  • the electronic device can continuously reduce the classification loss by adjusting the model parameters of the face recognition model until the number of iterations over the full amount of face image samples in the source domain reaches a preset number, at which point the model is determined to have converged and the initialization recognition model is obtained.
  • the trained initialization recognition model has the ability to recognize face images in the source domain, that is, it can recognize the faces of ordinary people.
  • after initialization training, the face features of the full amount of face image samples in the source domain can be extracted, and these source-domain face features can be used to train the initialization recognition model during the incremental training process, so that the initialization recognition model can better retain its ability to recognize source-domain face images.
  • the electronic device may store only a small number of source-domain face features. For example, when only a small amount of storage space is available, the electronic device can input a randomly selected part of the full amount of face image samples into the initialization recognition model and use the face features output by the initialization recognition model as the source-domain face features.
  • the initialization recognition model obtained through initialization training already has the ability to recognize face images in the source domain.
  • on this basis, the initialization recognition model can be incrementally trained using the target face image samples in the target domain. After the electronic device acquires the source-domain face features and the initialization recognition model, it can acquire the target face image samples in the target domain, that is, execute the above step S102.
  • the target face image samples in the target domain may be collected by an electronic device, or may be input into the electronic device by an external device, which is not limited here.
  • the identity label corresponding to the target face image sample is unknown, that is, the specific identity information corresponding to the target face image sample in the target domain cannot be determined.
  • For example, the target face image samples in the target domain obtained by the electronic device are the face images of multiple employees in an industrial park. Due to privacy protection, the real identities behind these face images cannot be determined, and the electronic device can perform clustering operations on these face images and record the label of each class of face images as "employee A", "employee B", and so on, so as to judge the identity of the person during face recognition.
  • Based on the target face image sample and the source-domain face features, the electronic device can adjust some model parameters of the initialization recognition model until the initialization recognition model converges, and the face recognition model for the source domain and the target domain is obtained.
  • the initialization recognition model can include a parameter-fixed part and a parameter-adjustable part, and the model parameters of the parameter-fixed part remain unchanged during incremental training; in this way, the ability of the initialization recognition model to recognize face images in the source domain can be maintained.
  • the electronic device can input the target face image samples in the target domain and the source-domain face features into the initialization recognition model to train it, adjusting only some model parameters of the initialization recognition model, that is, the model parameters of the parameter-adjustable part, until the initialization recognition model converges; the resulting face recognition model for the source domain and the target domain also has the ability to recognize target face images in the target domain.
  • the electronic device can use the full amount of face image samples in the source domain to train the face recognition model, and the obtained initialization recognition model has the ability to recognize face images in the source domain.
  • the obtained face recognition model for the source domain and the target domain not only maintains the ability to recognize face images in the source domain, but can also accurately recognize target face images in the target domain.
  • the face recognition model for the source domain and the target domain can be obtained by only using a small number of source domain face features.
  • the face recognition model can not only maintain the performance of the source domain, but also improve its performance in the target domain, and can accurately identify face images, which improves the recognition ability and accuracy of the face recognition model.
  • the above-mentioned steps of acquiring face features in the source domain may include:
  • S201. According to a preset screening strategy, screen the full amount of face image samples to obtain the screened full amount of face image samples;
  • the preset screening strategy can make the number of identity information corresponding to the screened full face image samples not less than the preset number under the condition that the number of screened full face image samples remains unchanged.
  • the data scale of the full face image samples in the source domain for training the face recognition model is usually very large, which is not conducive to storage.
  • In the step of adjusting some model parameters of the initialization recognition model based on the target face image samples and the source-domain face features, it is not necessary to use the source-domain face features corresponding to all full-amount face image samples in the source domain in order to maintain the ability of the initialization recognition model to recognize source-domain face images. Therefore, in order to reduce the storage space required for data storage and the amount of computation required for training the face recognition model, the electronic device can screen the full amount of face image samples in the source domain.
  • the electronic device can screen the full amount of human face image samples according to a preset screening strategy to obtain the filtered full amount of human face image samples.
  • With the size of the storage space unchanged, that is, with the number of screened full-amount face image samples unchanged, the higher the inter-class richness of the screened full-amount face image samples, the better the initialization recognition model can maintain its recognition ability for source-domain face images. That is to say, it is preferable to obtain full-amount face image samples corresponding to as many identity information items as possible, while the number of full-amount face image samples in different states for each identity does not need to be large; in other words, the intra-class richness can be appropriately reduced, so that the face recognition model for the target domain better maintains its recognition ability for source-domain face images.
  • The above preset screening strategy can be a strategy in which, with the number of the screened full-amount face image samples unchanged, the number of identity information items corresponding to the screened full-amount face image samples is not less than a preset number, where the preset number can be set according to requirements such as storage space and computation amount, and is not specifically limited here.
  • In one implementation, the electronic device may randomly select a certain number of identity information items from all the full-amount face image samples, and then randomly select a certain number of full-amount face image samples corresponding to each selected identity information item. For example, the electronic device may first randomly select 1000 identity information items, and then randomly select 10 full-amount face image samples from the full-amount face image samples corresponding to each identity information item.
  • In another implementation, the electronic device can select the full-amount face image samples according to the distance between the face features of each full-amount face image sample and the feature center; for example, a certain number of full-amount face image samples closest to the feature center may be selected.
  • Here, the feature center is the classifier vector corresponding to the identity information to which these full-amount face image samples belong. The closer a face feature of a full-amount face image sample is to the feature center, the more accurately it represents the real facial characteristics of the person identified by that identity information, so using such full-amount face image samples for subsequent feature extraction is more conducive to the recognition ability of the trained face recognition model.
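  • As an illustrative sketch only (not taken from the original disclosure), the following Python code shows one way such a screening strategy could be implemented: a fixed number of identities is kept, and for each kept identity the samples whose features are closest to that identity's feature center (classifier vector) are retained; the counts and all names are assumptions.

        import random
        import numpy as np

        def screen_source_samples(features_by_id, centers_by_id, num_ids=1000, per_id=10, seed=0):
            # features_by_id: {identity: (k, d) array of face features of that identity}
            # centers_by_id:  {identity: (d,) classifier vector (feature center) of that identity}
            rng = random.Random(seed)
            kept_ids = rng.sample(sorted(features_by_id), min(num_ids, len(features_by_id)))
            kept = {}
            for pid in kept_ids:
                feats = features_by_id[pid]
                dists = np.linalg.norm(feats - centers_by_id[pid], axis=1)  # distance to the feature center
                kept[pid] = np.argsort(dists)[:per_id]                      # keep the closest samples
            return kept  # {identity: indices of the screened samples}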
  • the electronic device can input the screened full face image samples into the initialization recognition model to obtain the face features output by the middle layer of the initialization recognition model.
  • the middle layer of the initialization recognition model is neither the first layer nor the last layer in the initialization recognition model.
  • the initialization recognition model can include multiple layers.
  • For example, the initialization recognition model is a residual network, and the residual network includes four residual blocks.
  • During incremental training, the model parameters of the first three residual blocks can be fixed, and only the model parameters of the fourth residual block are adjusted.
  • In this case, the electronic device can obtain the face features output by the third residual block of the initialization recognition model.
  • After the electronic device obtains the face features output by the intermediate layer of the initialization recognition model, it can determine the source-domain face features. In one implementation, the electronic device may directly use the face features output by the intermediate layer as the source-domain face features. In another implementation, the electronic device may perform dimensionality reduction processing on the face features output by the intermediate layer and use the dimension-reduced face features as the source-domain face features for easier storage.
  • In this way, the electronic device can screen the full amount of face image samples and determine the source-domain face features based on the screened samples; these source-domain face features help the face recognition model for the target domain maintain its ability to recognize face images in the source domain.
  • the above-mentioned initialization recognition model may include a parameter fixed part and a part to be trained.
  • the initialization recognition model can be divided into a fixed parameter part and a part to be trained.
  • the model parameters of the fixed parameter part are no longer adjusted to maintain the recognition ability of the face recognition model for face images in the source domain.
  • the model parameters of the part to be trained are adjusted so that the trained face recognition model has a stronger recognition ability for face images in the target domain.
  • Since the source-domain face features are the face features output by the middle layer of the initialization recognition model, the parameter-fixed part of the initialization recognition model can be kept consistent with the middle layer that outputs the source-domain face features, that is, the part of the model up to and including the layer from which the source-domain face features are obtained.
  • the initial recognition model is a residual network, which includes four residual blocks, and the source domain face features are output by the third residual block.
  • Correspondingly, the parameter-fixed part of the initialization recognition model may include the first three residual blocks, and the part to be trained includes the fourth residual block and the classifier.
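  • The following Python (PyTorch) sketch, which is not part of the original disclosure, illustrates this split on a torchvision-style residual network with four residual stages: the stem and the first three stages form the parameter-fixed part, the fourth stage plus a classifier forms the part to be trained, and a frozen copy of the fourth stage serves as the initial part; the choice of resnet50 and all names are assumptions.

        import copy
        import torch.nn as nn
        from torchvision.models import resnet50

        def build_parts(num_classes, feat_dim=2048):
            backbone = resnet50(weights=None)  # stands in for the trained initialization recognition model
            # parameter-fixed part: stem + first three residual stages (its output is the middle-layer feature)
            fixed_part = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
                                       backbone.layer1, backbone.layer2, backbone.layer3)
            for p in fixed_part.parameters():
                p.requires_grad = False  # keep the source-domain recognition ability
            # part to be trained: fourth residual stage (plus pooling) and a classifier
            trainable_stage = nn.Sequential(backbone.layer4, nn.AdaptiveAvgPool2d(1), nn.Flatten())
            classifier = nn.Linear(feat_dim, num_classes)
            # initial part: a frozen copy of the fourth stage, keeping the parameters obtained
            # from training on the full amount of source-domain face image samples
            initial_stage = copy.deepcopy(trainable_stage)
            for p in initial_stage.parameters():
                p.requires_grad = False
            return fixed_part, trainable_stage, classifier, initial_stage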
  • the above-mentioned step of adjusting some model parameters of the initialization recognition model based on the target face image sample and the source domain face features may include:
  • the initial recognition model is incrementally trained.
  • Specifically, a teacher-student network can be used for training.
  • The student network, that is, the face recognition model being trained, is incrementally trained under the supervision of the teacher network, that is, the initialization recognition model, using a small number of stored features, namely the above-mentioned source-domain face features, so that the student network can obtain performance similar to that of the teacher network, that is, maintain the ability to recognize face images in the source domain.
  • At the same time, the student network is further trained using the target face image samples in the target domain, so that it acquires the ability to recognize target face images in the target domain.
  • the electronic device can determine the pseudo-label corresponding to each target face image sample after obtaining the target face image sample in the target domain.
  • the pseudo-label is used to identify the person to whom the corresponding target face image sample belongs, but it is not the real identity of that person.
  • the pseudo-label may be A, B, C, or 11, 12, 13, etc., which are not specifically limited here.
  • the electronic device can input each target face image sample into the parameter-fixed part and the part to be trained to obtain the first predicted label, and then, based on the difference between the first predicted label and the pseudo-label corresponding to the target face image sample, calculate the classification loss corresponding to the target face image sample, that is, the first classification loss.
  • the first classification loss may represent the difference between the recognition result of the current face recognition model for the face image in the target domain and the real result.
  • the electronic device may input the source-domain face features into the part to be trained and obtain the second predicted label output by the part to be trained. Since the identity labels of the full-amount face image samples corresponding to the source-domain face features are known, the electronic device can calculate the classification loss corresponding to the source-domain face features, that is, the second classification loss.
  • the second classification loss can represent the difference between the recognition result of the current face recognition model for the face image in the source domain and the real result.
  • the initial part is the corresponding model part when the model parameters of the part to be trained are fixed to the model parameters after training based on the full amount of face image samples.
  • That is, the electronic device can fix the model parameters of the part to be trained to obtain the initial part.
  • The fixed model parameters are the model parameters of the part to be trained of the initialization recognition model trained on the full amount of face image samples.
  • For example, if the initialization recognition model is a residual network with four residual blocks, where the model parameters of the first three residual blocks are fixed and only the model parameters of the fourth residual block are adjusted, then the part to be trained includes the fourth residual block, and the fourth residual block whose model parameters are fixed to those of the initialization model is the initial part.
  • the electronic device can input the face features of the source domain into the part to be trained, and the part to be trained can perform further feature extraction on the face features of the source domain based on the current model parameters to obtain estimated features.
  • the electronic device can calculate the constraint loss based on the difference between the estimated feature and the initial feature.
  • the constraint loss can represent the difference between the facial features extracted from the part to be trained and the facial features extracted from the corresponding part with fixed model parameters in the initialization model, and can be used as a constraint to supervise the training of the face recognition model.
  • the execution sequence of the above step S301-step S303 is not specifically limited.
  • the electronic device can adjust the model parameters of the part to be trained based on the first classification loss, the second classification loss and the constraint loss, so as to train the initialization recognition model, until the number of iterations over the target face image samples and the source-domain face features reaches a preset number, at which point the initialization model is determined to have converged.
  • Specifically, a loss function value of the face recognition model can be calculated based on the first classification loss, the second classification loss and the constraint loss, and the model parameters of the part to be trained are adjusted based on the loss function value until the loss function of the face recognition model converges, at which point the initialization recognition model is determined to have converged.
  • The algorithm used to adjust the model parameters may be the gradient descent algorithm, the stochastic gradient descent algorithm, etc., which is not specifically limited or described here.
  • The second classification loss can represent the difference between the recognition result of the current face recognition model for face images in the source domain and the real result.
  • The constraint loss can characterize the difference between the face features extracted by the part to be trained and the face features extracted by the corresponding part of the initialization model with fixed model parameters. Therefore, adjusting the model parameters of the part to be trained based on the first classification loss, the second classification loss and the constraint loss can make the difference between the recognition result of the face recognition model for face images in the target domain and the real result smaller and smaller, while keeping the recognition results for source-domain face images accurate.
  • the electronic device can use the face features of the source domain to train the initialization recognition model to maintain the face recognition model's ability to recognize face images in the source domain, and the electronic device can use target face image samples in the target domain Train the initialization recognition model to improve the face recognition model's ability to recognize face images in the target domain.
  • the recognition ability of the face recognition model to the face image in the source domain is better maintained, and the recognition ability of the face recognition model to the target face image in the target domain is improved.
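  • The following Python sketch, provided for illustration only and not taken from the original disclosure, shows one possible incremental training step built from the parts sketched earlier; it assumes a classifier head for the target-domain pseudo-identities (tgt_classifier) and one for the source-domain identities (src_classifier), and it assumes that the three losses are combined as a weighted sum and that the constraint loss is a mean squared error, which are assumptions rather than the application's stated formulas.

        import torch
        import torch.nn.functional as F

        def incremental_step(fixed_part, trainable_stage, tgt_classifier, src_classifier,
                             initial_stage, optimizer, target_images, pseudo_labels,
                             src_features, src_labels, lam=1.0):
            # first classification loss: target images -> parameter-fixed part -> part to be trained
            with torch.no_grad():
                mid = fixed_part(target_images)                 # middle-layer features (fixed part)
            l_c1 = F.cross_entropy(tgt_classifier(trainable_stage(mid)), pseudo_labels)

            # second classification loss: stored (restored) source-domain features -> part to be trained
            est_features = trainable_stage(src_features)        # estimated features
            l_c2 = F.cross_entropy(src_classifier(est_features), src_labels)

            # constraint loss: estimated features vs. features from the frozen initial part
            with torch.no_grad():
                init_features = initial_stage(src_features)     # initial features
            l_kd = F.mse_loss(est_features, init_features)

            loss = l_c1 + l_c2 + lam * l_kd                     # assumed weighted-sum combination
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()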
  • conv is the convolution layer of the initialization recognition model;
  • bn indicates batch normalization of the data;
  • relu and tanh are the activation functions used in the activation function layers;
  • residual block denotes a residual block of the initialization recognition model;
  • the parameter-fixed part includes the first three residual blocks;
  • the part to be trained includes the fourth residual block and the classifier;
  • the initial part is the fourth residual block with model parameters fixed.
  • the electronic device may input the target face image sample into the parameter-fixed part and the part to be trained to obtain the first predicted label, and determine the first classification loss based on the first predicted label and the pseudo-label corresponding to the target face image sample.
  • the electronic device can restore the dimensions of the dimension-reduced source-domain face features to obtain the restored source-domain face features, input the restored source-domain face features into the part to be trained to obtain the second predicted label, and determine the second classification loss based on the second predicted label and the identity label corresponding to the source-domain face features.
  • the electronic device can input the recovered face features of the source domain into the part to be trained and the initial part corresponding to the part to be trained to obtain estimated features and initial features, and determine a constraint loss based on the estimated features and initial features.
  • the electronic device can calculate the loss function value of the face recognition model based on the first classification loss, the second classification loss and the constraint loss, and adjust the model parameters of the part to be trained until the initialization recognition model converges, so as to obtain the face recognition model for the source domain and the target domain.
  • In this way, the obtained face recognition model for the target domain maintains the ability to recognize face images in the source domain while its ability to recognize face images in the target domain is also enhanced.
  • Even when the face recognition model cannot be trained with the face data of the source domain and the target domain at the same time, face images can be recognized accurately, which improves the recognition ability and accuracy of the face recognition model.
  • the above method may also include:
  • Clustering is performed on the target face image samples, and a pseudo-label corresponding to each target face image sample is determined.
  • the electronic device can cluster the target face image samples in the target domain to determine the pseudo-label corresponding to each target face image sample.
  • Specifically, the target face image samples can be grouped by identity according to the similarity of their face features, so as to obtain multiple groups of target face image samples, wherein each group of target face image samples belongs to the same person; the electronic device can then assign a pseudo-label to each group of target face image samples, which is used to identify the identity of the person to whom the corresponding target face image samples belong.
  • For example, the k-means++ clustering algorithm can be used to cluster the target face image samples into multiple categories, and the pseudo-label of each target face image sample included in each category is determined.
  • the electronic device may also use Gaussian mixture model maximum expectation clustering, agglomerative hierarchical clustering, mean shift clustering, etc. to cluster the target face image samples, which are not specifically limited here.
  • For example, if the target face image samples in the target domain are face images of workers in a certain factory, the electronic device clusters the target face image samples and divides them into multiple groups, where each group of target face image samples belongs to the same worker; the electronic device may then determine the pseudo-labels of the groups of target face image samples as A, B, C, or 11, 12, 13, etc., which is not limited here.
  • In this way, the electronic device can cluster the target face image samples and determine the pseudo-label corresponding to each target face image sample, so that even if the real identity of the person to whom a target face image sample belongs cannot be known, an accurate pseudo-label can still be determined to identify the person to whom the corresponding target face image sample belongs.
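  • As a purely illustrative sketch (not from the original disclosure), the following Python code assigns pseudo-labels by clustering target-domain face features with the k-means++ initialization mentioned above; the number of identities is assumed to be known or estimated in advance, and all names are illustrative.

        import numpy as np
        from sklearn.cluster import KMeans

        def assign_pseudo_labels(target_features, num_identities, seed=0):
            # target_features: (num_samples, d) face features of the target face image samples
            kmeans = KMeans(n_clusters=num_identities, init="k-means++", n_init=10, random_state=seed)
            # each cluster id serves as a pseudo-label identifying one (unknown) person
            return kmeans.fit_predict(np.asarray(target_features))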
  • the above step of adjusting the model parameters of the part to be trained based on the first classification loss, the second classification loss and the constraint loss may include:
  • L_c1 is the above-mentioned first classification loss;
  • L_c2 is the above-mentioned second classification loss;
  • L_kd is the above-mentioned constraint loss;
  • λ is a preset parameter.
  • In this way, the obtained loss function value can accurately represent the difference between the recognition result and the real result for face images in the target domain, the difference between the recognition result and the real result for face images in the source domain, and the difference between the face features extracted by the part to be trained and the face features extracted by the corresponding part of the initialization model with fixed model parameters. Therefore, the electronic device can use the above formula to calculate the loss function value L and, based on the loss function value L, adjust the model parameters of the part to be trained to obtain a face recognition model with strong recognition ability.
  • The value of the preset parameter λ can be set according to the change of the loss function value during training combined with practical experience, and is not specifically limited here.
  • the electronic device may calculate the loss function value based on the first classification loss, the second classification loss and the constraint loss according to the above formula, and adjust the model parameters of the part to be trained based on the loss function value so that the initialization recognition model converges.
  • the electronic device can accurately calculate the value of the loss function, which enhances the training effect of the initialization recognition model, improves the recognition effect of the face recognition model on the face image in the source domain, and also makes the face recognition model recognize the face in the target domain. Images have better performance.
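  • One plausible form of the combined loss, written here purely as an assumption consistent with the symbols listed above (λ denoting the preset parameter), is:

        L = L_{c1} + L_{c2} + \lambda \, L_{kd}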
  • the above step of determining the constraint loss based on the estimated features and the initial features may include:
  • the constraint loss L_kd is calculated according to the following formula:
  • n is the number of source-domain face features;
  • F_i is the initial feature corresponding to the i-th source-domain face feature;
  • the electronic device can use the above formula to calculate the constraint loss, so as to ensure that an accurate loss function value can be calculated.
  • the electronic device can calculate the constraint loss of each source-domain face feature during model training, and thereby calculate the constraint loss of the initialization recognition model.
  • the electronic device can accurately calculate the constraint loss, and use the constraint loss as supervision to adjust the model parameters of the part to be trained, and a face recognition model with higher accuracy can be obtained.
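  • One common form that such a feature-constraint loss can take, written here as an assumption (with \hat{F}_i denoting the estimated feature produced by the part to be trained for the i-th stored source-domain face feature), is:

        L_{kd} = \frac{1}{n} \sum_{i=1}^{n} \left\| \hat{F}_i - F_i \right\|_2^2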
  • the above-mentioned step of determining the source domain facial features based on the facial features output by the intermediate layer may include:
  • the feature dimension is relatively high, which results in a large storage space required for storing the source domain face features.
  • the electronic device can perform dimensionality reduction processing on the facial features output by the intermediate layer, thereby significantly reducing the storage space required to store the source domain facial features while basically maintaining the amount of feature information.
  • In one implementation, the PCA (Principal Component Analysis) dimensionality reduction method can be used to perform dimensionality reduction processing on the face features output by the intermediate layer, and the dimension-reduced face features are obtained as the source-domain face features.
  • The core operation of PCA dimensionality reduction is SVD (Singular Value Decomposition).
  • For a feature matrix A, U is the left singular matrix of A, U^T is the transpose matrix of U, and Σ is the diagonal matrix containing the corresponding eigenvalues.
  • The PCA dimensionality reduction method can remove some correlated dimensions in the original data, maximizing the amount of information carried by the data while reducing its dimensionality.
  • the process of obtaining face features in the source domain from the full amount of face image samples in the source domain and performing dimensionality reduction processing can be shown in Figure 5, wherein the initialization recognition model can be a convolutional neural network, conv is The convolutional layer of the convolutional neural network, bn means batch normalization of data, relu and tanh are the activation functions used in the activation function layer, and residual block is the residual block of the convolutional neural network.
  • the parameter-fixed part includes the first three residual blocks, and the model parameters of the fourth residual block can be adjusted during incremental training of the initialization model.
  • the electronic device can extract the face features output by the intermediate layer of the parameter-fixed part, and then obtain the dimension-reduced face features through dimensionality reduction processing.
  • In this case, before the step of adjusting some model parameters of the initialization recognition model based on the target face image samples and the source-domain face features, the method may further include:
  • the electronic device can perform dimension restoration processing on the source-domain face features to obtain the restored source-domain face features, where the dimensions of the restored source-domain face features are the same as the dimensions of the face features output by the intermediate layer of the parameter-fixed part.
  • In this way, the electronic device can perform dimensionality reduction processing on the face features output by the intermediate layer and, before adjusting some model parameters of the initialization recognition model based on the target face image samples and the source-domain face features, perform dimension restoration processing on the dimension-reduced source-domain face features. Therefore, the storage space required for storing the source-domain face features can be significantly reduced without substantially affecting the ability of the face recognition model to recognize face images in the target domain.
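  • The following Python sketch, given for illustration only and not taken from the original disclosure, shows PCA-based dimensionality reduction (computed via SVD) of the middle-layer face features and the corresponding dimension restoration before incremental training; the number of retained components is an arbitrary illustrative choice.

        import numpy as np
        from sklearn.decomposition import PCA

        def compress_and_restore(mid_layer_features, n_components=128):
            # mid_layer_features: (n_samples, d) features output by the parameter-fixed part
            # (flattened if they are feature maps); n_components is an illustrative choice
            pca = PCA(n_components=n_components, svd_solver="full")      # PCA is computed via SVD
            reduced = pca.fit_transform(np.asarray(mid_layer_features))  # compact features to store
            restored = pca.inverse_transform(reduced)                    # dimension restoration before training
            return reduced, restored, pca                                # keep pca to restore later-loaded features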
  • the above method may further include:
  • After the initialization recognition model has been trained and the face recognition model for the target domain has been obtained, the face recognition model can recognize face images in the target domain while maintaining the ability to recognize face images in the source domain.
  • the face recognition model can be deployed in an actual application scenario, that is, the target domain scene, and then the face image to be recognized in the target domain can be obtained.
  • For example, the original model in a gate can be replaced with the face recognition model, which performs better for face recognition of park personnel while basically maintaining its face recognition ability for non-park personnel. Then, when a person wants to pass through the gate, the electronic device can collect the face image of that person as the face image to be recognized in the target domain.
  • S602. Recognize the face image to be recognized based on the face recognition model, and determine an identity corresponding to the face image to be recognized.
  • the electronic device may recognize the face image to be recognized based on the face recognition model, and determine the identity corresponding to the face image to be recognized. After identifying the face image to be recognized, the electronic device can determine the identity mark corresponding to the face image to be recognized, and perform different operations according to different identity marks. For example, the electronic device can control the gate to open the gate or keep it closed according to the identification corresponding to the face image to be recognized.
  • For example, the pseudo-labels corresponding to the target face image samples are "person A", "person B", and so on.
  • If the electronic device recognizes that the identity corresponding to the face image to be recognized is "person B", it can determine that the person belongs to the industrial park and has access authority, and the electronic device can control the gate to open; if the identity corresponding to the face image to be recognized is recognized as Liu XX, it can determine that the person is not a person of the industrial park but someone outside the park without access authority, and the electronic device can control the gate to remain closed.
  • the electronic device can not only recognize the target human face image in the target domain, but also retain the ability to recognize the human face image in the source domain. Furthermore, no matter the person corresponding to the source domain or the target domain, the face image to be recognized can be recognized based on the face recognition model, and the identity corresponding to the face image to be recognized can be accurately determined.
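  • Purely as an illustrative sketch (not part of the original disclosure), the following Python code shows one way the deployed model could map a face image to be recognized to an identity and drive the gate decision; the direct use of the classifier output and all names are assumptions.

        import torch

        def recognize_and_control_gate(face_image, fixed_part, trainable_stage, classifier,
                                       label_names, authorized_labels):
            # face_image: a preprocessed (1, C, H, W) tensor of the face image to be recognized
            with torch.no_grad():
                logits = classifier(trainable_stage(fixed_part(face_image)))
            identity = label_names[int(logits.argmax(dim=1))]   # e.g. "person B" or a real name
            open_gate = identity in authorized_labels           # open only for authorized (park) personnel
            return identity, open_gate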
  • the embodiment of the present application also provides a training device for the face recognition model.
  • the training device for the face recognition model provided by the embodiment of the present application is introduced below.
  • a kind of training device of face recognition model, described device comprises:
  • the initialization training module 701 is used to obtain the source-domain face features and the initialization recognition model.
  • the initialization recognition model is trained based on the full amount of face image samples in the source domain, and the source domain face features are the face features of the full amount of face image samples obtained through the initialization recognition model.
  • the identity label corresponding to the target face image sample is unknown.
  • an incremental training module 703 configured to adjust some model parameters of the initialization recognition model based on the target face image sample and the source-domain face features, until the initialization recognition model converges, so as to obtain the face recognition model for the source domain and the target domain.
  • It can be seen that, in the solution provided by the embodiment of the present application, the electronic device can obtain the source domain face features and the initialization recognition model, wherein the initialization recognition model is trained based on the full amount of face image samples in the source domain, and the source domain face features are the face features of the full amount of face image samples obtained through the initialization recognition model;
  • obtain target face image samples in the target domain, where the identity labels corresponding to the target face image samples are unknown; and, based on the target face image samples and the source domain face features, adjust some model parameters of the initialization recognition model until the initialization recognition model converges, obtaining a face recognition model for the source domain and the target domain.
  • After the initialization model is trained using the full amount of face image samples in the source domain, some source domain face features are saved and some parameters of the initialization model are fixed. Then, after the initialization model is further trained using the target face image samples in the target domain and the saved source domain face features, a face recognition model for the source domain and the target domain is obtained.
  • The face recognition model not only maintains the ability to recognize the full amount of face images in the source domain, but can also accurately recognize target face images in the target domain. Even when the face data of the source domain and the target domain cannot be used for training at the same time, face images can be accurately recognized, which improves the recognition capability and accuracy of the face recognition model.
  • the above initialization training module 701 may include:
  • the sample screening unit is used to screen the full amount of human face image samples according to a preset screening strategy to obtain the screened full amount of human face image samples;
  • wherein the preset screening strategy ensures that, while the number of the screened full face image samples remains unchanged, the number of pieces of identity information corresponding to the screened full face image samples is not less than a preset number;
  • a feature acquisition unit, configured to input the screened full face image samples into the initialization recognition model and obtain the face features output by an intermediate layer of the initialization recognition model;
  • a feature determination unit configured to determine source domain face features based on the face features output by the intermediate layer.
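  • To make the feature acquisition and feature determination steps concrete, the sketch below registers a forward hook on an intermediate stage of a stand-in backbone and collects its output as the source domain face features. The use of torchvision's ResNet-18 and the choice of layer3 as the intermediate layer are assumptions made only for illustration; the filing requires no particular backbone, only that the features come from an intermediate (non-first, non-last) layer of the initialization recognition model.

```python
# A minimal sketch, assuming a ResNet-18 stands in for the initialization
# recognition model and its third residual stage is the "intermediate layer".
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)  # stand-in initialization recognition model
model.eval()

features = {}

def hook(_module, _inputs, output):
    # capture the face features output by the intermediate layer
    features["layer3"] = output.detach()

model.layer3.register_forward_hook(hook)

with torch.no_grad():
    screened_samples = torch.randn(8, 3, 224, 224)  # stand-in for screened full face image samples
    model(screened_samples)

source_domain_features = features["layer3"]  # e.g. shape (8, 256, 14, 14)
```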
  • the above-mentioned initialization recognition model may include a parameter-fixed part and a part to be trained;
  • the above incremental training module 703 may include:
  • a first input unit 801, configured to input the target face image samples into the parameter-fixed part and the part to be trained to obtain a first predicted label, and determine a first classification loss based on the first predicted label and the pseudo-labels corresponding to the target face image samples;
  • a second input unit 802, configured to input the source domain face features into the part to be trained to obtain a second predicted label, and determine a second classification loss based on the second predicted label and the identity labels corresponding to the source domain face features;
  • a third input unit 803, configured to input the source domain face features into the part to be trained and into the initial part corresponding to the part to be trained, respectively, to obtain estimated features and initial features, and determine a constraint loss based on the estimated features and the initial features;
  • the initial part is the corresponding model part when the model parameters of the part to be trained are fixed to the model parameters trained based on the full amount of face image samples.
  • a parameter adjustment unit 804 configured to adjust model parameters of the part to be trained based on the first classification loss, the second classification loss and the constraint loss.
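  • The interplay of the three losses handled by units 801-804 can be illustrated with the following sketch of one incremental training step. The split into a frozen backbone, a trainable block, a classifier, and a frozen initial copy of the block is assumed for illustration, as is the mean-squared form of the constraint loss; module interfaces and shapes are placeholders rather than the filed implementation.

```python
# Hedged sketch of one incremental training step. `fixed_part` is the frozen
# backbone, `trainable_block` and `classifier` form the part to be trained,
# and `initial_block` is a frozen copy of the block after source-domain
# training. The classifier is assumed to include any pooling/flattening.
import torch
import torch.nn.functional as F

def incremental_step(fixed_part, trainable_block, classifier, initial_block,
                     optimizer, target_images, pseudo_labels,
                     source_features, source_labels, lam=1.0):
    # first classification loss: target images pass through the frozen backbone,
    # the trainable block, and the classifier; compared against pseudo-labels
    logits_target = classifier(trainable_block(fixed_part(target_images)))
    loss_c1 = F.cross_entropy(logits_target, pseudo_labels)

    # second classification loss: stored source-domain features enter the
    # trainable block and classifier directly; compared against identity labels
    estimated = trainable_block(source_features)
    loss_c2 = F.cross_entropy(classifier(estimated), source_labels)

    # constraint loss: estimated features vs. features from the frozen initial
    # copy of the block (a mean-squared form is assumed here)
    with torch.no_grad():
        initial = initial_block(source_features)
    loss_kd = F.mse_loss(estimated, initial)

    # total loss L = L_c1 + L_c2 + lambda * L_kd, updating only the trainable part
    total = loss_c1 + loss_c2 + lam * loss_kd
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```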
  • the above-mentioned device may further include:
  • a target domain sample clustering module, configured to cluster the target face image samples before the step of inputting the target face image samples into the parameter-fixed part and the part to be trained to obtain the first predicted label, and determine the pseudo-label corresponding to each target face image sample;
  • the pseudo-label is used to identify the identity of the person to which the corresponding target face image sample belongs.
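  • A minimal sketch of the clustering step, assuming k-means (one of the clustering options named in the description) is applied to face features extracted from the unlabeled target face image samples; the feature dimensions and the number of clusters are illustrative guesses rather than values from the filing.

```python
# Assign pseudo-labels to unlabeled target-domain samples by clustering their
# face features; the number of distinct persons is assumed for illustration.
import numpy as np
from sklearn.cluster import KMeans

target_features = np.random.rand(500, 512)  # stand-in for target-domain face features
num_identities = 50                         # assumed number of distinct persons

kmeans = KMeans(n_clusters=num_identities, init="k-means++", n_init=10, random_state=0)
pseudo_labels = kmeans.fit_predict(target_features)  # one integer pseudo-label per sample
```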
  • the parameter adjustment unit 804 may include:
  • a loss function value calculation subunit, configured to calculate a loss function value L according to the following formula based on the first classification loss, the second classification loss and the constraint loss:
  • L = L_c1 + L_c2 + λ·L_kd
  • wherein L_c1 is the first classification loss, L_c2 is the second classification loss, L_kd is the constraint loss, and λ is a preset parameter;
  • the parameter adjustment subunit is configured to adjust the model parameters of the part to be trained based on the loss function value.
  • the above-mentioned third input unit 803 may include:
  • a constraint loss calculation subunit, configured to calculate the constraint loss L_kd based on the estimated features and the initial features, wherein n is the number of source domain face features, F_i is the initial feature corresponding to the i-th source domain face feature, and the corresponding estimated feature is the one output for that feature by the part to be trained.
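  • The exact expression for L_kd is not reproduced here; the description characterizes it as aggregating, over the n stored source domain face features, the deviation between each initial feature F_i and the corresponding estimated feature. The sketch below assumes a mean squared deviation as that measure, which is an assumption rather than the filed formula.

```python
# Hedged sketch of the constraint loss: an average squared deviation between
# the initial features (from the frozen initial part) and the estimated
# features (from the part being trained). The aggregation form is assumed.
import torch

def constraint_loss(initial_feats: torch.Tensor, estimated_feats: torch.Tensor) -> torch.Tensor:
    # initial_feats, estimated_feats: shape (n, d), one row per stored source domain face feature
    diffs = estimated_feats - initial_feats
    return (diffs ** 2).sum(dim=1).mean()  # mean over the n features of the squared deviation
```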
  • the above feature determination unit may include:
  • the feature dimensionality reduction subunit is configured to perform dimensionality reduction processing on the facial features output by the intermediate layer, and obtain the reduced dimensionality facial features as source domain facial features.
  • the above-mentioned apparatus may further include:
  • a feature recovery module, configured to perform dimension restoration processing on the source domain face features before the step of adjusting some model parameters of the initialization recognition model based on the target face image samples and the source domain face features, to obtain the restored source domain face features.
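  • A minimal sketch of the dimensionality reduction and dimension restoration steps, assuming PCA (the method mentioned in the description) from scikit-learn; the feature size and the number of retained components are illustrative.

```python
# Compress intermediate-layer face features with PCA before storage, then
# restore their dimensionality before incremental training.
import numpy as np
from sklearn.decomposition import PCA

intermediate_feats = np.random.rand(1000, 1024)  # stand-in for intermediate-layer face features

pca = PCA(n_components=128)                       # assumed compressed dimension
reduced = pca.fit_transform(intermediate_feats)   # store these as the source domain face features

# before adjusting the model parameters, restore the original dimensionality
restored = pca.inverse_transform(reduced)         # approximate recovery of the stored features
```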
  • the above-mentioned device may further include:
  • a to-be-recognized face image acquisition module, configured to obtain a face image to be recognized in the target domain;
  • An identity determining module configured to identify the face image to be recognized based on the face recognition model, and determine the identity corresponding to the face image to be recognized.
  • An embodiment of the present application also provides an electronic device, as shown in FIG. 9, including a processor 901, a communication interface 902, a memory 903, and a communication bus 904, wherein the processor 901, the communication interface 902, and the memory 903 communicate with each other through the communication bus 904;
  • the memory 903 is used to store a computer program;
  • the processor 901 is configured to implement the method steps described in any of the foregoing embodiments when executing the program stored in the memory 903 .
  • the communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the electronic device and other devices.
  • the memory may include a random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory.
  • the memory may also be at least one storage device located far away from the aforementioned processor.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • A computer-readable storage medium is also provided, in which a computer program is stored; when the computer program is executed by a processor, it implements the method steps described in any of the foregoing embodiments.
  • a computer program product including instructions is also provided, which, when run on a computer, causes the computer to execute the method steps described in any of the above embodiments.
  • In the above embodiments, all or part may be implemented by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, DVD), or a semiconductor medium (for example, a Solid State Disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

本申请实施例提供了一种人脸识别模型的训练方法、装置、电子设备及存储介质,方法包括:获取源域人脸特征以及初始化识别模型;获取目标域的目标人脸图像样本;基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数,直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。初始化模型在使用源域的全量人脸图像样本训练后,保存部分源域人脸特征,并固定初始化模型部分参数。进而,使用目标域的目标人脸图像样本和源域人脸特征对该初始化模型进行进一步训练后,得到针对源域和目标域的人脸识别模型。既保持了对源域全量人脸图像的识别能力,又可以准确识别目标域的目标人脸图像,提高了人脸识别模型的识别能力和精度。

Description

人脸识别模型的训练方法、装置、电子设备及存储介质
本申请要求于2021年12月29日提交中国专利局、申请号为202111637930.4发明名称为“人脸识别模型的训练方法、装置、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及深度学习技术领域,特别是涉及一种人脸识别模型的训练方法、装置、电子设备及存储介质。
背景技术
随着计算机和深度学习技术的快速发展,深度学习模型在人脸识别领域的应用越来越广泛。目前基于深度学习的人脸识别模型需要使用全量人脸数据进行训练,训练后的人脸识别模型可以基于人的脸部特征信息进行身份识别,但使用源域数据训练的人脸识别模型在目标域的性能较差。
由于模型训练具有灾难性遗忘的特点,人脸识别模型单独使用目标域数据训练之后会遗忘在源域的识别性能。
在仅有少量存储空间可以存储少量源域数据的情况下,如何在目标域对人脸识别模型进行训练,才能使训练得到的人脸识别模型既可以保持源域的性能,又可以提升其在目标域的性能,以提高人脸识别模型的识别能力和精度,是亟需解决的问题。
发明内容
本申请实施例提供一种人脸识别模型的训练方法、装置、电子设备及存储介质,至少可以保持人脸识别模型在源域的性能,又提升其在目标域的性能。具体技术方案如下:
第一方面,本申请实施例提供了一种人脸识别模型的训练方法,所述方法包括:
获取源域人脸特征以及初始化识别模型,其中,所述初始化识别模型基于源域的全量人脸图像样本训练得到,所述源域人脸特征为通过所述初始化识别模型获得的所述全量人脸图像样本的人脸特征;
获取目标域的目标人脸图像样本,其中,所述目标人脸图像样本对应的身份标签未知;
基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数,直到所述初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
可选的,所述获取源域人脸特征的步骤,包括:
按照预设筛选策略,对所述全量人脸图像样本进行筛选,得到筛选后的全量人脸图像样本,其中,所述预设筛选策略使得所述筛选后的全量人脸图像样本的数量不变的情况下,所述筛选后的全量人脸图像样本对应的身份信息数量不小于预设数量;
将所述筛选后的全量人脸图像样本输入所述初始化识别模型,获取所述初始化识别模型的中间层输出的人脸特征;
基于所述中间层输出的人脸特征,确定源域人脸特征。
可选的,所述初始化识别模型包括参数固定部分和待训练部分;
所述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的 部分模型参数的步骤,包括:
将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签,并基于所述第一预测标签以及所述目标人脸图像样本对应的伪标签,确定第一分类损失;
将所述源域人脸特征输入所述待训练部分,得到第二预测标签,并基于所述第二预测标签以及所述源域人脸特征对应的身份标签,确定第二分类损失;
将所述源域人脸特征分别输入所述待训练部分以及所述待训练部分对应的初始部分,得到预估特征以及初始特征,并基于所述预估特征以及所述初始特征,确定约束损失,其中,所述初始部分为所述待训练部分的模型参数固定为基于所述全量人脸图像样本训练后的模型参数时对应的模型部分;
基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数。
可选的,在所述将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签的步骤之前,所述方法还包括:
对所述目标人脸图像样本进行聚类,确定每个目标人脸图像样本对应的伪标签,其中,所述伪标签用于标识对应的目标人脸图像样本所属的人员身份。
可选的,所述基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数的步骤,包括:
基于所述第一分类损失、所述第二分类损失及所述约束损失,按照以下公式计算得到损失函数值L:
L=L c1+L c2+λL kd
其中,L c1为所述第一分类损失,L c2为所述第二分类损失,L kd为所述约束损失,λ为预设参数;
基于所述损失函数值,调整所述待训练部分的模型参数。
可选的,所述基于所述预估特征以及所述初始特征,确定约束损失的步骤,包括,
基于所述预估特征以及所述初始特征,按照以下公式计算得到所述约束损失L kd
Figure PCTCN2022142777-appb-000001
其中,n为所述源域人脸特征的数量,F i为第i个源域人脸特征对应的初始特征,
Figure PCTCN2022142777-appb-000002
为第i个源域人脸特征对应的预估特征。
可选的,所述基于所述中间层输出的人脸特征,确定源域人脸特征的步骤,包括:
对所述中间层输出的人脸特征进行降维处理,得到降维后的人脸特征,作为源域人脸特征;
在所述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤之前,所述方法还包括:
对所述源域人脸特征进行维度恢复处理,得到恢复后的源域人脸特征。
可选的,所述方法还包括:
获取所述目标域的待识别人脸图像;
基于所述人脸识别模型对所述待识别人脸图像进行识别,确定所述待识别人脸图像对应的身份。
第二方面,本申请实施例提供了一种人脸识别模型的训练装置,所述装置包括:
初始化训练模块,用于获取源域人脸特征以及初始化识别模型,其中,所述初始化识别模型基于源域的全量人脸图像样本训练得到,所述源域人脸特征为通过所述初始化识别模型获得的所述全量人脸图像样本的人脸特征;
目标域样本获取模块,用于获取目标域的目标人脸图像样本,其中,所述目标人脸图像样本对应的身份标签未知;
增量训练模块,用于基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数,直到所述初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
可选的,所述初始化训练模块包括:
样本筛选单元,用于按照预设筛选策略,对所述全量人脸图像样本进行筛选,得到筛选后的全量人脸图像样本,其中,所述预设筛选策略使得所述筛选后的全量人脸图像样本的数量不变的情况下,所述筛选后的全量人脸图像样本对应的身份信息数量不小于预设数量;
特征获取单元,用于将所述筛选后的全量人脸图像样本输入所述初始化识别模型,获取所述初始化识别模型的中间层输出的人脸特征;
特征确定单元,用于基于所述中间层输出的人脸特征,确定源域人脸特征。
可选的,所述初始化识别模型包括参数固定部分和待训练部分;
所述增量训练模块包括:
第一输入单元,用于将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签,并基于所述第一预测标签以及所述目标人脸图像样本对应的伪标签,确定第一分类损失;
第二输入单元,用于将所述源域人脸特征输入所述待训练部分,得到第二预测标签,并基于所述第二预测标签以及所述源域人脸特征对应的身份标签,确定第二分类损失;
第三输入单元,用于将所述源域人脸特征分别输入所述待训练部分以及所述待训练部分对应的初始部分,得到预估特征以及初始特征,并基于所述预估特征以及所述初始特征,确定约束损失,其中,所述初始部分为所述待训练部分的模型参数固定为基于所述全量人脸图像样本训练后的模型参数时对应的模型部分;
参数调整单元,用于基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数。
可选的,所述装置还包括:
目标域样本聚模块,用于在所述将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签的步骤之前,对所述目标人脸图像样本进行聚类,确定每个目标人脸图像样本对应的伪标签,其中,所述伪标签用于标识对应的目标人脸图像样 本所属的人员身份。
可选的,所述参数调整单元包括:
损失函数值计算子单元,用于基于所述第一分类损失、所述第二分类损失及所述约束损失,按照以下公式计算得到损失函数值L:
L=L c1+L c2+λL kd
其中,L c1为所述第一分类损失,L c2为所述第二分类损失,L kd为所述约束损失,λ为预设参数;
参数调整子单元,用于基于所述损失函数值,调整所述待训练部分的模型参数。
可选的,所述第三输入单元包括:
约束损失计算子单元,用于基于所述预估特征以及所述初始特征,按照以下公式计算得到所述约束损失L kd
Figure PCTCN2022142777-appb-000003
其中,n为所述源域人脸特征的数量,F i为第i个源域人脸特征对应的初始特征,
Figure PCTCN2022142777-appb-000004
为第i个源域人脸特征对应的预估特征。
可选的,所述特征确定单元包括:
特征降维子单元,用于对所述中间层输出的人脸特征进行降维处理,得到降维后的人脸特征,作为源域人脸特征;
所述装置还包括:
特征恢复模块,用于在所述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤之前,对所述源域人脸特征进行维度恢复处理,得到恢复后的源域人脸特征。
可选的,所述装置还包括:
待识别人脸图像获取模块,用于获取所述目标域的待识别人脸图像;
身份确定模块,用于基于所述人脸识别模型对所述待识别人脸图像进行识别,确定所述待识别人脸图像对应的身份。
第三方面,本申请实施例提供了一种电子设备,包括处理器、通信接口、存储器和通信总线,其中,处理器,通信接口,存储器通过通信总线完成相互间的通信;
存储器,用于存放计算机程序;
处理器,用于执行存储器上所存放的程序时,实现上述第一方面任一所述的方法步骤。
第四方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现上述第一方面任一所述的方法步骤。
第五方面,本申请实施例提供了一种包含指令的计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行上述第一方面任一所述的方法步骤。
本申请实施例有益效果:
本申请实施例提供的方案中,电子设备可以获取源域人脸特征以及初始化识别模型, 其中,初始化识别模型基于源域的全量人脸图像样本训练得到,源域人脸特征为通过初始化识别模型获得的全量人脸图像样本的人脸特征;获取目标域的目标人脸图像样本,其中,目标人脸图像样本对应的身份标签未知;基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数,直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。初始化模型在使用源域的全量人脸图像样本训练后,保存部分源域人脸特征,并固定初始化模型的部分参数。进而,使用目标域的目标人脸图像样本和源域人脸特征训练对该初始化模型进行进一步训练后,得到针对源域和目标域的人脸识别模型。该人脸识别模型既保持了对源域全量人脸图像的识别能力,同时又可以准确识别目标域的目标人脸图像,在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的情况下,能够准确识别人脸图像,提高了人脸识别模型的识别能力和精度。当然,实施本申请的任一产品或方法并不一定需要同时达到以上所述的所有优点。
附图说明
为了更清楚地说明本申请实施例和现有技术的技术方案,下面对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例所提供的一种人脸识别模型的训练方法的流程图;
图2为图1所示实施例中步骤S101的获取源域人脸特征的一种具体流程图;
图3为图1所示实施例中步骤S103的调整初始化识别模型的部分模型参数的一种具体流程图;
图4为基于图1所示实施例的使用目标人脸图像样本和源域人脸特征对初始化识别模型进行训练的一种示意图;
图5为基于图1所示实施例的特征提取和特征降维处理的一种示意图;
图6为基于图1所示实施例的一种确定人脸图像的身份的一种具体流程图;
图7为本申请实施例所提供的一种人脸识别模型的训练装置的结构示意图;
图8为图7所示实施例中的增量训练模块的一种具体结构示意图;
图9为本申请实施例所提供的一种电子设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案、及优点更加清楚明白,以下参照附图并举实施例,对本申请进一步详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
为了在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的场景下,能够使得训练得到的人脸识别模型既可以保持源域的性能,又可以提升其在目标域的性能,提高人脸识别模型的识别能力和精度,本申请实施例提供了一种人脸识别模型的训练方法、装置、电子设备、计算机可读存储介质以及计算机程序产品,下面首先对本申请实施例所提供的一种人脸识别模型的训练方法进行介绍。
本申请实施例所提供的人脸识别模型的训练方法可以应用于任意能够进行人脸识别模型训练的电子设备,例如,可以为各种用于模型训练的计算设备,可以为各种园区的出入口闸机对应的服务器,可以为安保***中人脸识别设备的服务器等,在此不做具体限定。为了描述清楚,后续称为电子设备。
如图1所示,一种人脸识别模型的训练方法,所述方法包括:
S101,获取源域人脸特征以及初始化识别模型;
其中,所述初始化识别模型基于源域的全量人脸图像样本训练得到,所述源域人脸特征为通过所述初始化识别模型获得的所述全量人脸图像样本的人脸特征。
S102,获取目标域的目标人脸图像样本;
其中,所述目标人脸图像样本对应的身份标签未知。
S103,基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数,直到所述初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
可见,本申请实施例提供的方案中,电子设备可以获取源域人脸特征以及初始化识别模型,其中,初始化识别模型基于源域的全量人脸图像样本训练得到,源域人脸特征为通过初始化识别模型获得的全量人脸图像样本的人脸特征;获取目标域的目标人脸图像样本,其中,目标人脸图像样本对应的身份标签未知;基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数,直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。初始化模型在使用源域的全量人脸图像样本训练后,保存部分源域人脸特征,并固定初始化模型的部分参数。进而,使用目标域的目标人脸图像样本和源域人脸特征训练对该初始化模型进行进一步训练后,得到针对源域和目标域的人脸识别模型。该人脸识别模型既保持了对源域全量人脸图像的识别能力,同时又可以准确识别目标域的目标人脸图像,在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的情况下,能够准确识别人脸图像,提高了人脸识别模型的识别能力和精度。
基于深度学习的人脸识别模型对常规场景的人脸识别性能已经比较优秀了,但是在一些特殊的场景,比如小孩人脸识别、低画质人脸识别和戴口罩的人脸识别等场景中,人脸识别的性能还有一定的上升空间。例如,卷积神经网络是一种常用的用于人脸识别的深度学习网络,卷积神经网络进行训练时,由于数据隐私或训练资源等问题,不同场景中的人脸图像无法同时进行训练。卷积神经网络的训练还具有灾难性遗忘的特性,灾难性遗忘是指通过训练已经获取部分人脸识别能力的人脸识别模型在学习识别新的人脸图像时,忘记或丧失了以前获取的部分人脸识别能力。
为了人脸识别模型能够准确识别人脸图像,需要使用全量人脸数据对模型进行训练。然而由于数据隐私等方面的原因,可能无法获得全量人脸数据,那么也就只能采用非全量人脸数据对模型进行训练,而使用这样的非全量人脸数据对模型进行训练,得到的人脸识别模型的人脸识别性能会非常差。例如,人脸识别模型为基于卷积神经网络的模型,人脸数据为佩戴口罩的人脸图像时,由于存在灾难性遗忘的特性,如果只使用这些佩戴口罩的人脸图像对人脸识别模型进行常规训练,得到的人脸识别模型几乎不能识别未佩戴口罩的人脸图像,识别能力和精度非常差。
在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的情况下,为了能够准确识别目标域的人脸图像,电子设备可以使用域适应增量学习的方式对人脸识别模型进行训练。本申请实施例中,用于训练人脸识别模型的人脸数据可以分为源域的全量人脸图像样本和目标域的目标人脸图像样本,其中,全量人脸图像样本为身份信息已知的人脸图像;目标域的目标人脸图像样本为待识别人员的人脸图像样本,目标人脸图像样本对应的身份标签未知。
域适应是源域和目标域对应的数据分布不同,但任务相同的一种迁移学习方法。例如,在本实施例中,全量人脸图像样本和目标人脸图像样本虽然分布不同,但是任务都是用来训练人脸识别模型的。增量学习是指能不断从新样本中学习新的知识,并能保存大部分已经学习到知识的学习方法。例如,在本实施例中,使用源域的全量人脸图像样本对初始化模型进行训练后,进一步使用目标域的目标人脸图像样本对初始化模型进行训练得到人脸识别模型,人脸识别模型可以保留对源域的人脸图像的识别能力,还增强了对目标域的目标人脸图像的识别能力。
在上述步骤S101中,电子设备可以获取源域人脸特征以及初始化识别模型,其中,初始化识别模型是基于源域的全量人脸图像样本训练得到的。全量人脸图像样本可以为CASIA、VGGFace2或MS1MV2等,在此不做限定。源域的全量人脸图像样本的身份信息是已知的,即全量人脸图像样本的身份标签已知。在对初始化模型进行训练时,将源域的全量人脸图像样本输入初始化模型后,初始化模型可以得到全量人脸图像样本的预测标签,基于预测标签和全量人脸图像样本的身份标签,电子设备可以按照以下公式计算得到分类损失:
Figure PCTCN2022142777-appb-000005
其中L c为使用的交叉熵损失,m表示margin的大小,s表示scale值的大小,θ表示权重和特征之间的夹角,i表示输入样本的索引,y i表示索引为i的输入样本对应的标签,N表示样本的数量。
进而,基于上述公式计算得到的分类损失,电子设备可以通过调整人脸识别模型的模型参数,来持续减小分类损失,直到源域的全量人脸图像样本的迭代次数达到预设次数,确定初始化模型收敛,得到初始化识别模型。当然,也可以基于初始化模型的损失函数收敛确定初始化模型收敛,得到初始化识别模型,这都是合理的。这样,训练完成的初始化识别模型具有对源域人脸图像的识别能力,也就是具有对普通人的人脸识别能力。
为了使人脸识别模型在增量训练后可以保持对源域人脸图像的识别能力,可以提取源域的全量人脸图像样本的人脸特征,并在增量训练的过程中使用源域人脸特征对初始化识别模型进行训练,使得初始化识别模型更好地保留对源域人脸图像的识别能力。在一种实施方式中,电子设备可以仅存储少量源域人脸特征,例如,在仅有少量存储空间的情况下,电子设备可以将随机挑选的一部分全量人脸图像样本输入初始化识别模型,获取初始化识别模型输出的人脸特征,作为源域人脸特征。
通过初始化训练得到的初始化识别模型已经具有了对源域人脸图像的识别能力,为了 增强对目标域人脸图像的识别能力,可以使用目标域的目标人脸图像样本对初始化识别模型进行增量训练。电子设备获取源域人脸特征以及初始化识别模型后,可以获取目标域的目标人脸图像样本,即执行上述步骤S102。
目标域的目标人脸图像样本可以是电子设备采集的,也可以是由外部设备输入电子设备的,在此不做限定。目标人脸图像样本对应的身份标签未知,即目标域的目标人脸图像样本对应的具体身份信息是无法确定的。例如,电子设备获取的目标域的目标人脸图像样本为某产业园区的多个员工的人脸图像,由于隐私保护,这些人脸图像无法确定真实身份,电子设备可以将这些人脸图像进行聚类操作,分别记录每一类人脸图像的标签为“员工A”、“员工B”等,以便在人脸识别的过程中对人员的身份作出判断。
进而,在上述步骤S103中,电子设备可以基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数,直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
为了保持初始化识别模型对源域人脸图像的识别能力,并且增强对目标域的目标人脸图像的识别能力,初始化识别模型可以包括参数固定部分和参数可调整部分,参数固定部分的模型参数保持不变,这样可以保持初始化识别模型对源域人脸图像的识别能力。
进而,电子设备可以将目标域的目标人脸图像样本和源域人脸特征输入初始化识别模型,并对初始化识别模型进行训练,电子设备可以调整初始化识别模型的部分模型参数,也即参数可调整部分的模型参数,直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型,针对源域和目标域的人脸识别模型也就具有了对目标域的目标人脸图像的识别能力。
采用本申请实施例所提供的方案,电子设备可以使用源域的全量人脸图像样本对人脸识别模型进行训练,得到的初始化识别模型具有对源域人脸图像的识别能力。在使用目标域的目标人脸图像样本对初始化识别模型进行训练的过程中,通过固定初始化识别模型的部分参数和使用源域人脸特征对初始化识别模型再次训练,得到的针对源域和目标域的人脸识别模型既保持了对源域人脸图像的识别能力,同时又可以准确识别目标域的目标人脸图像。在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的情况下,仅采用少量源域人脸特征即可以得到针对源域和目标域的人脸识别模型,训练得到的人脸识别模型既可以保持源域的性能,又可以提升其在目标域的性能,能够准确识别人脸图像,提高了人脸识别模型的识别能力和精度。
作为本申请实施例的一种实施方式,如图2所示,上述获取源域人脸特征的步骤,可以包括:
S201,按照预设筛选策略,对所述全量人脸图像样本进行筛选,得到筛选后的全量人脸图像样本;
其中,预设筛选策略可以使得在筛选后的全量人脸图像样本的数量不变的情况下,筛选后的全量人脸图像样本对应的身份信息数量不小于预设数量。
初始化训练的过程中,对人脸识别模型进行训练的源域的全量人脸图像样本的数据规模通常十分庞大,不利于存储。并且在基于目标人脸图像样本以及源域人脸特征,调整初 始化识别模型的部分模型参数的步骤中,并不需要使用所有源域的全量人脸图像样本对应的源域人脸特征,也可以保持初始化识别模型对源域人脸图像的识别能力。所以,为了减小数据存储所需存储空间和人脸识别模型训练所需的计算量,电子设备可以对源域的全量人脸图像样本进行筛选。
例如,针对每一个身份信息对应的源域的全量人脸图像样本,只需要提取三张全量人脸图像样本包含的源域人脸特征,就可以保持初始化识别模型对源域人脸图像的识别能力,那么,可以筛选出每一个身份信息对应的三张全量人脸图像样本用于后续的源域人脸特征提取。
电子设备可以按照预设筛选策略,对全量人脸图像样本进行筛选,得到筛选后的全量人脸图像样本。在存储空间大小不变即筛选后的全量人脸图像样本的数量不变的情况下,全量人脸图像样本的类间丰富性越高,初始化识别模型可以更好地维持对源域人脸图像的识别能力。也就是说,可以尽量获取多个身份信息所对应的全量人脸图像样本,而每个身份信息对应的不同状态的全量人脸图像样本可以不必过多,即类内丰富性可以适当减低,从而更好地维持针对目标域的人脸识别模型对源域人脸图像的识别能力。
所以上述预设筛选策略可以为使得筛选后的全量人脸图像样本的数量不变的情况下,筛选后的全量人脸图像样本对应的身份信息数量不小于预设数量的策略,其中,预设数量可以根据存储空间、计算量的大小等要求设置,在此不做具体限定。
在一种实施方式中,电子设备可以从所有全量人脸图像样本中随机选择一定数量的身份信息对应的全量人脸图像样本,然后针对每一个身份信息对应的全量人脸图像样本,从中随机选择一定数量的全量人脸图像样本。例如,电子设备可以先随机选择1000个身份信息对应的全量人脸图像样本,然后针对每一个身份信息对应的全量人脸图像样本,从中随机选择10个全量人脸图像样本。
在另一种实施方式中,针对每一个身份信息对应的全量人脸图像样本,电子设备可以根据各个全量人脸图像样本的人脸特征与特征中心之间的距离来选择全量人脸图像样本,例如,可以选择距离特征中心最近的一定数量的全量人脸图像样本。其中,特征中心为这些全量人脸图像样本所属的身份信息对应的分类器向量。与特征中心之间的距离越近,说明全量人脸图像样本的人脸特征标识所属的身份信息的人员的真实人脸特点的准确性越高,因此采用这样的全量人脸图像样本进行后续特征提取,可以更有利于训练得到的人脸识别模型的识别能力。
S202,将所述筛选后的全量人脸图像样本输入所述初始化识别模型,获取所述初始化识别模型的中间层输出的人脸特征。
确定了筛选后的全量人脸图像样本,电子设备可以将筛选后的全量人脸图像样本输入初始化识别模型,得到初始化识别模型的中间层输出的人脸特征。其中,初始化识别模型的中间层即为初始化识别模型中的非第一层,也非最后一层。可以采用公式F i=f(x i)来表示特征提取操作,其中,F i表示提取得到的人脸特征,x i表示输入初始化识别模型的全量人脸图像样本,f(x)表示初始化识别模型中进行特征提取所基于的函数。
作为一种实施方式,初始化识别模型可以包括多层,例如,初始化识别模型为残差网 络,该残差网络包括四个残差块,在对初始化训练进行增量训练过程中,可以固定前三个残差块的模型参数,只调整第四个残差块的模型参数。那么,电子设备可以获取初始化识别模型的第三个残差块输出的人脸特征。
S203,基于所述中间层输出的人脸特征,确定源域人脸特征。
电子设备获取初始化识别模型的中间层输出的人脸特征后,可以确定源域人脸特征。在一种实施方式中,电子设备可以将该中间层输出的人脸特征作为源域人脸特征。在另一种实施方式中,电子设备可以对该中间层输出的人脸特征进行降维处理,得到降维处理后的人脸特征,作为源域人脸特征,以便于进行存储。
在本实施例中,电子设备可以对全量人脸图像样本进行筛选,并基于筛选后的全量人脸图像样本确定源域人脸特征,源域人脸特征有利于维持针对目标域的人脸识别模型对源域人脸图像的识别能力。通过筛选源域的全量人脸图像样本,在不降低人脸识别模型对源域人脸图像的识别能力的基础上,降低了人脸识别模型训练所需的数据存储空间和计算量。
作为本申请实施例的一种实施方式,上述初始化识别模型可以包括参数固定部分和待训练部分。
为了维持人脸识别模型对源域人脸图像的识别能力,基于源域的全量人脸图像样本训练,得到的初始化识别模型后,可以将初始化识别模型分为参数固定部分和待训练部分,在使用目标人脸图像样本以及源域人脸特征训练初始化识别模型的过程中,参数固定部分的模型参数不再进行调整,以保持人脸识别模型对于源域的人脸图像的识别能力。而待训练部分的模型参数进行调整,以使得训练得到的人脸识别模型对于目标域的人脸图像的识别能力也是较强的。
在一种实施方式中,源域人脸特征为初始化识别模型的中间层输出的人脸特征,那么,初始化识别模型的参数固定部分可以与输出源域人脸特征的中间层相一致,即为处理得到源域人脸特征的模型部分。例如,初始化识别模型为残差网络,该残差网络包括四个残差块,源域人脸特征是第三个残差块输出的。那么,初始化识别模型的参数固定部分可以包括前三个残差块,待训练部分包括第四个残差块和分类器。
相应的,如图3所示,上述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤,可以包括:
S301,将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签,并基于所述第一预测标签以及所述目标人脸图像样本对应的伪标签,确定第一分类损失。
基于目标人脸图像样本以及源域人脸特征,对初始化识别模型进行增量训练,具体可以采用教师学生网络的方式进行训练,学生网络即人脸识别模型可以使用少量的存储特征即上述源域人脸特征,对教师网络即初始化识别模型进行增量训练,使得学生网络获得教师网络相近的性能,即保持对源域人脸图像的识别能力。增量训练过程中,使用目标域的目标人脸图像样本对学生网络进行进一步训练,使得其能够具有针对目标域的目标人脸图像的识别能力。
由于目标域的目标人脸图像样本对应的身份标签未知,电子设备可以在获取目标域的 目标人脸图像样本后,确定每个目标人脸图像样本对应的伪标签,伪标签用于标识对应的目标人脸图像样本所属的人员身份,但是并不是目标人脸图像样本所属的人员的真实身份。例如,伪标签可以为A、B、C,或者11、12、13等,在此不做具体限定。
为了提高人脸识别模型对目标域人脸图像的识别能力,电子设备可以将每个目标人脸图像样本输入参数固定部分和待训练部分,得到第一预测标签,进而,基于第一预测标签与目标人脸图像样本对应的伪标签之间差异,电子设备可以计算出目标人脸图像样本对应的分类损失,即第一分类损失。该第一分类损失可以表征当前的人脸识别模型对于目标域的人脸图像的识别结果与真实结果之间的差异。
S302,将所述源域人脸特征输入所述待训练部分,得到第二预测标签,并基于所述第二预测标签以及所述源域人脸特征对应的身份标签,确定第二分类损失。
为了维持人脸识别模型对源域人脸图像的识别能力,电子设备可以将源域人脸特征输入待训练部分,得到待训练部分输出的第二预测标签。由于源域人脸特征对应的全量人脸图像样本的身份标签是已知的,所以电子设备可以基于第二预测标签与源域人脸特征对应的身份标签之间差异,计算出该源域人脸特征对应的分类损失,即第二分类损失。该第二分类损失可以表征当前的人脸识别模型对于源域的人脸图像的识别结果与真实结果之间的差异。
S303,将所述源域人脸特征分别输入所述待训练部分以及所述待训练部分对应的初始部分,得到预估特征以及初始特征,并基于所述预估特征以及所述初始特征,确定约束损失;
其中,初始部分为待训练部分的模型参数固定为基于全量人脸图像样本训练后的模型参数时对应的模型部分。
电子设备可以固定待训练部分的模型参数,固定的模型参数为基于全量人脸图像样本训练后的初始化识别模型的待训练部分的模型参数,初始化识别模型的待训练部分的模型参数固定后,即为初始部分。例如,初始化识别模型为残差网络,该残差网络包括四个残差块,固定前三个残差块的模型参数,只调整第四个残差块的模型参数,待训练部分则包括该第四个残差块,将第四个残差块的模型参数固定为与初始化模型对应的第四个残差块的参数,即为初始部分。
电子设备可以将源域人脸特征输入待训练部分,待训练部分可以基于当前模型参数,对源域人脸特征进行进一步的特征提取,得到预估特征。将源域人脸特征输入待训练部分对应的初始部分,该初始部分可以基于固定的模型参数,对源域人脸特征进行进一步的特征提取,得到初始特征。
进而,电子设备可以基于预估特征与初始特征之间的差异,计算得到约束损失。该约束损失可以表征待训练部分提取得到的人脸特征与初始化模型中模型参数固定的相应部分提取得到的人脸特征之间差异,可以作为一种约束条件来监督人脸识别模型的训练。其中,上述步骤S301-步骤S303的执行顺序并不做具体限定。
S304,基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数。
得到上述第一分类损失、第二分类损失及约束损失后,电子设备便可以基于第一分类损失、第二分类损失及约束损失,调整待训练部分的模型参数,以对初始化识别模型进行训练,直到目标人脸图像样本和源域人脸特征的迭代次数达到预设次数,确定初始化模型收敛。
在另一种实施方式中,可以基于第一分类损失、第二分类损失及约束损失计算得到人脸识别模型的损失函数值,基于该损失函数值调整待训练部分的模型参数,直到人脸识别模型的损失函数收敛,确定初始化识别模型收敛。
其中,模型参数调整的具体方式可以采用梯度下降算法、随机梯度下降算法等,在此不做具体限定及说明。
由于第一分类损失可以表征当前的人脸识别模型对于目标域的人脸图像的识别结果与真实结果之间的差异,第二分类损失可以表征当前的人脸识别模型对于源域的人脸图像的识别结果与真实结果之间的差异,约束损失可以表征待训练部分提取得到的人脸特征与初始化模型中模型参数固定的相应部分提取得到的人脸特征之间差异,所以基于第一分类损失、第二分类损失及约束损失调整待训练部分的模型参数,可以使得人脸识别模型对于目标域的人脸图像的识别结果与真实结果之间的差异越来越小,并且保持对源域的人脸图像的识别结果的准确度。
在本实施例中,电子设备可以使用源域人脸特征对初始化识别模型进行训练,来保持人脸识别模型对源域人脸图像的识别能力,电子设备可以使用目标域的目标人脸图像样本对初始化识别模型进行训练,来提高人脸识别模型对目标域人脸图像的识别能力。通过固定初始化识别模型的部分模型参数,更好地保持了人脸识别模型对源域人脸图像的识别能力,并提高人脸识别模型对目标域的目标人脸图像的识别能力。
下面结合图4对本申请实施例所提供基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数的流程进行举例介绍,其中,conv为初始化识别模型的卷积层,bn表示对数据进行批量标准化,relu和tanh为激活函数层使用的激活函数,residual block为初始化识别模型的残差块,参数固定部分包括前三层残差块,待训练部分包括第四层残差块和分类器,初始部分为固定了模型参数的第四层残差块。
电子设备可以将目标人脸图像样本输入参数固定部分和待训练部分,得到第一预测标签,并基于第一预测标签以及目标人脸图像样本对应的伪标签,确定第一分类损失。
电子设备可以将降维后的人脸特征进行维度恢复处理,得到恢复后的源域人脸特征,并将恢复后的源域人脸特征输入待训练部分,得到第二预测标签,并基于第二预测标签以及源域人脸特征对应的身份标签,确定第二分类损失。
电子设备可以将恢复后的源域人脸特征分别输入待训练部分以及待训练部分对应的初始部分,得到预估特征以及初始特征,并基于预估特征以及初始特征,确定约束损失。
进而,电子设备可以基于第一分类损失、第二分类损失及约束损失计算得到人脸识别模型的损失函数,调整待训练部分的模型参数直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
由于上述各个流程的具体实施方式已在上述各实施例中进行了介绍,在此不再赘述。 在本实施例中,通过固定初始化识别模型的部分参数和使用源域人脸特征对初始化识别模型再次训练,使得得到的针对目标域的人脸识别模型保持了对源域人脸图像的识别能力,也增强了目标域人脸图像的识别能力。在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的情况下,能够准确识别人脸图像,提高了人脸识别模型的识别能力和精度。
作为本申请实施例的一种实施方式,在上述将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签的步骤之前,上述方法还可以包括:
对所述目标人脸图像样本进行聚类,确定每个目标人脸图像样本对应的伪标签。
由于目标域的目标人脸图像样本对应的身份标签未知,初始化识别模型无法使用这样的目标人脸图像样本进行训练,因此,电子设备可以对目标域的目标人脸图像样本进行聚类处理,确定每个目标人脸图像样本对应的伪标签。
聚类处理过程中,可以按照人脸特征的相似度,将目标人脸图像样本按照所属身份进行分类,从而得到多组目标人脸图像样本,其中,每一组目标人脸图像样本属于同一个人员,电子设备可以给各组目标人脸图像样本打上伪标签,用于标识对应的目标人脸图像样本所属的人员身份。
在一种实施方式中,可以使用k-means++(K均值)聚类算法对目标人脸图像样本进行聚类,得到多个类别,并确定各个类别所包括的每个目标人脸图像样本的伪标签。电子设备还可以采用高斯混合模型的最大期望聚类、凝聚层次聚类、均值漂移聚类等方式对目标人脸图像样本进行聚类,在此不做具体限定。
例如,目标域的目标人脸图像样本为某个工厂的工作人员的人脸图像,对目标人脸图像样本进行聚类,将目标人脸图像样本分为多组,每组目标人脸图像样本为一个工作人员的人脸图像,电子设备可以确定各组目标人脸图像样本的伪标签为A、B、C,或者11、12、13等,在此不做限定。
在本实施例中,电子设备可以对目标人脸图像样本进行聚类,确定每个目标人脸图像样本对应的伪标签,从而在无法获知目标人脸图像样本所属人员的真实身份的情况下,也可以确定准确的伪标签来标识对应的目标人脸图像样本所属的人员身份。
作为本申请实施例的一种实施方式,上述基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数的步骤,可以包括:
基于所述第一分类损失、所述第二分类损失及所述约束损失,按照以下公式计算得到损失函数值L:L=L c1+L c2+λL kd;基于所述损失函数值,调整所述待训练部分的模型参数。
其中,L c1为上述第一分类损失,L c2为上述第二分类损失,L kd为上述约束损失,λ为预设参数。
通过对第一分类损失、第二分类损失、预设参数的约束损失进行求和,得到的损失函数值可以准确表征目标域的人脸图像的识别结果与真实结果之间的差异、源域的人脸图像的识别结果与真实结果之间的差异和待训练部分提取得到的人脸特征与初始化模型中模型参数固定的相应部分提取得到的人脸特征之间差异。因此,电子设备可以采用上述公式 来计算损失函数值L,并基于损失函数值L,调整待训练部分的模型参数,以得到识别能力较强的人脸识别模型。
其中,预设参数λ的值可以根据训练过程中损失函数值的变化结合实际经验设置,在此不做具体限定。
在本实施例中,电子设备可以基于第一分类损失、第二分类损失及约束损失,按照公式计算得到损失函数值。并基于损失函数值,调整待训练部分的模型参数,以使初始化识别模型收敛。通过上述公式,电子设备可以准确计算损失函数值,增强了初始化识别模型的训练效果,提高了人脸识别模型对源域人脸图像的识别效果,也使得人脸识别模型在识别目标域人脸图像具有更优异的性能。
作为本申请实施例的一种实施方式,上述基于所述预估特征以及所述初始特征,确定约束损失的步骤,可以包括:
基于所述预估特征以及所述初始特征,按照以下公式计算得到所述约束损失L kd
Figure PCTCN2022142777-appb-000006
其中,n为所述源域人脸特征的数量,F i为第i个源域人脸特征对应的初始特征,
Figure PCTCN2022142777-appb-000007
为第i个源域人脸特征对应的预估特征。
通过计算每一个源域人脸特征对应的初始特征和源域人脸特征对应的预估特征的差值,可以得出待训练部分提取得到的每一个人脸特征与初始化模型中模型参数固定的相应部分提取得到的人脸特征之间差异,进而,通过计算各个源域人脸特征对应的差异值的方差,得到的约束损失可以准确地表征预估特征与初始特征的差异程度。因此,电子设备可以采用上述公式计算约束损失,以保证可以计算得到准确的损失函数值。
在本实施例中,电子设备可以计算每一个源域人脸特征经过模型训练后的约束损失,并由此计算出初始化识别模块约束损失。通过上述公式,电子设备可以准确计算约束损失,并将该约束损失作为监督来调整待训练部分的模型参数,可以得到准确性更高的人脸识别模型。
作为本申请实施例的一种实施方式,上述基于所述中间层输出的人脸特征,确定源域人脸特征的步骤,可以包括:
对所述中间层输出的人脸特征进行降维处理,得到降维后的人脸特征,作为源域人脸特征。
由于源域人脸特征是初始化识别模型的中间层输出的,特征维度较高,从而导致存储源域人脸特征需要较大的存储空间。为节省存储空间,电子设备可以对该中间层输出的人脸特征进行降维处理,从而在基本保持特征信息量的情况下,明显降低存储源域人脸特征所需的存储空间。
例如,可以采用PCA(Principal Component Analysis,主成分分析)降维法等,对中间层输出的人脸特征进行降维处理,得到降维后的人脸特征,作为源域人脸特征。PCA降维的核心操作为SVD(Singular ValueDecomposition,奇异值分解),SVD可以使用公式化表示为:AA T=UΣ 2U T,其中,A为待分解的矩阵,A T为A的转置矩阵,U为A的左奇异矩阵, U T为U的转置矩阵,Σ为包含对应特征值的对角矩阵。PCA降维法可以将原始数据中具有相关性的某些维度删除,在对数据进行降维的同时,最大化保留数据携带的信息量。
在一种实施方式中,由源域的全量人脸图像样本获取源域人脸特征并进行降维处理的过程可以如图5所示,其中,初始化识别模型可以为卷积神经网络,conv为卷积神经网络的卷积层,bn表示对数据进行批量标准化,relu和tanh为激活函数层使用的激活函数,residual block为卷积神经网络的残差块。参数固定部分包括前三层残差块,第四层残差块的模块参数可以改变,用于初始化模型训练,将源域的全量人脸图像样本输入卷积神经网络后,电子设备可以提取参数固定部分中间层输出的人脸特征,进而通过降维处理,得到降维后的人脸特征。
相应的,在上述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤之前,所述方法还可以包括:
对所述源域人脸特征进行维度恢复处理,得到恢复后的源域人脸特征。
为了尽量保持人脸识别模型对目标域人脸图像的识别能力,由于源域人脸特征是降维处理得到的,所以在基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数的步骤之前,电子设备可以对源域人脸特征进行维度恢复处理,得到恢复后的源域人脸特征,恢复后的源域人脸特征维度与参数固定部分中间层输出的人脸特征的维度相同。
在本实施例中,电子设备可以对中间层输出的人脸特征进行降维处理,并在基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数的步骤之前,对降维处理后的源域人脸特征进行维度恢复处理。从而在基本不影响人脸识别模型对目标域人脸图像的识别能力的基础上,可以明显降低存储源域人脸特征所需的存储空间。
作为本申请实施例的一种实施方式,如图6所示,上述方法还可以包括:
S601,获取所述目标域的待识别人脸图像。
在对初始化识别模型进行训练,得到了针对目标域的人脸识别模型后,该人脸识别模型可以识别目标域的人脸图像,还保持了对源域的人脸图像识别的能力,电子设备可以将该人脸识别模型部署于实际应用场景中,即目标域场景,进而获取目标域的待识别人脸图像。
例如,该人脸识别模型用于园区的出入口闸机的人脸识别时,可以将闸机中的原有模型换为该人脸识别模型,该人脸识别模型在园区人员的人脸识别中具有更优异的性能,同时也基本保持了对非园区人员的人脸识别的能力。那么,在人员想要出入闸机时,电子设备可以采集人员的人脸图像,作为目标域的待识别人脸图像。
S602,基于所述人脸识别模型对所述待识别人脸图像进行识别,确定所述待识别人脸图像对应的身份。
进而,电子设备可以基于人脸识别模型对待识别人脸图像进行识别,确定待识别人脸图像对应的身份。电子设备可以在识别待识别人脸图像后,可以确定待识别人脸图像对应的身份标识,并根据不同身份标识执行不同的操作。例如,电子设备可以根据待识别人脸图像对应的身份标识,控制闸机做出开门或保持关闭等动作。
例如,训练时目标人脸图像样本对应的伪标签为“人员A”、“人员B”等,在电子设备识别出待识别人脸图像对应的身份为“人员B”后,可以确定该人员为产业园区内人员,具有通行权限,那么电子设备可以控制闸机开门;如果识别出待识别人脸图像对应的身份为刘XX,可以确定该人员不为产业园区内人员,而是园区外的某一个人,不具有通行权限,那么电子设备可以控制闸机保持关闭。
在本实施例中,电子设备既可以识别目标域的目标人脸图像,又保留了对源域人脸图像的识别能力。进而,无论是源域还是目标域对应的人员,均可以基于人脸识别模型对待识别人脸图像进行识别,准确确定待识别人脸图像对应的身份。
相应于上述人脸识别模型的训练方法,本申请实施例还提供了一种人脸识别模型的训练装置,下面对本申请实施例所提供的一种人脸识别模型的训练装置进行介绍。
如图7所示,一种人脸识别模型的训练装置,所述装置包括:
初始化训练模块701,用于获取源域人脸特征以及初始化识别模型;
其中,所述初始化识别模型基于源域的全量人脸图像样本训练得到,所述源域人脸特征为通过所述初始化识别模型获得的所述全量人脸图像样本的人脸特征。
目标域样本获取模块702,用于获取目标域的目标人脸图像样本;
其中,所述目标人脸图像样本对应的身份标签未知。
增量训练模块703,用于基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数,直到所述初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
可见,本申请实施例提供的方案中,电子设备可以获取源域人脸特征以及初始化识别模型,其中,初始化识别模型基于源域的全量人脸图像样本训练得到,源域人脸特征为通过初始化识别模型获得的全量人脸图像样本的人脸特征;获取目标域的目标人脸图像样本,其中,目标人脸图像样本对应的身份标签未知;基于目标人脸图像样本以及源域人脸特征,调整初始化识别模型的部分模型参数,直到初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。初始化模型在使用源域的全量人脸图像样本训练后,保存部分源域人脸特征,并固定初始化模型的部分参数。进而,使用目标域的目标人脸图像样本和源域人脸特征训练对该初始化模型进行进一步训练后,得到针对源域和目标域的人脸识别模型。该人脸识别模型既保持了对源域全量人脸图像的识别能力,同时又可以准确识别目标域的目标人脸图像,在无法同时使用源域和目标域的人脸数据对人脸识别模型进行训练的情况下,能够准确识别人脸图像,提高了人脸识别模型的识别能力和精度。
作为本申请实施例的一种实施方式,上述初始化训练模块701可以包括:
样本筛选单元,用于按照预设筛选策略,对所述全量人脸图像样本进行筛选,得到筛选后的全量人脸图像样本;
其中,所述预设筛选策略使得所述筛选后的全量人脸图像样本的数量不变的情况下,所述筛选后的全量人脸图像样本对应的身份信息数量不小于预设数量。
特征获取单元,用于将所述筛选后的全量人脸图像样本输入所述初始化识别模型,获 取所述初始化识别模型的中间层输出的人脸特征。
特征确定单元,用于基于所述中间层输出的人脸特征,确定源域人脸特征。
作为本申请实施例的一种实施方式,上述初始化识别模型包括参数固定部分和待训练部分;
如图8所示,上述增量训练模块703可以包括:
第一输入单元801,用于将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签,并基于所述第一预测标签以及所述目标人脸图像样本对应的伪标签,确定第一分类损失。
第二输入单元802,用于将所述源域人脸特征输入所述待训练部分,得到第二预测标签,并基于所述第二预测标签以及所述源域人脸特征对应的身份标签,确定第二分类损失。
第三输入单元803,用于将所述源域人脸特征分别输入所述待训练部分以及所述待训练部分对应的初始部分,得到预估特征以及初始特征,并基于所述预估特征以及所述初始特征,确定约束损失;
其中,所述初始部分为所述待训练部分的模型参数固定为基于所述全量人脸图像样本训练后的模型参数时对应的模型部分。
参数调整单元804,用于基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数。
作为本申请实施例的一种实施方式,上述装置还可以包括:
目标域样本聚类模块,用于在所述将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签的步骤之前,对所述目标人脸图像样本进行聚类,确定每个目标人脸图像样本对应的伪标签;
其中,所述伪标签用于标识对应的目标人脸图像样本所属的人员身份。
作为本申请实施例的一种实施方式,上述参数调整单元804可以包括:
损失函数值计算子单元,用于基于所述第一分类损失、所述第二分类损失及所述约束损失,按照以下公式计算得到损失函数值L:
L=L c1+L c2+λL kd
其中,L c1为所述第一分类损失,L c2为所述第二分类损失,L kd为所述约束损失,λ为预设参数。
参数调整子单元,用于基于所述损失函数值,调整所述待训练部分的模型参数。
作为本申请实施例的一种实施方式,上述第三输入单元803可以包括:
约束损失计算子单元,用于基于所述预估特征以及所述初始特征,按照以下公式计算得到所述约束损失L kd
Figure PCTCN2022142777-appb-000008
其中,n为所述源域人脸特征的数量,F i为第i个源域人脸特征对应的初始特征,
Figure PCTCN2022142777-appb-000009
为第i个源域人脸特征对应的预估特征。
作为本申请实施例的一种实施方式,上述特征确定单元可以包括:
特征降维子单元,用于对所述中间层输出的人脸特征进行降维处理,得到降维后的人脸特征,作为源域人脸特征。
上述装置还可以包括:
特征恢复模块,用于在所述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤之前,对所述源域人脸特征进行维度恢复处理,得到恢复后的源域人脸特征。
作为本申请实施例的一种实施方式,上述装置还可以包括:
待识别人脸图像获取模块,用于获取所述目标域的待识别人脸图像;
身份确定模块,用于基于所述人脸识别模型对所述待识别人脸图像进行识别,确定所述待识别人脸图像对应的身份。
本申请实施例还提供了一种电子设备,如图9所示,包括处理器901、通信接口902、存储器903和通信总线904,其中,处理器901,通信接口902,存储器903通过通信总线904完成相互间的通信,
存储器903,用于存放计算机程序;
处理器901,用于执行存储器903上所存放的程序时,实现上述任一实施例所述的方法步骤。
上述电子设备提到的通信总线可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。该通信总线可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
通信接口用于上述电子设备与其他设备之间的通信。
存储器可以包括随机存取存储器(Random Access Memory,RAM),也可以包括非易失性存储器(Non-Volatile Memory,NVM),例如至少一个磁盘存储器。可选的,存储器还可以是至少一个位于远离前述处理器的存储装置。
上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
在本申请提供的又一实施例中,还提供了一种计算机可读存储介质,该计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现上述任一实施例所述的方法的步骤。
在本申请提供的又一实施例中,还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述任一实施例所述的方法步骤。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。 当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
本说明书中的各个实施例均采用相关的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置、电子设备、计算机可读存储介质以及计算机程序产品实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
以上所述仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请保护的范围之内。

Claims (11)

  1. 一种人脸识别模型的训练方法,其特征在于,所述方法包括:
    获取源域人脸特征以及初始化识别模型,其中,所述初始化识别模型基于源域的全量人脸图像样本训练得到,所述源域人脸特征为通过所述初始化识别模型获得的所述全量人脸图像样本的人脸特征;
    获取目标域的目标人脸图像样本,其中,所述目标人脸图像样本对应的身份标签未知;
    基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数,直到所述初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
  2. 根据权利要求1所述的方法,其特征在于,所述获取源域人脸特征的步骤,包括:
    按照预设筛选策略,对所述全量人脸图像样本进行筛选,得到筛选后的全量人脸图像样本,其中,所述预设筛选策略使得所述筛选后的全量人脸图像样本的数量不变的情况下,所述筛选后的全量人脸图像样本对应的身份信息数量不小于预设数量;
    将所述筛选后的全量人脸图像样本输入所述初始化识别模型,获取所述初始化识别模型的中间层输出的人脸特征;
    基于所述中间层输出的人脸特征,确定源域人脸特征。
  3. 根据权利要求1所述的方法,其特征在于,所述初始化识别模型包括参数固定部分和待训练部分;
    所述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤,包括:
    将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签,并基于所述第一预测标签以及所述目标人脸图像样本对应的伪标签,确定第一分类损失;
    将所述源域人脸特征输入所述待训练部分,得到第二预测标签,并基于所述第二预测标签以及所述源域人脸特征对应的身份标签,确定第二分类损失;
    将所述源域人脸特征分别输入所述待训练部分以及所述待训练部分对应的初始部分,得到预估特征以及初始特征,并基于所述预估特征以及所述初始特征,确定约束损失,其中,所述初始部分为所述待训练部分的模型参数固定为基于所述全量人脸图像样本训练后的模型参数时对应的模型部分;
    基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数。
  4. 根据权利要求3所述的方法,其特征在于,在所述将所述目标人脸图像样本输入所述参数固定部分和所述待训练部分,得到第一预测标签的步骤之前,所述方法还包括:
    对所述目标人脸图像样本进行聚类,确定每个目标人脸图像样本对应的伪标签,其中,所述伪标签用于标识对应的目标人脸图像样本所属的人员身份。
  5. 根据权利要求3所述的方法,其特征在于,所述基于所述第一分类损失、所述第二分类损失及所述约束损失,调整所述待训练部分的模型参数的步骤,包括:
    基于所述第一分类损失、所述第二分类损失及所述约束损失,按照以下公式计算得到 损失函数值L:
    L=L c1+L c2+λL kd
    其中,L c1为所述第一分类损失,L c2为所述第二分类损失,L kd为所述约束损失,λ为预设参数;
    基于所述损失函数值,调整所述待训练部分的模型参数。
  6. 根据权利要求3所述的方法,其特征在于,所述基于所述预估特征以及所述初始特征,确定约束损失的步骤,包括:
    基于所述预估特征以及所述初始特征,按照以下公式计算得到所述约束损失L kd
    Figure PCTCN2022142777-appb-100001
    其中,n为所述源域人脸特征的数量,F i为第i个源域人脸特征对应的初始特征,
    Figure PCTCN2022142777-appb-100002
    为第i个源域人脸特征对应的预估特征。
  7. 根据权利要求2-6任一项所述的方法,其特征在于,所述基于所述中间层输出的人脸特征,确定源域人脸特征的步骤,包括:
    对所述中间层输出的人脸特征进行降维处理,得到降维后的人脸特征,作为源域人脸特征;
    在所述基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数的步骤之前,所述方法还包括:
    对所述源域人脸特征进行维度恢复处理,得到恢复后的源域人脸特征。
  8. 根据权利要求1-6任一项所述的方法,其特征在于,所述方法还包括:
    获取所述目标域的待识别人脸图像;
    基于所述人脸识别模型对所述待识别人脸图像进行识别,确定所述待识别人脸图像对应的身份。
  9. 一种人脸识别模型的训练装置,其特征在于,所述装置包括:
    初始化训练模块,用于获取源域人脸特征以及初始化识别模型,其中,所述初始化识别模型基于源域的全量人脸图像样本训练得到,所述源域人脸特征为通过所述初始化识别模型获得的所述全量人脸图像样本的人脸特征;
    目标域样本获取模块,用于获取目标域的目标人脸图像样本,其中,所述目标人脸图像样本对应的身份标签未知;
    增量训练模块,用于基于所述目标人脸图像样本以及所述源域人脸特征,调整所述初始化识别模型的部分模型参数,直到所述初始化识别模型收敛,得到针对源域和目标域的人脸识别模型。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现权利要求1-8任一所述的方法步骤。
  11. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行权利要求1-8任一所述的方法步骤。
PCT/CN2022/142777 2021-12-29 2022-12-28 人脸识别模型的训练方法、装置、电子设备及存储介质 WO2023125654A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111637930.4 2021-12-29
CN202111637930.4A CN114333013A (zh) 2021-12-29 2021-12-29 人脸识别模型的训练方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023125654A1 true WO2023125654A1 (zh) 2023-07-06

Family

ID=81017897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142777 WO2023125654A1 (zh) 2021-12-29 2022-12-28 人脸识别模型的训练方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN114333013A (zh)
WO (1) WO2023125654A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218515A (zh) * 2023-09-19 2023-12-12 人民网股份有限公司 一种目标检测方法、装置、计算设备和存储介质
CN117217288A (zh) * 2023-09-21 2023-12-12 摩尔线程智能科技(北京)有限责任公司 大模型的微调方法、装置、电子设备和存储介质
CN117831106A (zh) * 2023-12-29 2024-04-05 广电运通集团股份有限公司 人脸识别模型训练方法、装置、电子设备及存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333013A (zh) * 2021-12-29 2022-04-12 杭州海康威视数字技术股份有限公司 人脸识别模型的训练方法、装置、电子设备及存储介质
CN114998712B (zh) * 2022-08-03 2022-11-15 阿里巴巴(中国)有限公司 图像识别方法、存储介质及电子设备
CN115861302B (zh) * 2023-02-16 2023-05-05 华东交通大学 一种管接头表面缺陷检测方法及***
CN117711078A (zh) * 2023-12-13 2024-03-15 西安电子科技大学广州研究院 一种针对人脸识别***的模型遗忘方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329617A (zh) * 2020-11-04 2021-02-05 中国科学院自动化研究所 基于单张源域样本的新场景人脸识别模型构建方法、***
CN112395986A (zh) * 2020-11-17 2021-02-23 广州像素数据技术股份有限公司 一种新场景快速迁移且防遗忘的人脸识别方法
CN112801236A (zh) * 2021-04-14 2021-05-14 腾讯科技(深圳)有限公司 图像识别模型的迁移方法、装置、设备及存储介质
CN114333013A (zh) * 2021-12-29 2022-04-12 杭州海康威视数字技术股份有限公司 人脸识别模型的训练方法、装置、电子设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329617A (zh) * 2020-11-04 2021-02-05 中国科学院自动化研究所 基于单张源域样本的新场景人脸识别模型构建方法、***
CN112395986A (zh) * 2020-11-17 2021-02-23 广州像素数据技术股份有限公司 一种新场景快速迁移且防遗忘的人脸识别方法
CN112801236A (zh) * 2021-04-14 2021-05-14 腾讯科技(深圳)有限公司 图像识别模型的迁移方法、装置、设备及存储介质
CN114333013A (zh) * 2021-12-29 2022-04-12 杭州海康威视数字技术股份有限公司 人脸识别模型的训练方法、装置、电子设备及存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218515A (zh) * 2023-09-19 2023-12-12 人民网股份有限公司 一种目标检测方法、装置、计算设备和存储介质
CN117218515B (zh) * 2023-09-19 2024-05-03 人民网股份有限公司 一种目标检测方法、装置、计算设备和存储介质
CN117217288A (zh) * 2023-09-21 2023-12-12 摩尔线程智能科技(北京)有限责任公司 大模型的微调方法、装置、电子设备和存储介质
CN117217288B (zh) * 2023-09-21 2024-04-05 摩尔线程智能科技(北京)有限责任公司 大模型的微调方法、装置、电子设备和存储介质
CN117831106A (zh) * 2023-12-29 2024-04-05 广电运通集团股份有限公司 人脸识别模型训练方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN114333013A (zh) 2022-04-12

Similar Documents

Publication Publication Date Title
WO2023125654A1 (zh) 人脸识别模型的训练方法、装置、电子设备及存储介质
Chen et al. The Lao text classification method based on KNN
WO2020114378A1 (zh) 视频水印的识别方法、装置、设备及存储介质
CN110069709B (zh) 意图识别方法、装置、计算机可读介质及电子设备
Zeng et al. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain
JP7266674B2 (ja) 画像分類モデルの訓練方法、画像処理方法及び装置
TW201909112A (zh) 圖像特徵獲取
CN108898181B (zh) 一种图像分类模型的处理方法、装置及存储介质
CN111133453A (zh) 人工神经网络
US20200082213A1 (en) Sample processing method and device
CN114998602B (zh) 基于低置信度样本对比损失的域适应学习方法及***
CN114186063B (zh) 跨域文本情绪分类模型的训练方法和分类方法
WO2023179429A1 (zh) 一种视频数据的处理方法、装置、电子设备及存储介质
CN111008575A (zh) 一种基于多尺度上下文信息融合的鲁棒人脸识别方法
Kao et al. Disc-GLasso: Discriminative graph learning with sparsity regularization
CN112749737A (zh) 图像分类方法及装置、电子设备、存储介质
CN111178196B (zh) 一种细胞分类的方法、装置及设备
CN115801374A (zh) 网络入侵数据分类方法、装置、电子设备及存储介质
CN114495243A (zh) 图像识别模型训练及图像识别方法、装置、电子设备
CN112861626B (zh) 基于小样本学习的细粒度表情分类方法
CN111191781A (zh) 训练神经网络的方法、对象识别方法和设备以及介质
Wang et al. Intelligent radar HRRP target recognition based on CNN-BERT model
Ahmad et al. Deep convolutional neural network using triplet loss to distinguish the identical twins
CN105740916B (zh) 图像特征编码方法及装置
CN113762005A (zh) 特征选择模型的训练、对象分类方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22914906

Country of ref document: EP

Kind code of ref document: A1