CN113850243A - Model training method, face recognition method, electronic device and storage medium

Info

Publication number
CN113850243A
CN113850243A
Authority
CN
China
Prior art keywords
face recognition
crowd
model
image
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111438776.8A
Other languages
Chinese (zh)
Inventor
胡长胜
浦煜
付贤强
何武
朱海涛
户磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Original Assignee
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dilusense Technology Co Ltd, Hefei Dilusense Technology Co Ltd filed Critical Beijing Dilusense Technology Co Ltd
Priority to CN202111438776.8A priority Critical patent/CN113850243A/en
Publication of CN113850243A publication Critical patent/CN113850243A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to the field of face recognition, and discloses a model training method, a face recognition method, an electronic device and a storage medium. The model training comprises the following steps: acquiring image samples containing faces under different crowd categories, and labeling the category labels of the crowd categories to which the image samples belong; performing original training on a pre-established face recognition model based on triplet samples constructed from the image samples to obtain a trained face recognition model; taking the feature map output by any layer of the feature extraction network in the face recognition model as input, adding a crowd category branch network to form an intermediate model; and training the crowd category branch network based on the image samples and their category labels to obtain the trained intermediate model. The scheme effectively addresses the high rejection rate and high false recognition rate caused by crowd category differences, which existing face recognition products using a single face recognition algorithm cannot handle well.

Description

Model training method, face recognition method, electronic device and storage medium
Technical Field
The present invention relates to the field of face recognition, and in particular, to a model training method, a face recognition method, an electronic device, and a storage medium.
Background
With the popularization of face recognition technology in daily life, the application scenes it faces are increasingly complex. Traditional face recognition methods based on a single scene and a single group have exposed their limitations. When the population in the application scenario contains multiple races (yellow, white, black, etc.) or multiple age groups (children, adults, the elderly), there are two common solutions:
the first scheme is as follows: different face recognition algorithms are trained aiming at different crowd categories, and then a specific face recognition model and a recognition threshold are selected according to an estimation model with an evaluation classification function (such as ethnicity classification or age group classification) to complete face recognition.
Scheme II: the differences among the crowd categories are not distinguished, and the whole recognition model is adopted for face recognition.
However, both of the above solutions have some drawbacks:
although the first scheme can well solve the problem of face recognition of different crowd categories under complex scenes on the basis of methods and results, the whole system is complex, the algorithm development workload is large, and the requirements on computing power and storage of a hardware platform are high.
In the second scheme, for the weight type recognition model, due to the capability of the model (capacity), the weight type recognition model is trained together for various crowd categories during training, and the overall recognition effect of the weight type recognition model can possibly meet the scene requirement. However, in essence, due to the specificity of the various types of population, the corresponding feature spaces of the various types of population, whether they are weight models or light models, cannot be completely overlapped and overlap each other to some extent. Therefore, if the identification is directly performed without distinguishing, the experience on a specific population is poor, such as higher rejection rate or higher false recognition rate. Meanwhile, due to the inherent defect of overlapping of feature spaces, after a traditional machine learning method is used for clustering features or an additional evaluation classification model is used for distinguishing groups, although a part of conditions of refusal or false recognition can be improved, all scene requirements still cannot be met, and particularly the face recognition system with extremely high requirements on safety is deployed in a financial scene or a face recognition door lock scene.
Disclosure of Invention
The embodiment of the invention aims to provide a model training method, a face recognition method, an electronic device and a storage medium, which effectively address the high rejection rate or high false recognition rate caused by crowd category differences, a problem that current face recognition products using a single face recognition algorithm cannot handle well.
In order to solve the above technical problem, an embodiment of the present invention provides a model training method, including:
acquiring image samples containing faces under different crowd categories, and labeling category labels of the crowd categories to which the image samples belong;
performing original training on a pre-established face recognition model based on the triple sample constructed by the image sample to obtain a trained face recognition model;
taking a feature map output by any layer of the feature extraction network in the face recognition model as input, and additionally arranging a crowd category branch network to form an intermediate model; the output of the intermediate model comprises the human face characteristics output by the human face recognition model and the crowd category output by the crowd category branch network;
and training the crowd category branch network based on the image sample and the category label of the image sample to obtain the intermediate model after training.
The embodiment of the invention provides a face recognition method, which comprises the following steps:
performing face recognition on a face image to be recognized by using an intermediate model trained by the model training method to obtain face features output by the face recognition model in the intermediate model and a crowd category output by the crowd category branch network;
and performing similarity comparison between the face features output by the face recognition model and the face features in the registered feature library one by one, using the similarity threshold, among the preset similarity thresholds, that corresponds to the crowd category recognized by the intermediate model, and determining the identity information of the face in the face image to be recognized.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model training method as described above, or a face recognition method as described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements a model training method as described above, or a face recognition method as described above.
Compared with the prior art, the method and the device have the advantages that image samples containing faces under different crowd categories are collected, and category labels of the crowd categories to which the image samples belong are labeled; performing original training on a pre-built face recognition model based on a triple sample constructed by an image sample to obtain a trained face recognition model; taking a feature map output by any layer of a feature extraction network in a face recognition model as input, and additionally arranging a crowd category branch network to form an intermediate model; the output of the intermediate model comprises the human face characteristics output by the human face recognition model and the crowd category output by the crowd category branch network; training a crowd category branch network based on the image sample and the category label of the image sample to obtain a trained intermediate model. The intermediate model trained in the scheme can simultaneously obtain the face characteristics of the face image to be recognized and the crowd category to which the face to be recognized belongs. During subsequent feature comparison, a similarity threshold corresponding to the identified crowd category in preset similarity thresholds can be adopted to perform similarity comparison on the face features output by the face identification model and the face features in the registered feature library one by one to determine the identity information of the face in the face image to be identified, so that the problems of high rejection rate or high false identification rate caused by crowd category difference are well solved, and the accuracy of face identification is improved.
Drawings
FIG. 1 is a first flowchart illustrating a first embodiment of a model training method according to the present invention;
FIG. 2 is a schematic structural diagram of an intermediate model according to an embodiment of the invention;
FIG. 3 is a detailed flowchart II of a model training method according to an embodiment of the present invention;
FIG. 4 is a detailed flow chart of a face recognition method according to an embodiment of the invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and various changes and modifications may be made based on the following embodiments.
An embodiment of the present invention relates to a model training method, and as shown in fig. 1, the model training method provided in this embodiment includes the following steps.
Step 101: the method comprises the steps of collecting image samples containing faces under different crowd categories, and labeling category labels of the crowd categories to which the image samples belong.
The different crowd categories are a plurality of crowd categories divided along a certain dimension, such as a plurality of age categories divided by age group or a plurality of race categories divided by race.
Specifically, image samples of faces under the different crowd categories divided along that dimension are collected, and the category labels of the crowd categories to which the image samples belong are labeled. For example, three age categories may be divided by the age group to which the face belongs: children, adults and the elderly; the corresponding category labels are 0, 1 and 2, where 0 represents children, 1 represents adults and 2 represents the elderly. For another example, race categories may be divided by the race to which the face belongs: the yellow race, the white race, the black race and so on; the corresponding category labels are 0, 1, 2, …, where 0 represents the yellow race, 1 represents the white race, 2 represents the black race, and so on.
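As a minimal illustration of this labeling step (the category names, label values and helper function below are assumptions for illustration, not prescribed by this embodiment), the mapping could look like:

```python
# Hypothetical label mappings for the two example dimensions above.
AGE_LABELS = {"child": 0, "adult": 1, "elderly": 2}
RACE_LABELS = {"yellow": 0, "white": 1, "black": 2}  # extendable with more categories

def label_sample(image_path, crowd_category, labels=AGE_LABELS):
    """Pair an image sample with the category label of its crowd category."""
    return image_path, labels[crowd_category]

print(label_sample("face_0001.png", "adult"))  # -> ('face_0001.png', 1)
```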
In order to avoid bias in the trained face recognition model (for example, better robustness for the middle-aged population and poorer robustness for the elderly and child populations), an existing face recognition model is used as a pre-training model, the triplet-loss sampling strategy is modified, and the overall data balance is maintained. In one example, the process of acquiring image samples containing faces under different crowd categories may satisfy the following acquisition strategy.
In each round of sampling: the image samples cover all crowd categories, and the numbers of image samples in the different crowd categories are balanced; the numbers of image samples belonging to different persons are balanced; and for the same person, the numbers of image samples under the preset multiple application scenes are balanced.
Balance across crowd categories means that the number of image samples drawn in each round is balanced over the whole dimension (such as the age dimension or the race dimension). For example, for the age dimension, the number of samples of each age category is required to be balanced or to fit a normal-like distribution (e.g., child:adult:elderly = 2:6:2).
Balance across different persons means that the numbers of image samples drawn for different people are balanced overall, so as to avoid a long-tail effect in the data.
Balance across application scenes means ensuring that image data of each person in each scene (such as indoor and outdoor scenes) is acquired uniformly within one round of sampling, so that the application scenes are as balanced as possible.
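The following sketch shows one way such an acquisition strategy might be implemented; the sample layout (`category`, `person_id`, `scene` fields) and the batch-quota logic are assumptions for illustration, not the patent's prescribed procedure:

```python
import random
from collections import defaultdict

def balanced_batch(samples, ratios, batch_size=64):
    """Draw one batch whose crowd-category mix follows `ratios`
    (e.g. child:adult:elderly = 2:6:2), spreading picks evenly over
    identities within each category to avoid a long-tail effect.

    `samples` is a list of dicts: {"path", "category", "person_id", "scene"};
    every category in `ratios` is assumed to be present in the data.
    """
    by_cat = defaultdict(lambda: defaultdict(list))  # category -> person -> samples
    for s in samples:
        by_cat[s["category"]][s["person_id"]].append(s)

    total = sum(ratios.values())
    batch = []
    for cat, weight in ratios.items():
        quota = round(batch_size * weight / total)   # per-category quota
        persons = list(by_cat[cat].values())
        for i in range(quota):
            person = persons[i % len(persons)]       # round-robin over identities
            batch.append(random.choice(person))      # scene balance would be
                                                     # enforced analogously
    random.shuffle(batch)
    return batch
```

A usage example for the 2:6:2 age mix mentioned above would be `balanced_batch(all_samples, {"child": 2, "adult": 6, "elderly": 2})`.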
Step 102: and performing original training on the pre-established face recognition model based on the triple samples constructed by the image samples to obtain the trained face recognition model.
In particular, triplet samples are constructed based on the acquired image samples. In general, a triplet sample comprises an anchor (a), a positive (p) and a negative (n), where a and p are samples of the same class (corresponding to the same person), and a and n are samples of different classes (corresponding to different people). When training the face recognition model on the triplet samples, the triplet loss L is calculated with the following triplet loss function:

$$ L = \sum_{i=1}^{N} \left[ D\big(f(x_i^a), f(x_i^p)\big) - D\big(f(x_i^a), f(x_i^n)\big) + margin \right]_+ \tag{1} $$

where $x_i^a$ is the anchor sample of the $i$-th triplet image sample, $x_i^p$ is the positive sample of the $i$-th triplet image sample, $x_i^n$ is the negative sample of the $i$-th triplet image sample, $f(\cdot)$ is the feature vector calculated by the face recognition model, $N$ is the total number of triplet image samples, $D\big(f(x_i^a), f(x_i^p)\big)$ is the Euclidean distance measure between the positive and anchor samples, and $D\big(f(x_i^a), f(x_i^n)\big)$ is the Euclidean distance measure between the negative and anchor samples. $margin$ is a margin parameter, and the subscript "+" means that when the value inside the brackets is greater than zero it is taken as the loss, and when it is not greater than zero the loss is 0.
Here, the cosine distance may be used instead of the Euclidean distance; the overall difference is small.
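A minimal PyTorch sketch of equation (1) follows, assuming (N, D) feature batches; the squared Euclidean distance and the margin value 0.2 are common choices for this kind of loss, not values fixed by this embodiment:

```python
import torch
import torch.nn.functional as F

def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Equation (1): sum over N triplets of [d(a,p) - d(a,n) + margin]_+.

    f_a, f_p, f_n: (N, D) feature vectors produced by the face
    recognition model for anchor, positive and negative samples.
    """
    d_ap = (f_a - f_p).pow(2).sum(dim=1)        # distance anchor <-> positive
    d_an = (f_a - f_n).pow(2).sum(dim=1)        # distance anchor <-> negative
    return F.relu(d_ap - d_an + margin).sum()   # [.]_+ clamps negatives to zero

# Cosine-distance variant mentioned in the text:
#   d_ap = 1 - F.cosine_similarity(f_a, f_p)
#   d_an = 1 - F.cosine_similarity(f_a, f_n)
```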
In this step, the pre-built face recognition model is trained with the basic triplet loss function, and training stops after convergence, yielding the trained face recognition model. To distinguish it from the subsequent training of the face recognition model, the training process in this step is called the original training.
Step 103: taking a feature map output by any layer of a feature extraction network in a face recognition model as input, and additionally arranging a crowd category branch network to form an intermediate model; the output of the intermediate model comprises the face characteristics output by the face recognition model and the crowd category output by the crowd category branch network.
Specifically, in this embodiment, the feature map output by any layer of the feature extraction network in the face recognition model is taken as input, a crowd category branch network is added, and the face recognition model with the added branch is recorded as the intermediate model. The feature extraction network is mainly responsible for extracting feature maps of the face image at different depth levels during recognition. For example, the feature extraction network may be a convolutional neural network comprising a plurality of convolutional layers, and the feature map output by any one of them (preferably a convolutional layer in a non-head, non-tail position) is used as the input of the crowd category branch network. The crowd category branch network performs crowd category division for the recognized face, outputting the crowd category to which it belongs. The output of the intermediate model therefore comprises two parts: the face features output by the face recognition model and the crowd category output by the crowd category branch network.
In an example, the face recognition model may adopt a residual error network ResNet50 structure, and accordingly, a crowd category branch network is added by taking a feature map output by any layer in a feature extraction network in the face recognition model as an input, and a process of forming an intermediate model may be implemented by the following steps.
Step 1: selecting the feature map output by the 2nd residual block of the conv5_x layer in the residual network ResNet50 structure as the input of the crowd category branch network.
Specifically, fig. 2 shows an exemplary structure of the intermediate model in this embodiment; in practical application, the structure of the intermediate model may also be set flexibly according to actual requirements. In fig. 2, the network structure in the left backbone from x (the face image) to Feature (the face feature vector) is the face recognition model, which adopts a ResNet50 structure comprising: a conv5_x layer, a first 1x1 convolutional layer (Conv_1x1), a second global pooling layer (Global Pooling) and a second fully connected layer (FC) sequentially connected in series. The feature map output by the 2nd residual block (Residual Block_1) of the conv5_x layer is taken as the input of the crowd category branch network. The feature map size at this point is 7x7 with Channel = 512.
Step 2: constructing a crowd category branch network by adopting a residual block, a first global pooling layer and a first full-connection layer which are connected in series from front to back; the input of the residual block is used as the input of the crowd category branch network, and the output of the first full connection layer is used as the output of the crowd category branch network.
As shown in fig. 2, the crowd category branch network comprises a residual block (Residual Block_Age), a first global pooling layer (Global Pooling) and a first fully connected layer (FC) connected in series in sequence. In practical application, the structure of the crowd category branch network can also be set flexibly according to actual requirements.
Specifically, the feature map output by the 2nd residual block (Residual Block_1) of the conv5_x layer is taken as the input of the crowd category branch network; its size at this point is 7x7 with Channel = 512. Inside the branch, a new residual block structure (Residual Block_Age) is connected first, whose output channel count can be set to 128 (adjustable according to actual conditions); then a global pooling layer changes the feature map from 7x7 to 1x1; finally, a fully connected layer (FC) whose output dimension equals the number of crowd categories serves as the output of the branch network. Fig. 2 shows a crowd category branch network built for the age-group dimension, so the output dimension is 3 (children, adults, the elderly).
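A PyTorch-style sketch of the branch in fig. 2 follows; the internals of Residual Block_Age are simplified (no skip connection or downsampling), so this illustrates the shape flow 7x7x512 -> 7x7x128 -> 1x1x128 -> num_categories rather than the exact block of the embodiment:

```python
import torch
import torch.nn as nn

class CrowdBranch(nn.Module):
    """Crowd category branch: residual-style block -> global pooling -> FC."""

    def __init__(self, in_ch=512, mid_ch=128, num_categories=3):
        super().__init__()
        self.block = nn.Sequential(              # simplified Residual Block_Age
            nn.Conv2d(in_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)      # first global pooling: 7x7 -> 1x1
        self.fc = nn.Linear(mid_ch, num_categories)  # first fully connected layer

    def forward(self, feat_map):                 # feat_map: (N, 512, 7, 7)
        x = self.pool(self.block(feat_map)).flatten(1)
        return self.fc(x)                        # logits: (N, num_categories)

# Smoke test with a dummy conv5_x feature map:
logits = CrowdBranch()(torch.randn(2, 512, 7, 7))  # -> shape (2, 3)
```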
Step 104: training a crowd category branch network based on the image sample and the category label of the image sample to obtain a trained intermediate model.
Specifically, training the newly added crowd category branch network is simple: the face recognition model trained in step 102, i.e. the backbone network, is frozen, and only the newly added branch is trained, using softmax loss. Because the number of crowd categories is relatively small, convergence is fast; training of the crowd category branch network ends once it has fully converged, finally yielding the trained intermediate model.
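A sketch of this freeze-and-train step, assuming the hypothetical names `backbone` (the trained face recognition model, exposing an `intermediate_features` helper that returns the conv5_x feature map — an assumption, not an API of any particular library) and `branch` (the crowd category branch above):

```python
import torch
import torch.nn as nn

def train_branch(backbone, branch, loader, epochs=5, lr=1e-3):
    """Freeze the backbone and train only the crowd category branch
    with softmax (cross-entropy) loss, as described above."""
    backbone.eval()
    for p in backbone.parameters():              # freeze the trunk
        p.requires_grad = False

    opt = torch.optim.SGD(branch.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()            # softmax loss
    for _ in range(epochs):
        for images, labels in loader:            # labels: crowd category labels
            with torch.no_grad():                # feature map from the frozen trunk
                feat_map = backbone.intermediate_features(images)  # hypothetical hook
            loss = criterion(branch(feat_map), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```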
Compared with the related art, the embodiment collects the image samples containing the faces under different crowd categories, and labels the category labels of the crowd categories to which the image samples belong; performing original training on a pre-built face recognition model based on a triple sample constructed by an image sample to obtain a trained face recognition model; taking a feature map output by any layer of a feature extraction network in a face recognition model as input, and additionally arranging a crowd category branch network to form an intermediate model; the output of the intermediate model comprises the human face characteristics output by the human face recognition model and the crowd category output by the crowd category branch network; training a crowd category branch network based on the image sample and the category label of the image sample to obtain a trained intermediate model. The intermediate model trained in the scheme can simultaneously obtain the face characteristics of the face image to be recognized and the crowd category to which the face to be recognized belongs. During subsequent feature comparison, a similarity threshold corresponding to the identified crowd category in preset similarity thresholds can be adopted to perform similarity comparison on the face features output by the face identification model and the face features in the registered feature library one by one to determine the identity information of the face in the face image to be identified, so that the problems of high rejection rate or high false identification rate caused by crowd category difference are well solved, and the accuracy of face identification is improved.
Another embodiment of the invention relates to a model training method, shown in fig. 3, which improves on the method steps of fig. 1 in that the originally trained face recognition model is further optimized to obtain an optimized intermediate model. As shown in fig. 3, the following steps are included after step 102.
Step 105: and calculating the triple loss of the triple image samples according to the face recognition model obtained in the original training, and extracting the difficult samples from the triple image samples according to the calculation result.
In the actual model training process, if a heavyweight recognition model is used, the face recognition model obtained through the original training can already distinguish the different crowd categories well in the feature space, thanks to the large capacity of a heavyweight model. However, if the application scenario mainly uses a lightweight model, or has high face recognition security requirements, the recognition capability of the model needs to be improved further. In this embodiment, the face recognition model obtained by the original training is therefore optimized through further training.
Specifically, the triplet image samples are input into the face recognition model obtained in the original training, which outputs the feature vectors corresponding to the image samples in each triplet. The triplet loss L is then calculated with the triplet loss function shown in equation (1). When L is greater than 0, a loss occurs, and the corresponding triplet sample can be regarded as a difficult sample; when L is not greater than 0, the loss is 0, and the corresponding triplet sample can be regarded as a simple sample.
However, in actual model training, the more desirable goals are that the feature vectors corresponding to different crowd categories are distinguishable, and that the feature vectors corresponding to different persons within the same crowd category are also distinguishable.
For this reason, the present embodiment improves the conventional triplet loss function to achieve both kinds of discrimination as far as possible. Accordingly, step 105 can be realized by the following steps.
Step 1: calculate the triplet loss L for each triplet image sample by the following equation (2):

$$ L = \sum_{i=1}^{N} \left[ D\big(f(x_i^a), f(x_i^p)\big) - D\big(f(x_i^a), f(x_i^n)\big) + M \right]_+ \tag{2} $$

where $M = margin1$ when $x_i^a$ and $x_i^n$ belong to the same crowd category, and $M = margin2$ when $x_i^a$ and $x_i^n$ belong to different crowd categories. As in equation (1), $x_i^a$, $x_i^p$ and $x_i^n$ are the anchor, positive and negative samples of the $i$-th triplet image sample, $f(\cdot)$ is the feature vector calculated by the face recognition model, $N$ is the total number of triplet image samples, $D\big(f(x_i^a), f(x_i^p)\big)$ is the Euclidean distance measure between the positive and anchor samples, and $D\big(f(x_i^a), f(x_i^n)\big)$ is the Euclidean distance measure between the negative and anchor samples. $M$, $margin1$ and $margin2$ are all margin parameters, and $margin2$ is greater than $margin1$.

Specifically, when calculating the triplet loss of the $i$-th triplet image sample, the margin parameter $M$ is chosen according to the crowd categories to which the anchor sample $x_i^a$ and the negative sample $x_i^n$ belong: when the anchor and negative samples belong to the same crowd category, $M$ takes the value $margin1$; when they belong to different crowd categories, $M$ takes the value $margin2$, with $margin2$ greater than $margin1$, so that the margin between positive and negative samples across different crowd categories is larger than the margin within the same crowd category, which meets the actual requirement.
Step 2: extract the triplet image samples whose triplet loss L is greater than 0 as difficult samples.
Specifically, the triplet loss L of each triplet image sample is calculated according to equation (2), and the triplet image samples whose loss L is greater than 0 are extracted as difficult samples.
In one example, the triplet image samples with triplet loss L greater than 0 may be determined first; then, from the determined samples, first difficult samples corresponding to M = margin1 and second difficult samples corresponding to M = margin2 are extracted as the finally selected difficult samples, with the ratio of the number of first difficult samples to second difficult samples being 1:2.
Specifically, since every image sample is trained on indiscriminately in step 102, the difficult samples mined under the condition of step 2 need further subsampling. Experiments show that using more difficult samples mined with margin2 than with margin1 yields a better final training effect. Setting the ratio to 2:1 means that, in this round of triplet-loss sampling, the number of difficult samples whose anchor and negative samples belong to different crowd categories is twice the number of those whose anchor and negative samples belong to the same crowd category.
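One possible selection routine for this 1:2 mining rule (a sketch: it keeps the first qualifying indices rather than sampling at random, and assumes both pools are non-empty):

```python
import torch

def mine_hard_triplets(losses, same_category):
    """Keep triplets with loss > 0 so that those whose anchor and negative
    share a crowd category and those whose categories differ are retained
    in a 1:2 ratio, as described above.

    losses: (N,) per-triplet losses from equation (2);
    same_category: (N,) bool, True when anchor and negative share a category.
    """
    hard = losses > 0
    same_idx = torch.nonzero(hard & same_category).flatten()
    diff_idx = torch.nonzero(hard & ~same_category).flatten()
    n_same = min(len(same_idx), len(diff_idx) // 2)        # enforce 1:2
    return torch.cat([same_idx[:n_same], diff_idx[:2 * n_same]])
```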
Step 106: retrain the face recognition model obtained after the original training on the difficult samples to obtain an optimized face recognition model; the margin parameter used to calculate the triplet loss during retraining is smaller than the margin parameter used during the original training.
Specifically, the retraining in this step proceeds in the same way as the original training of the face recognition model, with two differences: the image samples used in retraining are the difficult samples obtained in step 105, and the margin parameter used to calculate the triplet loss in retraining is smaller than the one used in the original training, because training on difficult samples is stricter than training on ordinary samples.
On this basis, step 103 may accordingly comprise the following sub-steps.
Substep 1031: and (3) taking a feature map output by any layer in the feature extraction network in the optimized face recognition model as input, and additionally arranging a crowd category branch network to form an intermediate model.
Compared with the related art, this embodiment calculates the triplet loss of the triplet image samples with the face recognition model obtained in the original training and extracts difficult samples from the triplet image samples according to the calculation results; the face recognition model obtained after the original training is then retrained on the difficult samples to obtain an optimized face recognition model. Since the margin parameter used to calculate the triplet loss during retraining is smaller than that used during the original training, the recognition capability of the face recognition model is improved.
Another embodiment of the present invention relates to a face recognition method, as shown in fig. 4, which includes the following steps.
Step 201: and performing face recognition on the face image to be recognized by using the intermediate model trained by the model training method to obtain the face features output by the face recognition model in the intermediate model and the crowd categories output by the crowd category branch network.
Specifically, by using the intermediate model obtained in the above method embodiment, the face image to be recognized is subjected to face recognition, and two outputs, namely, a new face feature output by the face recognition model in the intermediate model and a crowd category output by the crowd category branch network, can be obtained.
Step 202: performing similarity comparison between the face features output by the face recognition model and the face features in the registered feature library one by one, using the similarity threshold, among the preset similarity thresholds, that corresponds to the crowd category recognized by the intermediate model, and determining the identity information of the face in the face image to be recognized.
Specifically, according to the crowd category output by the crowd category branch network in the intermediate model, the corresponding similarity threshold is selected from the preset similarity thresholds. When the face features output by the face recognition model are compared with the face features in the registered feature library one by one, whether two face features belong to the same person can be judged against the selected similarity threshold: for example, when the computed similarity is greater than the threshold, the two face features are determined to correspond to the same person; otherwise they are determined not to be the same person.
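A minimal sketch of this thresholded 1:N comparison; the threshold values and the normalized-feature gallery layout are assumptions for illustration, not values from the embodiment:

```python
import numpy as np

# Hypothetical per-category similarity thresholds (e.g. child/adult/elderly).
THRESHOLDS = {0: 0.62, 1: 0.55, 2: 0.60}

def identify(query_feat, crowd_category, gallery):
    """Compare the query feature against the registered feature library one
    by one, using the similarity threshold of the recognized crowd category.

    gallery: dict mapping identity -> L2-normalized feature vector (np.ndarray).
    """
    thr = THRESHOLDS[crowd_category]
    best_id, best_sim = None, -1.0
    for identity, feat in gallery.items():
        sim = float(np.dot(query_feat, feat))   # cosine similarity for unit vectors
        if sim > thr and sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id, best_sim                    # best_id is None if nothing passes
```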
Compared with the related art, the intermediate model obtained by the training can simultaneously obtain the face features of the face image to be recognized and the crowd category to which the face to be recognized belongs. During subsequent feature comparison, a similarity threshold corresponding to the identified crowd category in preset similarity thresholds can be adopted to perform similarity comparison on the face features output by the face identification model and the face features in the registered feature library one by one to determine the identity information of the face in the face image to be identified, so that the problems of high rejection rate or high false identification rate caused by crowd category difference are well solved, and the accuracy of face identification is improved.
Another embodiment of the invention relates to an electronic device, as shown in FIG. 5, comprising at least one processor 302; and a memory 301 communicatively coupled to the at least one processor 302; the memory 301 stores instructions executable by the at least one processor 302, and the instructions are executed by the at least one processor 302 to enable the at least one processor 302 to perform any of the method embodiments described above.
The memory 301 and the processor 302 are connected by a bus, which may comprise any number of interconnected buses and bridges linking the various circuits of the one or more processors 302 and the memory 301. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or a plurality of elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 302 is transmitted over a wireless medium through an antenna, which also receives incoming data and forwards it to the processor 302.
The processor 302 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 301 may be used to store data used by processor 302 in performing operations.
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes any of the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions to enable a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A method of model training, comprising:
acquiring image samples containing faces under different crowd categories, and labeling category labels of the crowd categories to which the image samples belong;
performing original training on a pre-established face recognition model based on the triple sample constructed by the image sample to obtain a trained face recognition model;
taking a feature map output by any layer of the feature extraction network in the face recognition model as input, and additionally arranging a crowd category branch network to form an intermediate model; the output of the intermediate model comprises the human face characteristics output by the human face recognition model and the crowd category output by the crowd category branch network;
and training the crowd category branch network based on the image sample and the category label of the image sample to obtain the intermediate model after training.
2. The method of claim 1, wherein the crowd categories are age categories divided by age or ethnicity categories divided by ethnicity; the process of acquiring image samples containing faces under different crowd categories meets the following acquisition strategies:
in each round of sampling: the image samples cover all the crowd categories, and the numbers of image samples in the different crowd categories are balanced; the numbers of image samples belonging to different persons are balanced; and for the same person, the numbers of image samples under the preset multiple application scenes are balanced.
3. The method according to claim 1, wherein the original training of the pre-constructed face recognition model based on the triple sample constructed by the image sample to obtain the trained face recognition model comprises:
calculating the triple loss of the triple image samples according to the face recognition model obtained in the original training, and extracting difficult samples from the triple image samples according to the calculation result;
retraining the face recognition model obtained after the original training based on the difficult samples to obtain an optimized face recognition model; the margin parameter used for calculating the triplet loss during retraining is smaller than the margin parameter used for calculating the triplet loss during the original training;
the method for forming the intermediate model by adding the crowd classification branch network by taking the feature graph output by any layer in the feature extraction network in the face recognition model as input comprises the following steps:
and adding the crowd category branch network by taking the feature map output by any layer of the feature extraction network in the optimized face recognition model as input to form the intermediate model.
4. The method of claim 3, wherein the calculating the triplet loss of the triplet image samples for the face recognition model obtained during the original training and extracting the difficult samples from the triplet image samples according to the calculation result comprises:
calculating a triplet loss L for each of the triplet image samples by the following formula:

$$ L = \sum_{i=1}^{N} \left[ D\big(f(x_i^a), f(x_i^p)\big) - D\big(f(x_i^a), f(x_i^n)\big) + M \right]_+ $$

wherein $M = margin1$ when $x_i^a$ and $x_i^n$ belong to the same crowd category, and $M = margin2$ when $x_i^a$ and $x_i^n$ belong to different crowd categories; $x_i^a$ is the anchor sample of the $i$-th triplet image sample, $x_i^p$ is the positive sample of the $i$-th triplet image sample, $x_i^n$ is the negative sample of the $i$-th triplet image sample, $f(\cdot)$ is the feature vector calculated by the face recognition model, $N$ is the total number of triplet image samples, $D\big(f(x_i^a), f(x_i^p)\big)$ is the Euclidean distance measure between the positive and anchor samples, $D\big(f(x_i^a), f(x_i^n)\big)$ is the Euclidean distance measure between the negative and anchor samples, $M$, $margin1$ and $margin2$ are all margin parameters, and $margin2$ is greater than $margin1$;
and extracting the triplet image samples whose triplet loss L is greater than 0 as the difficult samples.
5. The method of claim 4, wherein extracting the triplet image samples whose triplet loss L is greater than 0 as the difficult samples comprises:
determining the triplet image samples whose triplet loss L is greater than 0;
extracting, from the determined triplet image samples, the first difficult samples corresponding to M = margin1 and the second difficult samples corresponding to M = margin2 as the finally extracted difficult samples, wherein the ratio of the number of the first difficult samples to the number of the second difficult samples is 1:2.
6. The method of claim 1, wherein the face recognition model adopts a residual error network ResNet50 structure, and the adding a crowd category branch network with a feature map output from any layer of a feature extraction network in the face recognition model as an input to form an intermediate model comprises:
selecting an output characteristic diagram of a 2 nd residual block of a conv5_ x layer in the residual network ResNet50 structure as the input of the crowd category branch network;
constructing the crowd category branch network by adopting a residual block, a first global pooling layer and a first full-connection layer which are connected in series from front to back; the input of the residual block is used as the input of the crowd category branching network, and the output of the first full connection layer is used as the output of the crowd category branching network.
7. The method of claim 6, wherein the ResNet50 structure comprises: a conv5_x layer, a first 1x1 convolutional layer, a second global pooling layer and a second fully connected layer which are sequentially connected in series.
8. A face recognition method, comprising:
performing face recognition on a face image to be recognized by using an intermediate model trained by the model training method according to any one of claims 1 to 7 to obtain face features output by the face recognition model in the intermediate model and a crowd category output by the crowd category branch network;
and performing similarity comparison between the face features output by the face recognition model and the face features in the registered feature library one by one, using the similarity threshold, among the preset similarity thresholds, that corresponds to the crowd category recognized by the intermediate model, and determining the identity information of the face in the face image to be recognized.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a model training method as claimed in any one of claims 1 to 7, or a face recognition method as claimed in claim 8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the model training method of any one of claims 1 to 7 or the face recognition method of claim 8.
CN202111438776.8A 2021-11-29 2021-11-29 Model training method, face recognition method, electronic device and storage medium Pending CN113850243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111438776.8A CN113850243A (en) 2021-11-29 2021-11-29 Model training method, face recognition method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111438776.8A CN113850243A (en) 2021-11-29 2021-11-29 Model training method, face recognition method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113850243A true CN113850243A (en) 2021-12-28

Family

ID=78982496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111438776.8A Pending CN113850243A (en) 2021-11-29 2021-11-29 Model training method, face recognition method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113850243A (en)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845330A (en) * 2016-11-17 2017-06-13 北京品恩科技股份有限公司 A kind of training method of the two-dimension human face identification model based on depth convolutional neural networks
CN107704890A (en) * 2017-10-27 2018-02-16 北京旷视科技有限公司 A kind of generation method and device of four-tuple image
CN107844784A (en) * 2017-12-08 2018-03-27 广东美的智能机器人有限公司 Face identification method, device, computer equipment and readable storage medium storing program for executing
CN110197099A (en) * 2018-02-26 2019-09-03 腾讯科技(深圳)有限公司 The method and apparatus of across age recognition of face and its model training
CN108875548A (en) * 2018-04-18 2018-11-23 科大讯飞股份有限公司 Personage's orbit generation method and device, storage medium, electronic equipment
CN108805077A (en) * 2018-06-11 2018-11-13 深圳市唯特视科技有限公司 A kind of face identification system of the deep learning network based on triple loss function
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
CN110084216A (en) * 2019-05-06 2019-08-02 苏州科达科技股份有限公司 Human face recognition model training and face identification method, system, equipment and medium
CN110503053A (en) * 2019-08-27 2019-11-26 电子科技大学 Human motion recognition method based on cyclic convolution neural network
CN110909785A (en) * 2019-11-18 2020-03-24 西北工业大学 Multitask Triplet loss function learning method based on semantic hierarchy
CN111292801A (en) * 2020-01-21 2020-06-16 西湖大学 Method for evaluating thyroid nodule by combining protein mass spectrum with deep learning
CN111439267A (en) * 2020-03-30 2020-07-24 上海商汤临港智能科技有限公司 Method and device for adjusting cabin environment
CN111695415A (en) * 2020-04-28 2020-09-22 平安科技(深圳)有限公司 Construction method and identification method of image identification model and related equipment
CN111783698A (en) * 2020-07-06 2020-10-16 周书田 Method for improving training stability of face recognition model
CN111967315A (en) * 2020-07-10 2020-11-20 华南理工大学 Human body comprehensive information acquisition method based on face recognition and infrared detection
CN111814706A (en) * 2020-07-14 2020-10-23 电子科技大学 Face recognition and attribute classification method based on multitask convolutional neural network
CN112288074A (en) * 2020-08-07 2021-01-29 京东安联财产保险有限公司 Image recognition network generation method and device, storage medium and electronic equipment
CN112232117A (en) * 2020-09-08 2021-01-15 深圳微步信息股份有限公司 Face recognition method, face recognition device and storage medium
CN112001366A (en) * 2020-09-25 2020-11-27 北京百度网讯科技有限公司 Model training method, face recognition device, face recognition equipment and medium
CN112686955A (en) * 2020-12-25 2021-04-20 陈艳 Air hole positioning method and system in air tightness detection process based on artificial intelligence
CN112819098A (en) * 2021-02-26 2021-05-18 南京邮电大学 Domain self-adaption method based on triple and difference measurement
CN113326832A (en) * 2021-08-04 2021-08-31 北京的卢深视科技有限公司 Model training method, image processing method, electronic device, and storage medium
CN113657289A (en) * 2021-08-19 2021-11-16 北京百度网讯科技有限公司 Training method and device of threshold estimation model and electronic equipment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495123A (en) * 2022-01-14 2022-05-13 北京百度网讯科技有限公司 Optimization method, device, equipment and medium of optical character recognition model
CN115471893A (en) * 2022-09-16 2022-12-13 北京百度网讯科技有限公司 Method and device for training face recognition model and face recognition
CN115471893B (en) * 2022-09-16 2023-11-21 北京百度网讯科技有限公司 Face recognition model training, face recognition method and device
CN115410265A (en) * 2022-11-01 2022-11-29 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN115410265B (en) * 2022-11-01 2023-01-31 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN116386108A (en) * 2023-03-27 2023-07-04 南京理工大学 Fairness face recognition method based on instance consistency
CN116386108B (en) * 2023-03-27 2023-09-19 南京理工大学 Fairness face recognition method based on instance consistency

Similar Documents

Publication Publication Date Title
CN113850243A (en) Model training method, face recognition method, electronic device and storage medium
CN112949780B (en) Feature model training method, device, equipment and storage medium
CN110163236B (en) Model training method and device, storage medium and electronic device
CN107657249A (en) Method, apparatus, storage medium and the processor that Analysis On Multi-scale Features pedestrian identifies again
CN106951825A (en) A kind of quality of human face image assessment system and implementation method
CN110188829B (en) Neural network training method, target recognition method and related products
EP3136292A1 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN110147699B (en) Image recognition method and device and related equipment
CN104504362A (en) Face detection method based on convolutional neural network
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN110516537B (en) Face age estimation method based on self-learning
CN110807402B (en) Facial feature positioning method, system and terminal equipment based on skin color detection
CN112163637B (en) Image classification model training method and device based on unbalanced data
WO2019167784A1 (en) Position specifying device, position specifying method, and computer program
CN116363712B (en) Palmprint palm vein recognition method based on modal informativity evaluation strategy
Yang et al. A Face Detection Method Based on Skin Color Model and Improved AdaBoost Algorithm.
CN114463829B (en) Model training method, relationship identification method, electronic device, and storage medium
CN114861875A (en) Internet of things intrusion detection method based on self-supervision learning and self-knowledge distillation
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN114333011B (en) Network training method, face recognition method, electronic device and storage medium
CN112560823B (en) Adaptive variance and weight face age estimation method based on distribution learning
CN111737688B (en) Attack defense system based on user portrait
CN115410265B (en) Model training method, face recognition method, electronic device and storage medium
CN115082762A (en) Target detection unsupervised domain adaptation system based on regional recommendation network center alignment
CN115035562A (en) Facemask shielded face recognition method based on FaceNet improvement

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211228)