CN114764942B - Difficult positive and negative sample online mining method and face recognition method - Google Patents


Info

Publication number: CN114764942B
Application number: CN202210555142.9A
Authority: CN (China)
Prior art keywords: sampling, sample, pair, loss function, pairs
Legal status: Active (granted)
Other versions: CN114764942A (in Chinese)
Inventors: 郑文先, 陶映帆, 杨文明, 廖庆敏
Original and current assignee: Shenzhen International Graduate School of Tsinghua University

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The invention discloses an online hard positive and negative sample mining method for the training process of a face recognition model, comprising the following steps: S1, acquiring a feature vector pair extracted from a sample face pair by a first graphics processor; S2, calculating a loss function of the sample face pair from the feature vector pair; S3, mining hard positive and negative samples among the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set and transmitting the gradients to a second graphics processor, so that the second graphics processor performs back propagation with the gradients, adjusts the model parameters, and shares the adjusted model parameters with the first graphics processor.

Description

Difficult positive and negative sample online mining method and face recognition method
Technical Field
The invention relates to the intersection of image processing and machine learning, and in particular to a hard positive and negative sample mining method and a face recognition method whose model training applies the mining method.
Background
Improving face recognition accuracy by training a model on a large number of face pictures is common practice in the field of face recognition. Different training processes and methods yield models with different recognition performance, so improving the training process and method to raise a model's face recognition accuracy is an active research topic in the industry.
A face recognition model training framework based on the metric learning principle, which can exploit face data sets of millions of images, usually combines a metric learning algorithm with online mining. When a model is trained with a metric learning algorithm, a large number of same-face pairs (positive sample pairs) and different-face pairs (negative sample pairs) must be fed to the model. In the middle and later stages of training, the harder the sample pairs are, the more markedly the recognition performance of the model improves and the faster training converges. As the number of training iterations grows, the standard of "hardness" keeps evolving, so hard sample pairs must be mined online during training, and the hardness of the mined positive and negative sample pairs is positively correlated with the amount of data participating in each training iteration.
In summary, in the training stage of a face recognition model, hard positive and negative samples mined online are used for training, and the larger the data batch participating in the online mining, the greater the improvement in the accuracy of the face recognition algorithm. In the existing training process, however, feature extraction, loss function calculation, gradient calculation and back propagation are all performed on a GPU (Graphics Processing Unit). A GPU card has limited video memory and can process only a limited number of images; the data batch on a single GPU card is usually between dozens and a few hundred face images, which greatly limits how hard the mined positive and negative samples can be and thereby limits the improvement of face recognition model training efficiency.
Disclosure of Invention
The invention mainly aims to provide an online hard positive and negative sample mining method that solves the above problems in the prior art and improves the efficiency of mining hard positive and negative samples online.
To achieve the above purpose, the invention provides, in one aspect, the following technical solution:
An online hard positive and negative sample mining method, used in the training process of a face recognition model, comprises the following steps: S1, acquiring a feature vector pair extracted from a sample face pair by a first graphics processor; S2, calculating a loss function of the sample face pair from the feature vector pair; S3, mining hard positive and negative samples among the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set and transmitting the gradients to a second graphics processor, so that the second graphics processor performs back propagation with the gradients, adjusts the model parameters, and shares the adjusted model parameters with the first graphics processor.
In another aspect, the invention provides the following technical solution:
a difficult positive and negative sample online excavating device is used for the training process of a face recognition model and comprises a central processing unit, a first graphic processor and a second graphic processor, wherein the first graphic processor and the second graphic processor are connected with the central processing unit; the first graphics processor is configured to: extracting feature vector pairs from the sample face pairs; the central processor unit is configured to: calculating a loss function of the sample face pair according to the feature vector pair, mining hard positive and negative samples of the sample face pair according to the loss function to obtain a target sample set containing the hard positive and negative sample face pair, and calculating the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receiving the gradient from the central processing unit, performing back propagation through the gradient, adjusting model parameters, and sharing the adjusted model parameters to the first graphics processor.
The invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the online hard positive and negative sample mining method are realized.
The invention also provides a face recognition method comprising a training process of a face recognition model, wherein the training process includes the steps of the online hard positive and negative sample mining method.
The beneficial effects of the invention include: a CPU cooperates with a first GPU and a second GPU; image feature calculation is carried out only on the first GPU, and after the feature vector pairs of the sample pairs are obtained, the loss functions and gradients are calculated on the CPU, while gradient back propagation is performed on the second GPU, which shares the model parameters. The first GPU is therefore not limited by video memory size and can continuously calculate feature vector pairs of sample pairs in a pipelined fashion, which improves the efficiency of online hard sample mining and, in turn, the efficiency of model training.
Drawings
FIG. 1 is a schematic diagram of an online hard positive and negative sample mining method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of a three-dimensional loss function matrix A × B × C according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
Definitions of terms:
(1) Positive sample pair: a pair of images of the same class, e.g. a pair of images of the same face.
(2) Negative sample pair: a pair of images of different classes, e.g. a pair of images of different faces.
(3) Hard positive sample: a same-face pair that the model recognizes as a pair of different faces during recognition.
(4) Hard negative sample: a different-face pair that the model recognizes as a pair of the same face during recognition.
(5) Metric learning: during model training, the distance between positive sample pairs is made as small as possible and the distance between negative sample pairs as large as possible. Hard positive sample pairs lie at a large distance under the learned metric and therefore produce a large gradient descent step; hard negative sample pairs lie at a small distance and likewise produce a large gradient descent step. Easy positive and negative sample pairs provide no gradient, or only a small one.
(6) Online hard positive and negative sample mining: in metric-learning-based training, after each iteration, the batch fed into the next iteration includes hard positive and negative samples, so as to improve the recognition capability of the model and the speed of gradient descent.
The embodiment of the invention provides an online hard positive and negative sample mining method that runs on a Central Processing Unit (CPU) and is applied in the training stage of a face recognition model. In this embodiment the method may adopt the system architecture shown in Fig. 1, which comprises one CPU and at least one group of GPUs, each group containing two GPUs: a first GPU and a second GPU. The first GPU and the second GPU are connected to the CPU via a bus. The first GPU is mainly responsible for feature extraction: it extracts features from the images and transmits them to the CPU. The CPU calculates the loss functions and gradients and mines hard samples according to the loss functions, selecting the sample pairs with large loss functions as hard samples. The CPU also transmits the gradients to the second GPU, so that the second GPU performs back propagation with the gradients, adjusts the model parameters, and shares them with the first GPU. The loss function of each sample pair is between 0 and 1, and the larger the loss function, the harder the sample pair is for the model to recognize correctly.
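The division of labour described above can be sketched with the two GPUs and the CPU stood in for by plain Python functions. All names and the toy linear "model" below are illustrative, not from the patent; the point is only the data flow of one iteration.

```python
def first_gpu_extract(params, face_pair):
    """First GPU: turn each image of a pair into a feature vector (toy linear map)."""
    w = params["w"]
    return tuple(tuple(w * x for x in img) for img in face_pair)

def toy_loss(feature_pair):
    """Toy 'distance' between the two feature vectors of a pair."""
    fa, fb = feature_pair
    return sum(abs(a - b) for a, b in zip(fa, fb))

def cpu_mine_and_grad(feature_pairs, keep=1):
    """CPU: score each pair, keep the hardest ones, emit a stand-in gradient."""
    scored = sorted(feature_pairs, key=lambda fp: -toy_loss(fp))
    hard = scored[:keep]
    grad = sum(toy_loss(fp) for fp in hard)  # stand-in for real gradients
    return hard, grad

def second_gpu_backprop(params, grad, lr=0.1):
    """Second GPU: adjust the model parameters, then share them back."""
    return {"w": params["w"] - lr * grad}

# One iteration of the pipeline: GPU-1 extracts, CPU mines, GPU-2 updates.
params = {"w": 1.0}
batch = [((1.0, 2.0), (1.0, 2.5)), ((0.0, 1.0), (3.0, 1.0))]
feature_pairs = [first_gpu_extract(params, p) for p in batch]
hard, grad = cpu_mine_and_grad(feature_pairs)
params = second_gpu_backprop(params, grad)  # shared with GPU-1 for the next batch
```

In the real system the "shared back" step is what lets the first GPU keep extracting features with up-to-date parameters while the second GPU handles back propagation.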
Specifically, a sample face pair is input into the first GPU for feature extraction, and each face image yields a corresponding feature vector, so one sample face pair corresponds to two feature vectors, referred to here as a feature vector pair. The CPU receives the feature vector pairs and places them in a feature vector pool while computing the loss function of each pair. In a specific embodiment, the loss function is computed as follows:
When the sample face pair is a binary group (i.e. it contains two face images), the loss function is:

L1 = y · d_{a,b} + (1 − y) · (β − d_{a,b})+

where y is the label of the sample face pair: y = 0 indicates a negative sample pair and y = 1 a positive sample pair; d_{a,b} is the metric distance of the binary group of face images a and b, i.e. the spatial distance between the feature vectors of a and b; β is a preset threshold used to judge whether the metric distance of the binary group under a positive sample pair is small enough, or under a negative sample pair large enough, with a value between 0.35 and 0.4; and (β − d_{a,b})+ denotes the hinge function max(0, β − d_{a,b}), whose value is 0 when d_{a,b} ≥ β and β − d_{a,b} when d_{a,b} < β.
It can be seen that when the sample face pair is a positive sample pair, the loss function reduces to

L1 = d_{a,b}

and the larger the loss function, the larger the metric distance of the binary group and the harder it is to recognize the pair correctly as a positive sample pair, so the pair can be regarded as a hard positive sample pair. When the sample face pair is a negative sample pair, the loss function reduces to

L1 = (β − d_{a,b})+

and, for the preset β, the larger the loss function, the smaller the metric distance d_{a,b} of the binary group and the harder it is to recognize the pair correctly as a negative sample pair, so the pair can be regarded as a hard negative sample pair.
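The binary-group loss can be written directly in code; a minimal sketch (the function name is ours, and `beta` defaults to a value in the 0.35–0.4 range given above):

```python
def pair_loss(d_ab: float, y: int, beta: float = 0.4) -> float:
    """L1 = y*d_ab + (1-y)*max(0, beta - d_ab).

    y=1: positive pair, loss grows with the metric distance.
    y=0: negative pair, loss grows as the distance falls below beta.
    """
    return y * d_ab + (1 - y) * max(0.0, beta - d_ab)
```

A distant positive pair (`pair_loss(0.9, 1)`) and a close negative pair (`pair_loss(0.1, 0)`) both score high, which is exactly what marks them as hard.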
When the sample face pair is a triplet, the loss function can be expressed as:

L2 = d_{a,b} + (d_{a,b} − d_{a,c} + β)+

where the triplet comprises face images a, b and c, with a and b forming a positive sample pair and a and c forming a negative sample pair; d_{a,c} is the metric distance between face images a and c; and (d_{a,b} − d_{a,c} + β)+ denotes the hinge function max(0, d_{a,b} − d_{a,c} + β), which equals d_{a,b} − d_{a,c} + β when that quantity is larger than 0, and 0 otherwise.

It can be seen that when d_{a,b} is small and d_{a,c} is large, the loss function reduces to d_{a,b}: a small loss means the metric distance of the positive pair a, b is small and that of the negative pair a, c is large, so the triplet is an easy sample pair, while a larger loss means the metric distance of the positive pair a, b has grown, so a, b is a hard positive sample pair. Conversely, when d_{a,b} is large and d_{a,c} is small, the loss function is d_{a,b} + (d_{a,b} − d_{a,c} + β)+: a large loss means the metric distance of the positive pair a, b is large and that of the negative pair a, c is small, so the triplet is a hard sample pair.
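The triplet loss is a one-liner as well; a sketch with an illustrative function name:

```python
def triplet_loss(d_ab: float, d_ac: float, beta: float = 0.4) -> float:
    """L2 = d_ab + max(0, d_ab - d_ac + beta) for a triplet (a, b, c)."""
    return d_ab + max(0.0, d_ab - d_ac + beta)
```

An easy triplet (small `d_ab`, large `d_ac`) scores low; a hard one (large `d_ab`, small `d_ac`) activates the hinge term and scores high.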
After the loss functions are computed, the CPU performs hard positive and negative sample mining according to them to obtain a target sample set containing hard positive and negative sample face pairs. Specifically, when the number of feature vector pairs in the feature vector pool reaches a preset number (for example, 128 × 128 = 16384 feature vector pairs), the sample face pairs corresponding to the feature vector pairs in the pool are sampled by a preset sampling strategy to extract hard positive and negative sample face pairs and obtain the target sample set. The gradient of each hard positive and negative sample face pair in the target sample set is then calculated and transmitted to the second GPU, so that the second GPU performs back propagation with the gradients, adjusts the model parameters and shares them with the first GPU, which then extracts face features with the adjusted model parameters.
In some embodiments, the feature vector pool stores feature vector pairs that meet a condition (e.g., the loss function is greater than a predetermined value), and when the number of feature vector pairs that meet the condition reaches a certain number (which may be predetermined according to actual conditions), for example, reaches 128 × 128=16384 feature vector pairs, the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled by a predetermined sampling policy.
In other embodiments, the loss functions may also be stored in the feature vector pool, and when the number of loss functions stored in the pool (effectively equivalent to the number of feature vector pairs) reaches the preset number, the loss functions in the pool are sampled by the preset sampling strategy, and the sample face pairs corresponding to the extracted loss functions are used as hard positive and negative sample face pairs to form the target sample set.
Before sampling begins, the loss functions may be sorted to obtain a loss function matrix, for example, a 128 × 128 loss function matrix is obtained, and then sampling is performed in the loss function matrix through a preset sampling kernel. When sampling is started, a target sample set is initialized, the number of face pairs of samples in the initialized target sample set is 0, and hard positive and negative samples obtained by sampling are added into the target sample set every time sampling is completed. It should be understood that if the feature vector pairs are stored in the feature vector pool, the loss functions corresponding to the stored feature vector pairs are arranged into a loss function matrix; if the loss functions are stored in the feature vector pool, the stored loss functions are sorted into a loss function matrix.
In some embodiments, the loss functions may be preprocessed before they are arranged: the average value and standard deviation of the loss functions are computed, and the distribution of the loss functions is controlled according to them so that it conforms to a Gaussian distribution while covering all loss functions above a first threshold. The first threshold is preset; in this embodiment it is 0.8, though the user may choose other values between 0 and 1 (for example values around 0.8) according to actual requirements, and the invention is not limited in this respect. The loss functions conforming to the Gaussian distribution are used to construct the loss function matrix, so that most of the lower loss functions are removed while a balance between the loss functions above 0.8 and those below 0.2 is kept, which in turn ensures the balance of the samples.
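The patent does not spell out the shaping procedure, so the sketch below makes an assumption: losses within `k` standard deviations of the mean are kept (approximating the Gaussian bulk), and everything at or above the first threshold is kept unconditionally. Function name and the `k` parameter are ours.

```python
import statistics

def shape_losses(losses, first_threshold=0.8, k=2.0):
    """Keep the Gaussian bulk of the losses plus all losses above the threshold."""
    mu = statistics.mean(losses)
    sigma = statistics.pstdev(losses)
    return [L for L in losses
            if L >= first_threshold or abs(L - mu) <= k * sigma]
```

With a generous `k` nothing is dropped; with a tight `k` only the high-loss tail above the threshold survives, mimicking the removal of "most of the lower loss functions".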
The loss function matrix is then sampled, and the sample face pairs corresponding to the sampled loss functions are added to the target sample set as hard positive and negative sample face pairs. In the early stage of model training there are hard sample pairs with a large gradient contribution, whose loss functions are large and close to 1; as the number of iterations grows, the recognition capability of the model increases and the gradient contribution of a hard sample shrinks compared with the early stage. Sampling method one can therefore be used early in training, and sampling method two later. When training starts, the sample face pairs corresponding to the feature vector pairs in the pool are sampled with sampling method one to obtain the target sample set of hard positive and negative sample face pairs for the current iteration. At each iteration, whether to switch to sampling method two is judged as follows: when the number of sample face pairs in the target sample set obtained by sampling method one is less than e% of the number of feature vector pairs in the pool, switch to sampling method two, where e% is a preset percentage threshold and e can be a value between 0 and 60, preferably 50, chosen from experience or practice; or, when the largest loss function corresponding to the feature vector pairs in the pool (i.e. the largest value in the loss function matrix) is less than f, switch to sampling method two, where 0 < f < 0.5.
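The two switching criteria can be checked with a small predicate; a sketch with names and defaults of our choosing (`e` defaults to the preferred 50, `f` to a value under 0.5):

```python
def should_switch_to_method_two(target_count, pool_count, max_loss, e=50, f=0.4):
    """Switch when method one yields too few pairs, or when no loss is large anymore."""
    too_few = target_count < pool_count * e / 100.0
    all_easy = max_loss < f
    return too_few or all_easy
```

With a 16384-pair pool and e = 50, a target set of 8000 pairs (below 8192) triggers the switch even while large losses remain.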
Sampling method one can be summarized as follows: select a sampling region in the loss function matrix with a preset sampling kernel and compute the sum of the loss functions in the region; then, taking the integer part N of that sum, select the sample face pairs corresponding to the Top-N largest loss functions in the region and add them to the target sample set.
For example, in sampling method one the sampling kernel may be 3 × 3 with a sampling step size of 3. The 3 × 3 kernel covers a 3 × 3 sampling region of the loss function matrix, and the sum of the loss functions in the region, a value between 0 and 9, is computed through the kernel. The integer part N of this sum determines the number of samples taken from the region: the Top-N largest loss functions in the region are selected and added to the target sample set. For example, when the sum of the loss functions is 8.3, the sample face pairs corresponding to the Top-8 largest loss functions in the region are selected and added to the target sample set.
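Sampling method one as described, sketched on a plain list-of-lists loss matrix (3 × 3 kernel, stride 3; helper names are ours):

```python
def sampling_method_one(loss_matrix, k=3):
    """Per k*k region: N = floor(sum of losses); keep the Top-N positions."""
    picked = []  # (row, col) indices of the mined hard pairs
    rows, cols = len(loss_matrix), len(loss_matrix[0])
    for r0 in range(0, rows - k + 1, k):
        for c0 in range(0, cols - k + 1, k):
            region = [(loss_matrix[r][c], (r, c))
                      for r in range(r0, r0 + k)
                      for c in range(c0, c0 + k)]
            n = int(sum(v for v, _ in region))  # e.g. sum 8.3 -> N = 8
            region.sort(key=lambda item: -item[0])
            picked.extend(pos for _, pos in region[:n])
    return picked
```

A region of nine 0.9-losses sums to ~8.1, so eight of its nine pairs are kept; a region of nine 0.1-losses contributes nothing, which is how easy regions fall away.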
Sampling method two can be summarized as follows: select a sampling region in the loss function matrix with a preset sampling kernel and compute the weighted sum of the loss functions in the region; then, taking the integer part M of the weighted sum, select the sample face pairs corresponding to the Top-M largest loss functions in the region and add them to the target sample set.
For example, in sampling method two the sampling kernel may likewise be 3 × 3 with a sampling step size of 3. The 3 × 3 kernel covers a 3 × 3 sampling region of the loss function matrix, and the weighted sum of the loss functions in the region is computed through the kernel. The integer part M of the weighted sum determines the number of samples: the sample face pairs corresponding to the Top-M largest loss functions in the region are selected and added to the target sample set. The weighted sum of the loss functions is related to the current iteration number. The weighted sum w of the loss functions within the sampling region can be calculated by the following equation:
Figure BDA0003654640020000071
where J is the current iteration number, T_j is the number of feature vector pairs in the feature vector pool in the current iteration, L_t is the loss function corresponding to the t-th feature vector pair in the current iteration, T_i is the total number of sample face pairs in the target sample set in the last iteration, and L_k is the loss function corresponding to the k-th sample face pair in the target sample set in the last iteration.
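The weighting equation itself survives in the patent text only as an image placeholder, so the sketch below assumes one plausible form consistent with the variable definitions: every loss in the region is scaled by the ratio of the previous iteration's mean target-set loss to the current pool's mean loss, which shrinks M as the model improves. Treat the weighting as illustrative, not as the patent's exact formula.

```python
def sampling_method_two(region, pool_losses, prev_target_losses):
    """Per region: M = floor(weighted sum); keep the Top-M losses of the region."""
    # Assumed weight: mean loss of last iteration's target set / mean pool loss.
    weight = (sum(prev_target_losses) / len(prev_target_losses)) / \
             (sum(pool_losses) / len(pool_losses))
    m = int(sum(L * weight for L in region))
    return sorted(region, reverse=True)[:m]
```

When the previous target set was only half as lossy as the current pool, the region's sum is halved before flooring, so fewer (but still the largest) losses are kept.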
By adopting different sampling methods in different training stages and sampling the corresponding sample face pairs from the loss function matrix into the target sample set, hard sample pairs can be mined more accurately at each stage. Further, sampling method two uses dynamic weighting, so hard samples in the later stage of training are mined more accurately. Compared with prior-art schemes that directly take the sample face pairs corresponding to the Top-N loss functions as hard samples, which cannot guarantee the gradient descent direction, the embodiment of the invention selects hard samples dynamically with sampling methods one and two, so the number of hard samples is more balanced, the gradient descent direction is preserved and training efficiency is improved. The gradient direction affects the training effect of the model: when the samples are unbalanced, for example when only hard samples exist, the model is trained only on hard positive and negative samples and learns mainly to recognize hard faces, so the trained model is sensitive to hard-to-recognize faces (high recognition accuracy) but insensitive to some easy faces (poor recognition accuracy).
Sampling methods one and two add the sample face pairs corresponding to the Top-N and Top-M loss functions to the target sample set, so the resulting target sample set contains the hard sample face pairs with large loss functions. During training, the gradients corresponding to all sample face pairs in the target sample set are sent to the second GPU for back propagation, and the model parameters in the second GPU are adjusted. The first GPU and the second GPU are simply hardware devices equipped with the recognition model and may also be referred to as training engines. It should be noted that, when the CPU has enough computing resources, multiple metric learning models may be trained simultaneously with multiple groups of GPUs, each metric learning model being provided with one group of GPUs (a first GPU and a second GPU). Fig. 1 is a schematic flow chart of training multiple metric learning models simultaneously.
In other examples, the loss function matrix may further be sliced into slice matrices of a preset size, which are then stacked into a three-dimensional loss function matrix A × B × C as shown in Fig. 2, where A × B is the size of each slice matrix (one slice contains A × B loss functions) and C is the number of slices. For example, the 128 × 128 loss function matrix is sliced into four 64 × 64 slice matrices, which are stacked into a 64 × 64 × 4 three-dimensional loss function matrix; of course, the 128 × 128 matrix may also be sliced into other sizes with the corresponding slice counts. Sampling with sampling methods one and two is then performed on the three-dimensional loss function matrix A × B × C with a three-dimensional sampling kernel formed by stacking C preset sampling kernels, for example a 3 × 3 × C kernel. Since the footprint of the three-dimensional matrix is smaller, the area the kernel must cover is reduced, the number of kernel slides decreases and the sampling speed increases.
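Slicing a square loss matrix into tiles and stacking them can be sketched as follows (here the "stack" is simply a list of C slice matrices; the function name is ours):

```python
def slice_and_stack(matrix, tile):
    """Cut an n*n matrix into (n//tile)**2 tiles of size tile*tile."""
    n = len(matrix)
    slices = []
    for r0 in range(0, n, tile):
        for c0 in range(0, n, tile):
            slices.append([row[c0:c0 + tile] for row in matrix[r0:r0 + tile]])
    return slices  # C = (n // tile) ** 2 slice matrices
```

A 128 × 128 matrix with `tile=64` yields the 64 × 64 × 4 volume of the example.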
In addition, in sampling method two, random masking may be applied to the sampling kernel. Specifically, the largest loss function in the current sampling region is first taken as the mask value; then a position in the kernel is chosen at random and masked with that value, so that the loss function at the masked position is output as the mask value during sampling. Through random masking, a smaller loss function has a chance of being sampled, making the distribution of the loss functions sampled by the kernel more balanced.
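A sketch of the random mask: one random kernel position is overwritten with the region's maximum before selection, so the pair sitting there gets picked regardless of its own loss (the function name and the `rng` parameter are ours):

```python
import random

def select_with_random_mask(region, m, rng=None):
    """Mask one random position with the region's max loss, then pick Top-m by
    the masked values while returning the ORIGINAL losses at those positions."""
    rng = rng or random.Random()
    masked = list(region)
    masked[rng.randrange(len(masked))] = max(region)  # the mask value
    order = sorted(range(len(region)), key=lambda i: -masked[i])
    return [region[i] for i in order[:m]]
```

Whichever position is masked, the region's true maximum always survives selection; the masked position merely gives one low-loss pair a chance to slip in.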
In some embodiments, when the second GPU shares the model parameters with the first GPU, the data transfer can be reduced as follows: only the DIFF (difference) between the model parameters adjusted by the second GPU and the current model parameters of the first GPU in the current iteration is computed, and only this difference is transmitted to the first GPU and added to its current model parameters, so that the first GPU obtains the same model parameters as the second GPU, i.e. the adjusted parameters are shared. For example, if the current model parameters of the first GPU are (1, 2, 3, 4, 5, 6, 7, 8, 9) and the model parameters of the second GPU after back propagation are (1, 3, 3, 4, 5, 6, 7, 8, 9), the difference is (1−1, 3−2, 3−3, 4−4, 5−5, 6−6, 7−7, 8−8, 9−9) = (0, 1, 0, 0, 0, 0, 0, 0, 0); only this difference is transmitted to the first GPU and added to its current parameters to obtain the same parameters as the second GPU, and transmitting only the difference reduces the amount of data shared.
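Parameter sharing by difference, as in the worked example above, can be sketched with lists standing in for parameter tensors (the function name is ours):

```python
def share_by_diff(first_gpu_params, second_gpu_params):
    """Compute the DIFF, 'send' it, and apply it on the first GPU's side."""
    diff = [b - a for a, b in zip(first_gpu_params, second_gpu_params)]
    updated_first = [a + d for a, d in zip(first_gpu_params, diff)]
    return diff, updated_first
```

With the parameters from the example, the transmitted difference is (0, 1, 0, 0, 0, 0, 0, 0, 0) and the first GPU ends up identical to the second; a sparse diff compresses far better than the full parameter vector.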
In another embodiment of the present invention, an online hard positive and negative sample mining device is provided for the training process of a face recognition model, comprising a Central Processing Unit (CPU) and a first graphics processor (first GPU) and a second graphics processor (second GPU) connected to the CPU. The first graphics processor is configured to extract feature vector pairs from sample face pairs. The central processing unit is configured to calculate a loss function of each sample face pair from its feature vector pair, mine hard positive and negative samples among the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set. The second graphics processor is configured to receive the gradients from the central processing unit, perform back propagation with the gradients, adjust the model parameters, and share the adjusted model parameters with the first graphics processor.
It should be understood that, in the above device, the first GPU, the second GPU and the CPU may be configured according to the corresponding steps of the hard positive and negative sample online mining method of the foregoing embodiments. For example, the CPU is configured to perform hard positive and negative sample mining, loss function calculation and gradient calculation, and the specific mining and calculation steps can be configured according to the corresponding steps of that method. A detailed description is therefore omitted; those skilled in the art will understand that the device corresponds to the hard positive and negative sample online mining method of the foregoing embodiments.
Furthermore, an embodiment of the present invention may also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the hard positive and negative sample online mining method of the foregoing embodiments are implemented. A computer-readable storage medium may include a propagated data signal carrying readable program code, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including but not limited to electromagnetic or optical forms, or any suitable combination thereof. A computer-readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Other embodiments of the present invention further provide a face recognition method, which includes a training process of a face recognition model, where the training process includes the steps of the above hard positive and negative sample online mining method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and it is not intended to limit the invention to the specific embodiments described. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the concept of the invention, and where the performance or use is the same, all such substitutions or modifications shall be considered to fall within the scope of protection of the invention.

Claims (11)

1. A hard positive and negative sample online mining method is used for a training process of a face recognition model, and is characterized by comprising the following steps:
s1, acquiring a feature vector pair extracted from a sample face pair by a first graphic processor;
s2, calculating a loss function of the sample face pair according to the feature vector pair;
s3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs;
s4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to a second graphics processor so that the second graphics processor can perform back propagation through the gradient, adjust model parameters, and share the adjusted model parameters to the first graphics processor;
step S3 specifically includes:
adding the feature vector pairs of the sample face pairs into a feature vector pool;
the loss functions corresponding to the feature vector pairs in the feature vector pool are arranged into a loss function matrix, and sampling is then performed in the loss function matrix through a preset sampling kernel to obtain the target sample set; the sampling step comprises: when training starts, sampling the sample face pairs corresponding to the feature vector pairs in the feature vector pool by a first sampling method to obtain a target sample set containing the hard positive and negative sample face pairs in the current iteration; during each iteration in the training process, judging whether to switch to a second sampling method, the judging method comprising: when the number of sample face pairs in the target sample set obtained by the first sampling method is less than e% of the number of feature vector pairs in the feature vector pool, switching to the second sampling method for sampling, wherein e% is a preset percentage threshold; or when the value of the largest loss function corresponding to the feature vector pairs in the feature vector pool or in the loss function matrix is smaller than f, switching to the second sampling method for sampling, wherein 0 < f < 0.5.
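The switching test in this sampling step can be sketched as follows. The threshold values e% = 10 and f = 0.3 are illustrative choices, not values fixed by the claim:

```python
def should_switch(num_mined, pool_size, max_loss, e_percent=10.0, f=0.3):
    """Return True when the first sampling method should give way to the second:
    either the mined set has shrunk below e% of the pool, or the largest
    loss in the pool has dropped below f (0 < f < 0.5)."""
    too_few = num_mined < pool_size * e_percent / 100.0
    losses_small = max_loss < f
    return too_few or losses_small

print(should_switch(num_mined=5, pool_size=100, max_loss=0.8))   # True: 5 < 10% of 100
print(should_switch(num_mined=50, pool_size=100, max_loss=0.2))  # True: 0.2 < 0.3
print(should_switch(num_mined=50, pool_size=100, max_loss=0.8))  # False: neither condition holds
```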
2. The online hard positive-negative sample mining method according to claim 1, wherein in step S2, when the sample face pair is a binary group, the loss function is:
L1 = y·d_{a,b} + (1−y)·(β − d_{a,b})+
wherein y is the label of the sample face pair: when y = 0, the sample face pair is a negative sample pair, and when y = 1, it is a positive sample pair; d_{a,b} is the metric distance of the binary group of face images a and b, representing the spatial distance between the feature vectors of face images a and b; β is a preset threshold used to measure whether the metric distance of the binary group is small enough for a positive sample pair, or large enough for a negative sample pair; (β − d_{a,b})+ denotes a hinge function of the form max(0, β − d_{a,b}): when d_{a,b} ≥ β, max(0, β − d_{a,b}) is 0; when d_{a,b} < β, it takes the value β − d_{a,b}.
When the sample face pairs are triplets, the loss function is:
L2 = d_{a,b} + (d_{a,b} − d_{a,c} + β)+
wherein the triplet comprises face images a, b and c, where a and b form a positive sample pair and a and c form a negative sample pair; d_{a,c} is the metric distance between face images a and c; (d_{a,b} − d_{a,c} + β)+ denotes a hinge function of the form max(0, d_{a,b} − d_{a,c} + β): when d_{a,b} − d_{a,c} + β is greater than 0, it takes that value; otherwise, it is 0.
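A minimal sketch of the pair ("binary group") and triplet losses of claim 2, reconstructed from the variable definitions given in that claim; the margin value used below is illustrative:

```python
def pair_loss(d_ab, y, beta):
    """Pair loss: distance for positives (y=1), hinge on the margin for negatives (y=0)."""
    return y * d_ab + (1 - y) * max(0.0, beta - d_ab)

def triplet_loss(d_ab, d_ac, beta):
    """Triplet loss: positive-pair distance plus a hinged margin violation term."""
    return d_ab + max(0.0, d_ab - d_ac + beta)

beta = 0.5
print(pair_loss(0.2, y=1, beta=beta))     # positive pair: loss equals the distance, 0.2
print(pair_loss(0.2, y=0, beta=beta))     # negative pair too close: 0.5 - 0.2 = 0.3
print(triplet_loss(0.2, 0.9, beta=beta))  # well-separated triplet: hinge is 0, loss 0.2
```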
3. The hard positive and negative sample online mining method according to claim 1, wherein arranging the loss functions corresponding to the feature vector pairs in the feature vector pool into a loss function matrix comprises the following steps:
calculating the average value and the standard deviation of the loss functions corresponding to the feature vector pairs in the feature vector pool;
controlling the distribution of the loss functions corresponding to the feature vector pairs in the feature vector pool according to the average value and the standard deviation, so that the loss functions conform to a Gaussian distribution and cover all loss functions higher than a first threshold;
constructing the loss function matrix from the loss functions conforming to the Gaussian distribution.
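The normalisation in claim 3 can be sketched roughly as follows. The pooled loss values, the reshaping into a 3×3 matrix, and the plain mean/standard-deviation standardisation used here are illustrative assumptions; the claim does not fix the exact procedure for fitting the Gaussian distribution:

```python
import numpy as np

# Illustrative pooled loss values (made up for the example).
losses = np.array([0.1, 0.4, 0.9, 0.3, 1.2, 0.7, 0.2, 0.8, 0.5])

mean, std = losses.mean(), losses.std()
normalised = (losses - mean) / std      # zero mean, unit variance

loss_matrix = normalised.reshape(3, 3)  # arrange into a loss function matrix
print(loss_matrix.shape)                # (3, 3)
```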
4. The hard positive and negative sample online mining method according to claim 1, wherein the first sampling method comprises steps A1 to A2:
A1, selecting a sampling area in the loss function matrix with the preset sampling kernel, and calculating the sum of the loss functions in the sampling area;
A2, taking the sum of the loss functions rounded to an integer N, selecting the sample face pairs corresponding to the TopN loss functions with the largest values among the loss functions of the sampling area, and adding them to the target sample set;
the second sampling method comprises steps B1 to B2:
B1, selecting a sampling area in the loss function matrix with the preset sampling kernel, and calculating a weighted sum of the loss functions in the sampling area;
B2, taking the weighted sum rounded to an integer M, selecting the sample face pairs corresponding to the TopM loss functions with the largest values among the loss functions of the sampling area, and adding them to the target sample set.
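The first sampling method (A1-A2) can be sketched as follows: slide a sampling kernel over the loss matrix, take the loss sum inside the window as the count N, and keep the N largest losses in that window. The kernel size, the matrix values, and the truncation-to-integer rounding rule are illustrative assumptions:

```python
import numpy as np

def sample_window(loss_matrix, top_left, kernel=2):
    """Return the TopN losses in one kernel window, with N derived from the window's loss sum."""
    r, c = top_left
    window = loss_matrix[r:r + kernel, c:c + kernel]
    n = int(window.sum())                 # integer count from the loss sum (truncated)
    flat = np.sort(window.ravel())[::-1]  # losses in descending order
    return flat[:n]                       # the N largest losses in the window

m = np.array([[0.9, 0.8, 0.1],
              [0.7, 1.1, 0.2],
              [0.1, 0.1, 0.1]])
print(sample_window(m, (0, 0)))  # window sum 3.5 -> keep the 3 largest losses
```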
5. The hard positive and negative sample online mining method according to claim 4, wherein in step B1 the weighted sum w of the loss functions in the sampling area is determined by the following formula:
Figure FDA0003899215830000031
wherein J is the current iteration number; T_j is the number of feature vector pairs in the feature vector pool in the current iteration; L_t is the loss function corresponding to the t-th feature vector pair in the current iteration; T_i is the total number of sample face pairs in the target sample set in the previous iteration; and L_k is the loss function corresponding to the k-th feature vector pair among the sample face pairs in the target sample set in the previous iteration.
6. The online hard positive and negative sample mining method according to claim 4, further comprising:
slicing the loss function matrix to obtain slice matrices of a preset size;
stacking the slice matrices to obtain a three-dimensional loss function matrix [A, B, C], wherein A and B denote the size of a slice matrix, an A×B slice matrix containing A×B loss functions, and C denotes the number of slice matrices;
and sampling the three-dimensional loss function matrix [A, B, C] by the first sampling method and the second sampling method using a three-dimensional sampling kernel formed by stacking C preset sampling kernels, to obtain the target sample set.
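The slicing and stacking in claim 6 can be sketched as follows: fixed-size [A, B] slices are stacked into an [A, B, C] volume, and a kernel spanning all C slices samples one spatial position through every slice. The sizes and values are illustrative:

```python
import numpy as np

A, B, C = 2, 2, 3
# C slice matrices of size A x B (values made up for the example).
slices = [np.arange(A * B).reshape(A, B) + 10 * i for i in range(C)]

volume = np.stack(slices, axis=-1)  # three-dimensional matrix of shape (A, B, C)
print(volume.shape)                 # (2, 2, 3)

# A "3-D kernel" footprint covering one (row, col) position across all C slices:
column = volume[0, 0, :]            # the value at (0, 0) in each slice: 0, 10, 20
print(column)
```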
7. The hard positive and negative sample online mining method according to claim 4, wherein in the second sampling method, the preset sampling kernel is randomly masked, the random masking comprising:
first, taking the largest loss function in the sampling area of the current iteration as a mask value; then randomly selecting a position in the sampling kernel and masking it with the mask value, so that the loss function at the masked position is output as the mask value during sampling.
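The random masking of claim 7 can be sketched as follows: the largest loss in the current window becomes the mask value, and one randomly chosen position in the window is overwritten with it. The window contents and the random seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

window = np.array([[0.2, 0.9],
                   [0.4, 0.1]])

mask_value = window.max()        # largest loss in the window: 0.9
idx = rng.integers(window.size)  # random position to mask
masked = window.copy()
masked.flat[idx] = mask_value    # that position now outputs the mask value

print(mask_value)                # 0.9
print(masked.max() == mask_value)
```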
8. The hard positive and negative sample online mining method according to claim 1, wherein sharing the adjusted model parameters to the first graphics processor in step S4 comprises:
calculating the difference between the model parameters adjusted by the second graphics processor in the current iteration and the model parameters of the first graphics processor;
and transmitting the difference to the first graphics processor to be added to the model parameters of the first graphics processor, so that the model parameters adjusted by the second graphics processor are shared to the first graphics processor.
9. A hard positive and negative sample online mining device for a training process of a face recognition model, characterized by comprising a central processing unit, and a first graphics processor and a second graphics processor connected to the central processing unit; the first graphics processor is configured to: extract feature vector pairs from sample face pairs; the central processing unit is configured to: calculate a loss function of the sample face pairs according to the feature vector pairs, perform hard positive and negative sample mining on the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receive the gradients from the central processing unit, perform back propagation with the gradients, adjust the model parameters, and share the adjusted model parameters to the first graphics processor;
performing hard positive and negative sample mining on the sample face pairs according to the loss function to obtain the target sample set containing the hard positive and negative sample face pairs specifically comprises:
adding the feature vector pairs of the sample face pairs into a feature vector pool; arranging the loss functions corresponding to the feature vector pairs in the feature vector pool into a loss function matrix, and then sampling in the loss function matrix through a preset sampling kernel to obtain the target sample set; the sampling step comprises: when training starts, sampling the sample face pairs corresponding to the feature vector pairs in the feature vector pool by a first sampling method to obtain a target sample set containing the hard positive and negative sample face pairs in the current iteration; during each iteration in the training process, judging whether to switch to a second sampling method, the judging method comprising: when the number of sample face pairs in the target sample set obtained by the first sampling method is less than e% of the number of feature vector pairs in the feature vector pool, switching to the second sampling method for sampling, wherein e% is a preset percentage threshold; or when the value of the largest loss function corresponding to the feature vector pairs in the feature vector pool or in the loss function matrix is smaller than f, switching to the second sampling method for sampling, wherein 0 < f < 0.5.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of the hard positive and negative sample online mining method according to any one of claims 1 to 8.
11. A face recognition method comprising a training process of a face recognition model, wherein the training process comprises the steps of the hard positive and negative sample online mining method according to any one of claims 1 to 8.
CN202210555142.9A 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method Active CN114764942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555142.9A CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Publications (2)

Publication Number Publication Date
CN114764942A CN114764942A (en) 2022-07-19
CN114764942B true CN114764942B (en) 2022-12-09

Family

ID=82364980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555142.9A Active CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Country Status (1)

Country Link
CN (1) CN114764942B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558057B (en) * 2024-01-12 2024-04-16 清华大学深圳国际研究生院 Face recognition method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035751B (en) * 2014-06-20 2016-10-12 深圳市腾讯计算机***有限公司 Data parallel processing method based on multi-graphics processor and device
US10002402B2 (en) * 2015-07-23 2018-06-19 Sony Corporation Learning convolution neural networks on heterogeneous CPU-GPU platform
CN107330355B (en) * 2017-05-11 2021-01-26 中山大学 Deep pedestrian re-identification method based on positive sample balance constraint
US11164079B2 (en) * 2017-12-15 2021-11-02 International Business Machines Corporation Multi-GPU deep learning using CPUs
CN108647577B (en) * 2018-04-10 2021-04-20 华中科技大学 Self-adaptive pedestrian re-identification method and system for difficult excavation
CN110163265A (en) * 2019-04-30 2019-08-23 腾讯科技(深圳)有限公司 Data processing method, device and computer equipment
JP7256811B2 (en) * 2019-10-12 2023-04-12 バイドゥドットコム タイムズ テクノロジー (ベイジン) カンパニー リミテッド Method and system for accelerating AI training using advanced interconnect technology
CN111667050B (en) * 2020-04-21 2021-11-30 佳都科技集团股份有限公司 Metric learning method, device, equipment and storage medium
CN113569657A (en) * 2021-07-05 2021-10-29 浙江大华技术股份有限公司 Pedestrian re-identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114764942A (en) 2022-07-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant