CN114764942B - Difficult positive and negative sample online mining method and face recognition method - Google Patents


Info

Publication number: CN114764942B
Application number: CN202210555142.9A
Authority: CN (China)
Prior art keywords: sampling, sample, pair, loss function, pairs
Legal status: Active (granted)
Other versions: CN114764942A (in Chinese)
Inventors: 郑文先, 陶映帆, 杨文明, 廖庆敏
Original and current assignee: Shenzhen International Graduate School of Tsinghua University

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The invention discloses an online hard positive and negative sample mining method for the training process of a face recognition model, comprising the following steps: S1, acquiring a feature vector pair extracted from a sample face pair by a first graphics processor; S2, calculating a loss function of the sample face pair from the feature vector pair; S3, mining hard positive and negative samples among the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set and transmitting the gradients to a second graphics processor, so that the second graphics processor performs back propagation with the gradients, adjusts the model parameters, and shares the adjusted model parameters with the first graphics processor.

Description

Difficult positive and negative sample online mining method and face recognition method
Technical Field
The invention relates to the intersection of image processing and machine learning, and in particular to a hard positive and negative sample mining method and a face recognition method whose model training applies the mining method.
Background
Improving face recognition accuracy by training a model on a large number of face pictures is common practice in the field of face recognition. Different training processes and methods yield models with different recognition performance, so improving the training process and method to raise a model's face recognition accuracy is an active research topic in the industry.
A face recognition model training framework based on the metric learning principle, which can exploit face data sets of millions of images, usually combines a metric learning algorithm with online mining. When a model is trained with a metric learning algorithm, a large number of same-face pairs (positive sample pairs) and different-face pairs (negative sample pairs) must be fed to the model. In the middle and later stages of training, the harder the sample pairs are, the more markedly the recognition performance of the model improves and the faster training converges. As the number of training iterations grows, the standard of "hardness" keeps evolving, so hard sample pairs must be mined online during training, and the hardness of the mined positive and negative sample pairs is positively correlated with the amount of data participating in each training iteration.
In summary, in the training stage of a face recognition model, hard positive and negative samples mined online are used for training, and the larger the data batch participating in the online mining, the greater the improvement in the accuracy of the face recognition algorithm. In the existing training process, however, feature extraction, loss function calculation, gradient calculation and back propagation are all performed on a GPU (Graphics Processing Unit). A GPU card has limited video memory and can process only a limited number of images; the data batch on a single GPU card is usually between dozens and a few hundred face images, which greatly limits how hard the mined positive and negative samples can be and thereby limits the improvement of face recognition model training efficiency.
Disclosure of Invention
The invention mainly aims to provide an online hard positive and negative sample mining method that solves the above problems in the prior art and improves the efficiency of mining hard positive and negative samples online.
To achieve the above purpose, the invention provides, in one aspect, the following technical solution:
An online hard positive and negative sample mining method, used in the training process of a face recognition model, comprises the following steps: S1, acquiring a feature vector pair extracted from a sample face pair by a first graphics processor; S2, calculating a loss function of the sample face pair from the feature vector pair; S3, mining hard positive and negative samples among the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set and transmitting the gradients to a second graphics processor, so that the second graphics processor performs back propagation with the gradients, adjusts the model parameters, and shares the adjusted model parameters with the first graphics processor.
In another aspect, the invention provides the following technical solution:
a difficult positive and negative sample online excavating device is used for the training process of a face recognition model and comprises a central processing unit, a first graphic processor and a second graphic processor, wherein the first graphic processor and the second graphic processor are connected with the central processing unit; the first graphics processor is configured to: extracting feature vector pairs from the sample face pairs; the central processor unit is configured to: calculating a loss function of the sample face pair according to the feature vector pair, mining hard positive and negative samples of the sample face pair according to the loss function to obtain a target sample set containing the hard positive and negative sample face pair, and calculating the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receiving the gradient from the central processing unit, performing back propagation through the gradient, adjusting model parameters, and sharing the adjusted model parameters to the first graphics processor.
The invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the online hard positive and negative sample mining method are realized.
The invention also provides a face recognition method comprising a training process of a face recognition model, wherein the training process includes the steps of the online hard positive and negative sample mining method.
The beneficial effects of the invention include: a CPU cooperates with a first GPU and a second GPU; image feature calculation is carried out only on the first GPU, and after the feature vector pairs of the sample pairs are obtained, the loss functions and gradients are calculated on the CPU, while gradient back propagation is performed on the second GPU, which shares the model parameters. The first GPU is therefore not limited by video memory size and can continuously calculate feature vector pairs of sample pairs in a pipelined fashion, which improves the efficiency of online hard sample mining and, in turn, the efficiency of model training.
Drawings
FIG. 1 is a schematic diagram of an online hard positive and negative sample mining method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of a three-dimensional loss function matrix A × B × C according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
Definitions of terms:
(1) Positive sample pair: a pair of images of the same class, e.g. a pair of images of the same face.
(2) Negative sample pair: a pair of images of different classes, e.g. a pair of images of different faces.
(3) Hard positive sample: a same-face pair that the model recognizes as a pair of different faces during recognition.
(4) Hard negative sample: a different-face pair that the model recognizes as a pair of the same face during recognition.
(5) Metric learning: during model training, the distance between positive sample pairs is made as small as possible and the distance between negative sample pairs as large as possible. Hard positive sample pairs lie at a large distance under the learned metric and therefore produce a large gradient descent step; hard negative sample pairs lie at a small distance and likewise produce a large gradient descent step. Easy positive and negative sample pairs provide no gradient, or only a small one.
(6) Online hard positive and negative sample mining: in metric-learning-based training, after each iteration, the batch fed into the next iteration includes hard positive and negative samples, so as to improve the recognition capability of the model and the speed of gradient descent.
The embodiment of the invention provides an online hard positive and negative sample mining method that runs on a Central Processing Unit (CPU) and is applied in the training stage of a face recognition model. In this embodiment the method may adopt the system architecture shown in Fig. 1, which comprises one CPU and at least one group of GPUs, each group containing two GPUs: a first GPU and a second GPU. The first GPU and the second GPU are connected to the CPU via a bus. The first GPU is mainly responsible for feature extraction: it extracts features from the images and transmits them to the CPU. The CPU calculates the loss functions and gradients and mines hard samples according to the loss functions, selecting the sample pairs with large loss functions as hard samples. The CPU also transmits the gradients to the second GPU, so that the second GPU performs back propagation with the gradients, adjusts the model parameters, and shares them with the first GPU. The loss function of each sample pair is between 0 and 1, and the larger the loss function, the harder the sample pair is for the model to recognize correctly.
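The division of labour described above can be sketched with the two GPUs and the CPU stood in for by plain Python functions. All names and the toy linear "model" below are illustrative, not from the patent; the point is only the data flow of one iteration.

```python
def first_gpu_extract(params, face_pair):
    """First GPU: turn each image of a pair into a feature vector (toy linear map)."""
    w = params["w"]
    return tuple(tuple(w * x for x in img) for img in face_pair)

def toy_loss(feature_pair):
    """Toy 'distance' between the two feature vectors of a pair."""
    fa, fb = feature_pair
    return sum(abs(a - b) for a, b in zip(fa, fb))

def cpu_mine_and_grad(feature_pairs, keep=1):
    """CPU: score each pair, keep the hardest ones, emit a stand-in gradient."""
    scored = sorted(feature_pairs, key=lambda fp: -toy_loss(fp))
    hard = scored[:keep]
    grad = sum(toy_loss(fp) for fp in hard)  # stand-in for real gradients
    return hard, grad

def second_gpu_backprop(params, grad, lr=0.1):
    """Second GPU: adjust the model parameters, then share them back."""
    return {"w": params["w"] - lr * grad}

# One iteration of the pipeline: GPU-1 extracts, CPU mines, GPU-2 updates.
params = {"w": 1.0}
batch = [((1.0, 2.0), (1.0, 2.5)), ((0.0, 1.0), (3.0, 1.0))]
feature_pairs = [first_gpu_extract(params, p) for p in batch]
hard, grad = cpu_mine_and_grad(feature_pairs)
params = second_gpu_backprop(params, grad)  # shared with GPU-1 for the next batch
```

In the real system the "shared back" step is what lets the first GPU keep extracting features with up-to-date parameters while the second GPU handles back propagation.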
Specifically, a sample face pair is input into the first GPU for feature extraction, and each face image yields a corresponding feature vector, so one sample face pair corresponds to two feature vectors, referred to here as a feature vector pair. The CPU receives the feature vector pairs and places them in a feature vector pool while computing the loss function of each pair. In a specific embodiment, the loss function is computed as follows:
When the sample face pair is a binary group (i.e. it contains two face images), the loss function is:

L1 = y · d_{a,b} + (1 − y) · (β − d_{a,b})+

where y is the label of the sample face pair: y = 0 indicates a negative sample pair and y = 1 a positive sample pair; d_{a,b} is the metric distance of the binary group of face images a and b, i.e. the spatial distance between the feature vectors of a and b; β is a preset threshold used to judge whether the metric distance of the binary group under a positive sample pair is small enough, or under a negative sample pair large enough, with a value between 0.35 and 0.4; and (β − d_{a,b})+ denotes the hinge function max(0, β − d_{a,b}), whose value is 0 when d_{a,b} ≥ β and β − d_{a,b} when d_{a,b} < β.
It can be seen that when the sample face pair is a positive sample pair, the loss function reduces to

L1 = d_{a,b}

and the larger the loss function, the larger the metric distance of the binary group and the harder it is to recognize the pair correctly as a positive sample pair, so the pair can be regarded as a hard positive sample pair. When the sample face pair is a negative sample pair, the loss function reduces to

L1 = (β − d_{a,b})+

and, for the preset β, the larger the loss function, the smaller the metric distance d_{a,b} of the binary group and the harder it is to recognize the pair correctly as a negative sample pair, so the pair can be regarded as a hard negative sample pair.
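The binary-group loss can be written directly in code; a minimal sketch (the function name is ours, and `beta` defaults to a value in the 0.35–0.4 range given above):

```python
def pair_loss(d_ab: float, y: int, beta: float = 0.4) -> float:
    """L1 = y*d_ab + (1-y)*max(0, beta - d_ab).

    y=1: positive pair, loss grows with the metric distance.
    y=0: negative pair, loss grows as the distance falls below beta.
    """
    return y * d_ab + (1 - y) * max(0.0, beta - d_ab)
```

A distant positive pair (`pair_loss(0.9, 1)`) and a close negative pair (`pair_loss(0.1, 0)`) both score high, which is exactly what marks them as hard.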
When the sample face pair is a triplet, the loss function can be expressed as:

L2 = d_{a,b} + (d_{a,b} − d_{a,c} + β)+

where the triplet comprises face images a, b and c, with a and b forming a positive sample pair and a and c forming a negative sample pair; d_{a,c} is the metric distance between face images a and c; and (d_{a,b} − d_{a,c} + β)+ denotes the hinge function max(0, d_{a,b} − d_{a,c} + β), which equals d_{a,b} − d_{a,c} + β when that quantity is larger than 0, and 0 otherwise.

It can be seen that when d_{a,b} is small and d_{a,c} is large, the loss function reduces to d_{a,b}: a small loss means the metric distance of the positive pair a, b is small and that of the negative pair a, c is large, so the triplet is an easy sample pair, while a larger loss means the metric distance of the positive pair a, b has grown, so a, b is a hard positive sample pair. Conversely, when d_{a,b} is large and d_{a,c} is small, the loss function is d_{a,b} + (d_{a,b} − d_{a,c} + β)+: a large loss means the metric distance of the positive pair a, b is large and that of the negative pair a, c is small, so the triplet is a hard sample pair.
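The triplet loss is a one-liner as well; a sketch with an illustrative function name:

```python
def triplet_loss(d_ab: float, d_ac: float, beta: float = 0.4) -> float:
    """L2 = d_ab + max(0, d_ab - d_ac + beta) for a triplet (a, b, c)."""
    return d_ab + max(0.0, d_ab - d_ac + beta)
```

An easy triplet (small `d_ab`, large `d_ac`) scores low; a hard one (large `d_ab`, small `d_ac`) activates the hinge term and scores high.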
After the loss functions are computed, the CPU performs hard positive and negative sample mining according to them to obtain a target sample set containing hard positive and negative sample face pairs. Specifically, when the number of feature vector pairs in the feature vector pool reaches a preset number (for example, 128 × 128 = 16384 feature vector pairs), the sample face pairs corresponding to the feature vector pairs in the pool are sampled by a preset sampling strategy to extract hard positive and negative sample face pairs and obtain the target sample set. The gradient of each hard positive and negative sample face pair in the target sample set is then calculated and transmitted to the second GPU, so that the second GPU performs back propagation with the gradients, adjusts the model parameters and shares them with the first GPU, which then extracts face features with the adjusted model parameters.
In some embodiments, the feature vector pool stores feature vector pairs that meet a condition (e.g., the loss function is greater than a predetermined value), and when the number of feature vector pairs that meet the condition reaches a certain number (which may be predetermined according to actual conditions), for example, reaches 128 × 128=16384 feature vector pairs, the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled by a predetermined sampling policy.
In other embodiments, the loss functions may also be stored in the feature vector pool, and when the number of loss functions stored in the pool (effectively equivalent to the number of feature vector pairs) reaches the preset number, the loss functions in the pool are sampled by the preset sampling strategy, and the sample face pairs corresponding to the extracted loss functions are used as hard positive and negative sample face pairs to form the target sample set.
Before sampling begins, the loss functions may be sorted to obtain a loss function matrix, for example, a 128 × 128 loss function matrix is obtained, and then sampling is performed in the loss function matrix through a preset sampling kernel. When sampling is started, a target sample set is initialized, the number of face pairs of samples in the initialized target sample set is 0, and hard positive and negative samples obtained by sampling are added into the target sample set every time sampling is completed. It should be understood that if the feature vector pairs are stored in the feature vector pool, the loss functions corresponding to the stored feature vector pairs are arranged into a loss function matrix; if the loss functions are stored in the feature vector pool, the stored loss functions are sorted into a loss function matrix.
In some embodiments, the loss functions may be preprocessed before they are arranged: the average value and standard deviation of the loss functions are computed, and the distribution of the loss functions is controlled according to them so that it conforms to a Gaussian distribution while covering all loss functions above a first threshold. The first threshold is preset; in this embodiment it is 0.8, though the user may choose other values between 0 and 1 (for example values around 0.8) according to actual requirements, and the invention is not limited in this respect. The loss functions conforming to the Gaussian distribution are used to construct the loss function matrix, so that most of the lower loss functions are removed while a balance between the loss functions above 0.8 and those below 0.2 is kept, which in turn ensures the balance of the samples.
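The patent does not spell out the shaping procedure, so the sketch below makes an assumption: losses within `k` standard deviations of the mean are kept (approximating the Gaussian bulk), and everything at or above the first threshold is kept unconditionally. Function name and the `k` parameter are ours.

```python
import statistics

def shape_losses(losses, first_threshold=0.8, k=2.0):
    """Keep the Gaussian bulk of the losses plus all losses above the threshold."""
    mu = statistics.mean(losses)
    sigma = statistics.pstdev(losses)
    return [L for L in losses
            if L >= first_threshold or abs(L - mu) <= k * sigma]
```

With a generous `k` nothing is dropped; with a tight `k` only the high-loss tail above the threshold survives, mimicking the removal of "most of the lower loss functions".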
The loss function matrix is then sampled, and the sample face pairs corresponding to the sampled loss functions are added to the target sample set as hard positive and negative sample face pairs. In the early stage of model training there are hard sample pairs with a large gradient contribution, whose loss functions are large and close to 1; as the number of iterations grows, the recognition capability of the model increases and the gradient contribution of a hard sample shrinks compared with the early stage. Sampling method one can therefore be used early in training, and sampling method two later. When training starts, the sample face pairs corresponding to the feature vector pairs in the pool are sampled with sampling method one to obtain the target sample set of hard positive and negative sample face pairs for the current iteration. At each iteration, whether to switch to sampling method two is judged as follows: when the number of sample face pairs in the target sample set obtained by sampling method one is less than e% of the number of feature vector pairs in the pool, switch to sampling method two, where e% is a preset percentage threshold and e can be a value between 0 and 60, preferably 50, chosen from experience or practice; or, when the largest loss function corresponding to the feature vector pairs in the pool (i.e. the largest value in the loss function matrix) is less than f, switch to sampling method two, where 0 < f < 0.5.
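The two switching criteria can be checked with a small predicate; a sketch with names and defaults of our choosing (`e` defaults to the preferred 50, `f` to a value under 0.5):

```python
def should_switch_to_method_two(target_count, pool_count, max_loss, e=50, f=0.4):
    """Switch when method one yields too few pairs, or when no loss is large anymore."""
    too_few = target_count < pool_count * e / 100.0
    all_easy = max_loss < f
    return too_few or all_easy
```

With a 16384-pair pool and e = 50, a target set of 8000 pairs (below 8192) triggers the switch even while large losses remain.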
Sampling method one can be summarized as follows: select a sampling region in the loss function matrix with a preset sampling kernel and compute the sum of the loss functions in the region; then, taking the integer part N of that sum, select the sample face pairs corresponding to the Top-N largest loss functions in the region and add them to the target sample set.
For example, in sampling method one the sampling kernel may be 3 × 3 with a sampling step size of 3. The 3 × 3 kernel covers a 3 × 3 sampling region of the loss function matrix, and the sum of the loss functions in the region, a value between 0 and 9, is computed through the kernel. The integer part N of this sum determines the number of samples taken from the region: the Top-N largest loss functions in the region are selected and added to the target sample set. For example, when the sum of the loss functions is 8.3, the sample face pairs corresponding to the Top-8 largest loss functions in the region are selected and added to the target sample set.
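Sampling method one as described, sketched on a plain list-of-lists loss matrix (3 × 3 kernel, stride 3; helper names are ours):

```python
def sampling_method_one(loss_matrix, k=3):
    """Per k*k region: N = floor(sum of losses); keep the Top-N positions."""
    picked = []  # (row, col) indices of the mined hard pairs
    rows, cols = len(loss_matrix), len(loss_matrix[0])
    for r0 in range(0, rows - k + 1, k):
        for c0 in range(0, cols - k + 1, k):
            region = [(loss_matrix[r][c], (r, c))
                      for r in range(r0, r0 + k)
                      for c in range(c0, c0 + k)]
            n = int(sum(v for v, _ in region))  # e.g. sum 8.3 -> N = 8
            region.sort(key=lambda item: -item[0])
            picked.extend(pos for _, pos in region[:n])
    return picked
```

A region of nine 0.9-losses sums to ~8.1, so eight of its nine pairs are kept; a region of nine 0.1-losses contributes nothing, which is how easy regions fall away.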
Sampling method two can be summarized as follows: select a sampling region in the loss function matrix with a preset sampling kernel and compute the weighted sum of the loss functions in the region; then, taking the integer part M of the weighted sum, select the sample face pairs corresponding to the Top-M largest loss functions in the region and add them to the target sample set.
For example, in sampling method two the sampling kernel may likewise be 3 × 3 with a sampling step size of 3. The 3 × 3 kernel covers a 3 × 3 sampling region of the loss function matrix, and the weighted sum of the loss functions in the region is computed through the kernel. The integer part M of the weighted sum determines the number of samples: the sample face pairs corresponding to the Top-M largest loss functions in the region are selected and added to the target sample set. The weighted sum of the loss functions is related to the current iteration number. The weighted sum w of the loss functions within the sampling region can be calculated by the following equation:
Figure BDA0003654640020000071
where J is the current iteration number, T_j is the number of feature vector pairs in the feature vector pool in the current iteration, L_t is the loss function corresponding to the t-th feature vector pair in the current iteration, T_i is the total number of sample face pairs in the target sample set in the last iteration, and L_k is the loss function corresponding to the k-th sample face pair in the target sample set in the last iteration.
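The weighting equation itself survives in the patent text only as an image placeholder, so the sketch below assumes one plausible form consistent with the variable definitions: every loss in the region is scaled by the ratio of the previous iteration's mean target-set loss to the current pool's mean loss, which shrinks M as the model improves. Treat the weighting as illustrative, not as the patent's exact formula.

```python
def sampling_method_two(region, pool_losses, prev_target_losses):
    """Per region: M = floor(weighted sum); keep the Top-M losses of the region."""
    # Assumed weight: mean loss of last iteration's target set / mean pool loss.
    weight = (sum(prev_target_losses) / len(prev_target_losses)) / \
             (sum(pool_losses) / len(pool_losses))
    m = int(sum(L * weight for L in region))
    return sorted(region, reverse=True)[:m]
```

When the previous target set was only half as lossy as the current pool, the region's sum is halved before flooring, so fewer (but still the largest) losses are kept.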
By adopting different sampling methods in different training stages and sampling the corresponding sample face pairs from the loss function matrix into the target sample set, hard sample pairs can be mined more accurately at each stage. Further, sampling method two uses dynamic weighting, so hard samples in the later stage of training are mined more accurately. Compared with prior-art schemes that directly take the sample face pairs corresponding to the Top-N loss functions as hard samples, which cannot guarantee the gradient descent direction, the embodiment of the invention selects hard samples dynamically with sampling methods one and two, so the number of hard samples is more balanced, the gradient descent direction is preserved and training efficiency is improved. The gradient direction affects the training effect of the model: when the samples are unbalanced, for example when only hard samples exist, the model is trained only on hard positive and negative samples and learns mainly to recognize hard faces, so the trained model is sensitive to hard-to-recognize faces (high recognition accuracy) but insensitive to some easy faces (poor recognition accuracy).
Sampling methods one and two add the sample face pairs corresponding to the Top-N and Top-M loss functions to the target sample set, so the resulting target sample set contains the hard sample face pairs with large loss functions. During training, the gradients corresponding to all sample face pairs in the target sample set are sent to the second GPU for back propagation, and the model parameters in the second GPU are adjusted. The first GPU and the second GPU are simply hardware devices equipped with the recognition model and may also be referred to as training engines. It should be noted that, when the CPU has enough computing resources, multiple metric learning models may be trained simultaneously with multiple groups of GPUs, each metric learning model being provided with one group of GPUs (a first GPU and a second GPU). Fig. 1 is a schematic flow chart of training multiple metric learning models simultaneously.
In other examples, the loss function matrix may further be sliced into slice matrices of a preset size, which are then stacked into a three-dimensional loss function matrix A × B × C as shown in Fig. 2, where A × B is the size of each slice matrix (one slice contains A × B loss functions) and C is the number of slices. For example, the 128 × 128 loss function matrix is sliced into four 64 × 64 slice matrices, which are stacked into a 64 × 64 × 4 three-dimensional loss function matrix; of course, the 128 × 128 matrix may also be sliced into other sizes with the corresponding slice counts. Sampling with sampling methods one and two is then performed on the three-dimensional loss function matrix A × B × C with a three-dimensional sampling kernel formed by stacking C preset sampling kernels, for example a 3 × 3 × C kernel. Since the footprint of the three-dimensional matrix is smaller, the area the kernel must cover is reduced, the number of kernel slides decreases and the sampling speed increases.
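Slicing a square loss matrix into tiles and stacking them can be sketched as follows (here the "stack" is simply a list of C slice matrices; the function name is ours):

```python
def slice_and_stack(matrix, tile):
    """Cut an n*n matrix into (n//tile)**2 tiles of size tile*tile."""
    n = len(matrix)
    slices = []
    for r0 in range(0, n, tile):
        for c0 in range(0, n, tile):
            slices.append([row[c0:c0 + tile] for row in matrix[r0:r0 + tile]])
    return slices  # C = (n // tile) ** 2 slice matrices
```

A 128 × 128 matrix with `tile=64` yields the 64 × 64 × 4 volume of the example.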
In addition, in sampling method two, random masking may be applied to the sampling kernel. Specifically, the largest loss function in the current sampling region is first taken as the mask value; then a position in the kernel is chosen at random and masked with that value, so that the loss function at the masked position is output as the mask value during sampling. Through random masking, a smaller loss function has a chance of being sampled, making the distribution of the loss functions sampled by the kernel more balanced.
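A sketch of the random mask: one random kernel position is overwritten with the region's maximum before selection, so the pair sitting there gets picked regardless of its own loss (the function name and the `rng` parameter are ours):

```python
import random

def select_with_random_mask(region, m, rng=None):
    """Mask one random position with the region's max loss, then pick Top-m by
    the masked values while returning the ORIGINAL losses at those positions."""
    rng = rng or random.Random()
    masked = list(region)
    masked[rng.randrange(len(masked))] = max(region)  # the mask value
    order = sorted(range(len(region)), key=lambda i: -masked[i])
    return [region[i] for i in order[:m]]
```

Whichever position is masked, the region's true maximum always survives selection; the masked position merely gives one low-loss pair a chance to slip in.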
In some embodiments, when the second GPU shares the model parameters with the first GPU, the data transfer can be reduced as follows: only the DIFF (difference) between the model parameters adjusted by the second GPU and the current model parameters of the first GPU in the current iteration is computed, and only this difference is transmitted to the first GPU and added to its current model parameters, so that the first GPU obtains the same model parameters as the second GPU, i.e. the adjusted parameters are shared. For example, if the current model parameters of the first GPU are (1, 2, 3, 4, 5, 6, 7, 8, 9) and the model parameters of the second GPU after back propagation are (1, 3, 3, 4, 5, 6, 7, 8, 9), the difference is (1−1, 3−2, 3−3, 4−4, 5−5, 6−6, 7−7, 8−8, 9−9) = (0, 1, 0, 0, 0, 0, 0, 0, 0); only this difference is transmitted to the first GPU and added to its current parameters to obtain the same parameters as the second GPU, and transmitting only the difference reduces the amount of data shared.
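Parameter sharing by difference, as in the worked example above, can be sketched with lists standing in for parameter tensors (the function name is ours):

```python
def share_by_diff(first_gpu_params, second_gpu_params):
    """Compute the DIFF, 'send' it, and apply it on the first GPU's side."""
    diff = [b - a for a, b in zip(first_gpu_params, second_gpu_params)]
    updated_first = [a + d for a, d in zip(first_gpu_params, diff)]
    return diff, updated_first
```

With the parameters from the example, the transmitted difference is (0, 1, 0, 0, 0, 0, 0, 0, 0) and the first GPU ends up identical to the second; a sparse diff compresses far better than the full parameter vector.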
In another embodiment of the present invention, an online hard positive and negative sample mining device is provided for the training process of a face recognition model, comprising a Central Processing Unit (CPU) and a first graphics processor (first GPU) and a second graphics processor (second GPU) connected to the CPU. The first graphics processor is configured to extract feature vector pairs from sample face pairs. The central processing unit is configured to calculate a loss function of each sample face pair from its feature vector pair, mine hard positive and negative samples among the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set. The second graphics processor is configured to receive the gradients from the central processing unit, perform back propagation with the gradients, adjust the model parameters, and share the adjusted model parameters with the first graphics processor.
It should be understood that, in the above device, the first GPU, the second GPU and the CPU may be configured according to the corresponding steps of the hard positive and negative sample online mining method of the foregoing embodiments. For example, the CPU is configured to perform hard positive and negative sample mining, loss function calculation and gradient calculation, and the specific mining and calculation steps can be configured according to the corresponding steps of that method. A detailed description is therefore omitted; those skilled in the art will understand that the device corresponds to the hard positive and negative sample online mining method of the foregoing embodiments.
Furthermore, an embodiment of the present invention may also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the hard positive and negative sample online mining method of the foregoing embodiments are implemented. A computer-readable storage medium may include a propagated data signal carrying readable program code, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including but not limited to electromagnetic or optical forms, or any suitable combination thereof. A computer-readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Other embodiments of the present invention further provide a face recognition method, which includes a training process of a face recognition model, where the training process includes the steps of the above hard positive and negative sample online mining method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and it is not intended to limit the invention to the specific embodiments described. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the concept of the invention, and where the performance or use is the same, all such substitutions or modifications shall be considered to fall within the scope of protection of the invention.

Claims (11)

1. A hard positive and negative sample online mining method is used for a training process of a face recognition model, and is characterized by comprising the following steps:
s1, acquiring a feature vector pair extracted from a sample face pair by a first graphic processor;
s2, calculating a loss function of the sample face pair according to the feature vector pair;
s3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs;
s4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to a second graphics processor so that the second graphics processor can perform back propagation through the gradient, adjust model parameters, and share the adjusted model parameters to the first graphics processor;
step S3 specifically includes:
adding the feature vector pairs of the sample face pairs into a feature vector pool;
the loss functions corresponding to the feature vector pairs in the feature vector pool are arranged into a loss function matrix, and sampling is then performed in the loss function matrix through a preset sampling kernel to obtain the target sample set; the sampling step comprises: when training starts, sampling the sample face pairs corresponding to the feature vector pairs in the feature vector pool by a first sampling method to obtain a target sample set containing the hard positive and negative sample face pairs in the current iteration; during each iteration in the training process, judging whether to switch to a second sampling method, the judging method comprising: when the number of sample face pairs in the target sample set obtained by the first sampling method is less than e% of the number of feature vector pairs in the feature vector pool, switching to the second sampling method for sampling, wherein e% is a preset percentage threshold; or when the value of the largest loss function corresponding to the feature vector pairs in the feature vector pool or in the loss function matrix is smaller than f, switching to the second sampling method for sampling, wherein 0 < f < 0.5.
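The switching test in this sampling step can be sketched as follows. The threshold values e% = 10 and f = 0.3 are illustrative choices, not values fixed by the claim:

```python
def should_switch(num_mined, pool_size, max_loss, e_percent=10.0, f=0.3):
    """Return True when the first sampling method should give way to the second:
    either the mined set has shrunk below e% of the pool, or the largest
    loss in the pool has dropped below f (0 < f < 0.5)."""
    too_few = num_mined < pool_size * e_percent / 100.0
    losses_small = max_loss < f
    return too_few or losses_small

print(should_switch(num_mined=5, pool_size=100, max_loss=0.8))   # True: 5 < 10% of 100
print(should_switch(num_mined=50, pool_size=100, max_loss=0.2))  # True: 0.2 < 0.3
print(should_switch(num_mined=50, pool_size=100, max_loss=0.8))  # False: neither condition holds
```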
2. The online hard positive-negative sample mining method according to claim 1, wherein in step S2, when the sample face pair is a binary group, the loss function is:
L1 = y·d_{a,b} + (1−y)·(β − d_{a,b})+
wherein y is the label of the sample face pair: when y = 0, the sample face pair is a negative sample pair, and when y = 1, it is a positive sample pair; d_{a,b} is the metric distance of the binary group of face images a and b, representing the spatial distance between the feature vectors of face images a and b; β is a preset threshold used to measure whether the metric distance of the binary group is small enough for a positive sample pair, or large enough for a negative sample pair; (β − d_{a,b})+ denotes a hinge function of the form max(0, β − d_{a,b}): when d_{a,b} ≥ β, max(0, β − d_{a,b}) is 0; when d_{a,b} < β, it takes the value β − d_{a,b}.
When the sample face pairs are triplets, the loss function is:
L2 = d_{a,b} + (d_{a,b} − d_{a,c} + β)+
wherein the triplet comprises face images a, b and c, where a and b form a positive sample pair and a and c form a negative sample pair; d_{a,c} is the metric distance between face images a and c; (d_{a,b} − d_{a,c} + β)+ denotes a hinge function of the form max(0, d_{a,b} − d_{a,c} + β): when d_{a,b} − d_{a,c} + β is greater than 0, it takes that value; otherwise, it is 0.
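A minimal sketch of the pair ("binary group") and triplet losses of claim 2, reconstructed from the variable definitions given in that claim; the margin value used below is illustrative:

```python
def pair_loss(d_ab, y, beta):
    """Pair loss: distance for positives (y=1), hinge on the margin for negatives (y=0)."""
    return y * d_ab + (1 - y) * max(0.0, beta - d_ab)

def triplet_loss(d_ab, d_ac, beta):
    """Triplet loss: positive-pair distance plus a hinged margin violation term."""
    return d_ab + max(0.0, d_ab - d_ac + beta)

beta = 0.5
print(pair_loss(0.2, y=1, beta=beta))     # positive pair: loss equals the distance, 0.2
print(pair_loss(0.2, y=0, beta=beta))     # negative pair too close: 0.5 - 0.2 = 0.3
print(triplet_loss(0.2, 0.9, beta=beta))  # well-separated triplet: hinge is 0, loss 0.2
```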
3. The hard positive and negative sample online mining method according to claim 1, wherein arranging the loss functions corresponding to the feature vector pairs in the feature vector pool into a loss function matrix comprises the following steps:
calculating the average value and the standard deviation of the loss functions corresponding to the feature vector pairs in the feature vector pool;
controlling the distribution of the loss functions corresponding to the feature vector pairs in the feature vector pool according to the average value and the standard deviation, so that the loss functions conform to a Gaussian distribution and cover all loss functions higher than a first threshold;
constructing the loss function matrix from the loss functions conforming to the Gaussian distribution.
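The normalisation in claim 3 can be sketched roughly as follows. The pooled loss values, the reshaping into a 3×3 matrix, and the plain mean/standard-deviation standardisation used here are illustrative assumptions; the claim does not fix the exact procedure for fitting the Gaussian distribution:

```python
import numpy as np

# Illustrative pooled loss values (made up for the example).
losses = np.array([0.1, 0.4, 0.9, 0.3, 1.2, 0.7, 0.2, 0.8, 0.5])

mean, std = losses.mean(), losses.std()
normalised = (losses - mean) / std      # zero mean, unit variance

loss_matrix = normalised.reshape(3, 3)  # arrange into a loss function matrix
print(loss_matrix.shape)                # (3, 3)
```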
4. The hard positive and negative sample online mining method according to claim 1, wherein the first sampling method comprises steps A1 to A2:
A1, selecting a sampling area in the loss function matrix with the preset sampling kernel, and calculating the sum of the loss functions in the sampling area;
A2, taking the sum of the loss functions rounded to an integer N, selecting the sample face pairs corresponding to the TopN loss functions with the largest values among the loss functions of the sampling area, and adding them to the target sample set;
the second sampling method comprises steps B1 to B2:
B1, selecting a sampling area in the loss function matrix with the preset sampling kernel, and calculating a weighted sum of the loss functions in the sampling area;
B2, taking the weighted sum rounded to an integer M, selecting the sample face pairs corresponding to the TopM loss functions with the largest values among the loss functions of the sampling area, and adding them to the target sample set.
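The first sampling method (A1-A2) can be sketched as follows: slide a sampling kernel over the loss matrix, take the loss sum inside the window as the count N, and keep the N largest losses in that window. The kernel size, the matrix values, and the truncation-to-integer rounding rule are illustrative assumptions:

```python
import numpy as np

def sample_window(loss_matrix, top_left, kernel=2):
    """Return the TopN losses in one kernel window, with N derived from the window's loss sum."""
    r, c = top_left
    window = loss_matrix[r:r + kernel, c:c + kernel]
    n = int(window.sum())                 # integer count from the loss sum (truncated)
    flat = np.sort(window.ravel())[::-1]  # losses in descending order
    return flat[:n]                       # the N largest losses in the window

m = np.array([[0.9, 0.8, 0.1],
              [0.7, 1.1, 0.2],
              [0.1, 0.1, 0.1]])
print(sample_window(m, (0, 0)))  # window sum 3.5 -> keep the 3 largest losses
```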
5. The hard positive and negative sample online mining method according to claim 4, wherein in step B1 the weighted sum w of the loss functions in the sampling area is determined by the following formula:
Figure FDA0003899215830000031
wherein J is the current iteration number; T_j is the number of feature vector pairs in the feature vector pool in the current iteration; L_t is the loss function corresponding to the t-th feature vector pair in the current iteration; T_i is the total number of sample face pairs in the target sample set in the previous iteration; and L_k is the loss function corresponding to the k-th feature vector pair among the sample face pairs in the target sample set in the previous iteration.
6. The online hard positive and negative sample mining method according to claim 4, further comprising:
slicing the loss function matrix to obtain slice matrices of a preset size;
stacking the slice matrices to obtain a three-dimensional loss function matrix [A, B, C], wherein A and B denote the size of a slice matrix, an A×B slice matrix containing A×B loss functions, and C denotes the number of slice matrices;
and sampling the three-dimensional loss function matrix [A, B, C] by the first sampling method and the second sampling method using a three-dimensional sampling kernel formed by stacking C preset sampling kernels, to obtain the target sample set.
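The slicing and stacking in claim 6 can be sketched as follows: fixed-size [A, B] slices are stacked into an [A, B, C] volume, and a kernel spanning all C slices samples one spatial position through every slice. The sizes and values are illustrative:

```python
import numpy as np

A, B, C = 2, 2, 3
# C slice matrices of size A x B (values made up for the example).
slices = [np.arange(A * B).reshape(A, B) + 10 * i for i in range(C)]

volume = np.stack(slices, axis=-1)  # three-dimensional matrix of shape (A, B, C)
print(volume.shape)                 # (2, 2, 3)

# A "3-D kernel" footprint covering one (row, col) position across all C slices:
column = volume[0, 0, :]            # the value at (0, 0) in each slice: 0, 10, 20
print(column)
```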
7. The hard positive and negative sample online mining method according to claim 4, wherein in the second sampling method, the preset sampling kernel is randomly masked, the random masking comprising:
first, taking the largest loss function in the sampling area of the current iteration as a mask value; then randomly selecting a position in the sampling kernel and masking it with the mask value, so that the loss function at the masked position is output as the mask value during sampling.
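The random masking of claim 7 can be sketched as follows: the largest loss in the current window becomes the mask value, and one randomly chosen position in the window is overwritten with it. The window contents and the random seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

window = np.array([[0.2, 0.9],
                   [0.4, 0.1]])

mask_value = window.max()        # largest loss in the window: 0.9
idx = rng.integers(window.size)  # random position to mask
masked = window.copy()
masked.flat[idx] = mask_value    # that position now outputs the mask value

print(mask_value)                # 0.9
print(masked.max() == mask_value)
```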
8. The hard positive and negative sample online mining method according to claim 1, wherein sharing the adjusted model parameters to the first graphics processor in step S4 comprises:
calculating the difference between the model parameters adjusted by the second graphics processor in the current iteration and the model parameters of the first graphics processor;
and transmitting the difference to the first graphics processor to be added to the model parameters of the first graphics processor, so that the model parameters adjusted by the second graphics processor are shared to the first graphics processor.
9. A hard positive and negative sample online mining device for a training process of a face recognition model, characterized by comprising a central processing unit, and a first graphics processor and a second graphics processor connected to the central processing unit; the first graphics processor is configured to: extract feature vector pairs from sample face pairs; the central processing unit is configured to: calculate a loss function of the sample face pairs according to the feature vector pairs, perform hard positive and negative sample mining on the sample face pairs according to the loss function to obtain a target sample set containing hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receive the gradients from the central processing unit, perform back propagation with the gradients, adjust the model parameters, and share the adjusted model parameters to the first graphics processor;
performing hard positive and negative sample mining on the sample face pairs according to the loss function to obtain the target sample set containing the hard positive and negative sample face pairs specifically comprises:
adding the feature vector pairs of the sample face pairs into a feature vector pool; arranging the loss functions corresponding to the feature vector pairs in the feature vector pool into a loss function matrix, and then sampling in the loss function matrix through a preset sampling kernel to obtain the target sample set; the sampling step comprises: when training starts, sampling the sample face pairs corresponding to the feature vector pairs in the feature vector pool by a first sampling method to obtain a target sample set containing the hard positive and negative sample face pairs in the current iteration; during each iteration in the training process, judging whether to switch to a second sampling method, the judging method comprising: when the number of sample face pairs in the target sample set obtained by the first sampling method is less than e% of the number of feature vector pairs in the feature vector pool, switching to the second sampling method for sampling, wherein e% is a preset percentage threshold; or when the value of the largest loss function corresponding to the feature vector pairs in the feature vector pool or in the loss function matrix is smaller than f, switching to the second sampling method for sampling, wherein 0 < f < 0.5.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of the hard positive and negative sample online mining method according to any one of claims 1 to 8.
11. A face recognition method comprising a training process of a face recognition model, wherein the training process comprises the steps of the hard positive and negative sample online mining method according to any one of claims 1 to 8.
CN202210555142.9A 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method Active CN114764942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555142.9A CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Publications (2)

Publication Number Publication Date
CN114764942A CN114764942A (en) 2022-07-19
CN114764942B true CN114764942B (en) 2022-12-09

Family

ID=82364980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555142.9A Active CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Country Status (1)

Country Link
CN (1) CN114764942B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558057B (en) * 2024-01-12 2024-04-16 清华大学深圳国际研究生院 Face recognition method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035751B (en) * 2014-06-20 2016-10-12 深圳市腾讯计算机***有限公司 Data parallel processing method based on multi-graphics processor and device
US10002402B2 (en) * 2015-07-23 2018-06-19 Sony Corporation Learning convolution neural networks on heterogeneous CPU-GPU platform
CN107330355B (en) * 2017-05-11 2021-01-26 中山大学 Deep pedestrian re-identification method based on positive sample balance constraint
US11164079B2 (en) * 2017-12-15 2021-11-02 International Business Machines Corporation Multi-GPU deep learning using CPUs
CN108647577B (en) * 2018-04-10 2021-04-20 华中科技大学 Self-adaptive pedestrian re-identification method and system for difficult excavation
CN110163265A (en) * 2019-04-30 2019-08-23 腾讯科技(深圳)有限公司 Data processing method, device and computer equipment
JP7256811B2 (en) * 2019-10-12 2023-04-12 バイドゥドットコム タイムズ テクノロジー (ベイジン) カンパニー リミテッド Method and system for accelerating AI training using advanced interconnect technology
CN111667050B (en) * 2020-04-21 2021-11-30 佳都科技集团股份有限公司 Metric learning method, device, equipment and storage medium
CN113569657A (en) * 2021-07-05 2021-10-29 浙江大华技术股份有限公司 Pedestrian re-identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114764942A (en) 2022-07-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant