CN112215163B - Weighted post-processing method applied to face detection prediction frame


Info

Publication number
CN112215163B
CN112215163B
Authority
CN
China
Prior art keywords
prediction
confidence
conf
weighted
prediction frame
Prior art date
Legal status
Active
Application number
CN202011092870.8A
Other languages
Chinese (zh)
Other versions
CN112215163A (en)
Inventor
朱海明
瞿洪桂
孙家乐
Current Assignee
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd
Priority to CN202011092870.8A
Publication of CN112215163A
Application granted
Publication of CN112215163B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The invention relates to the field of face detection and discloses a weighted post-processing method applied to face detection prediction frames. The method comprises: performing face detection on a picture to be detected by using a face detector; deleting prediction frames whose confidence is lower than a first confidence threshold; calculating the generalized intersection-over-union (GIoU) between the prediction frame with the highest confidence and each of the updated prediction frames; screening, from the updated prediction frames, the n prediction frames whose GIoU is larger than an intersection-over-union threshold; obtaining the weighted prediction frame corresponding to the prediction frame with the highest confidence; deleting the n prediction frames; and obtaining the weighted post-processing result of the face detector. The invention makes use of more of the face detector's effective prediction data, so that both the position information and the confidence information of the deleted prediction frames are exploited and the position of the output prediction frame is better corrected; by introducing the generalized intersection-over-union, the position accuracy of the face detector's final output is improved.

Description

Weighted post-processing method applied to face detection prediction frame
Technical Field
The invention relates to the field of face detection, in particular to a weighted post-processing method applied to a face detection prediction frame.
Background
With the wide application of deep learning, technologies such as face recognition and face beautification have become increasingly mature. Face detection is the first step in these application scenarios, and its accuracy has a great influence on the accuracy of subsequent algorithms. In a face detection algorithm, the model predicts many candidate face-frame coordinates with corresponding confidences; these prediction frames cannot be used directly as the final output, so repeated prediction frames and prediction frames with low confidence need to be filtered out.
Most existing methods adopt non-maximum suppression (NMS): the intersection-over-union (IoU) between the prediction frame with the highest confidence and every other prediction frame is calculated, any prediction frame whose IoU exceeds a set threshold is deleted outright, the prediction frame with the next highest confidence is then selected and the same IoU comparison and deletion are carried out, and these steps are repeated until all prediction frames have been processed. This computation considers only the prediction frame with the currently highest confidence and deletes every other prediction frame that overlaps it heavily according to IoU; the deleted prediction frames also contain position information of the face, so useful data are lost. Moreover, IoU as a measure of overlap cannot distinguish different alignments between two prediction frames and therefore cannot accurately reflect their degree of coincidence. In the final output of this method, each face uses the information of only one prediction frame, the remaining effective data go unused, and the position accuracy of the output face frame is consequently not high.
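For reference, the baseline NMS procedure just described can be sketched as follows. This is a minimal Python/NumPy illustration under assumed conventions (the function names, the [x1, y1, x2, y2] box layout, and the default threshold are illustrative assumptions, not taken from the patent):

import numpy as np

def iou(box, boxes):
    # Plain intersection-over-union between one box [x1, y1, x2, y2]
    # and an (N, 4) array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Standard NMS: keep the highest-scoring box, delete every box whose
    # IoU with it exceeds iou_thresh, and repeat on the remainder.
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

As the sketch makes plain, every box dropped inside the loop contributes nothing to the output; the method of the invention instead folds those boxes into the kept one.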
Disclosure of Invention
The invention provides a weighted post-processing method applied to a face detection prediction frame, so as to solve the above problems in the prior art.
A weighted post-processing method applied to a face detection prediction frame comprises the following steps:
S1) collecting a picture to be detected, and performing face detection on the picture to be detected by using a face detector to obtain all prediction frames corresponding to the picture to be detected, wherein each prediction frame corresponds to a confidence;
S2) setting a first confidence threshold conf_thresh1, and deleting, from all the prediction frames, the prediction frames whose confidence is lower than conf_thresh1, obtaining a number of prediction frames whose confidence is higher than or equal to conf_thresh1;
S3) taking the prediction frames whose confidence is higher than or equal to conf_thresh1 as the updated prediction frames, sorting the updated prediction frames in descending order of confidence to obtain the prediction frame with the highest current confidence, whose confidence is Max_conf, and calculating the generalized intersection-over-union (GIoU) between the prediction frame with the highest current confidence and each of the updated prediction frames;
S4) setting an intersection-over-union threshold iou_thresh, screening out, from the updated prediction frames, the n prediction frames whose generalized intersection-over-union is larger than iou_thresh, and obtaining the position information and confidences of the n prediction frames;
S5) obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence according to the position information and confidences of the n prediction frames, and storing the information of the weighted prediction frame, the information comprising the position information and the confidence of the weighted prediction frame;
S6) deleting the n prediction frames from the prediction frames whose confidence is higher than or equal to conf_thresh1;
S7) determining whether all the prediction frames whose confidence is higher than or equal to conf_thresh1 have been deleted; if yes, proceeding to step S8); if not, returning to step S3);
S8) obtaining a number of weighted prediction frames, setting a second confidence threshold conf_thresh2, and obtaining the weighted post-processing result of the face detector according to conf_thresh2.
In step S7), if the prediction frames whose confidence is higher than or equal to the first confidence threshold conf_thresh1 have not all been deleted, the process returns to step S3) and the prediction frames are updated; that is, the prediction frames whose confidence is higher than or equal to conf_thresh1 that remain after the n prediction frames were deleted are taken as the updated prediction frames. The process loops from step S3) to step S7), continually updating these prediction frames and deleting part of them while obtaining weighted prediction frames, until all the prediction frames whose confidence is higher than or equal to conf_thresh1 have been deleted and all the weighted prediction frames have been obtained.
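To make the S1) to S8) loop concrete, a minimal Python/NumPy sketch follows. It is an illustration under assumed names and placeholder thresholds, not the patent's reference implementation; the helpers giou and weighted_fuse are sketched after the corresponding steps further below.

import numpy as np

def weighted_post_process(boxes, confs, conf_thresh1=0.3,
                          iou_thresh=0.5, conf_thresh2=0.5):
    # Weighted post-processing of face-detector outputs, following S1)-S8).
    # boxes: (N, 4) array of [x1, y1, x2, y2]; confs: (N,) confidences.
    # S2) delete prediction frames below the first confidence threshold
    keep = confs >= conf_thresh1
    boxes, confs = boxes[keep], confs[keep]

    fused_boxes, fused_confs = [], []
    while len(boxes) > 0:
        # S3) the prediction frame with the highest current confidence (Max_conf)
        best = np.argmax(confs)
        g = giou(boxes[best], boxes)  # GIoU against every updated frame
        # S4) the n frames whose GIoU exceeds iou_thresh; the best frame has
        # GIoU 1 with itself, so it always belongs to the group
        group = g > iou_thresh
        # S5) fuse the group into one confidence-weighted frame and keep
        # Max_conf as the fused frame's confidence
        fused_boxes.append(weighted_fuse(boxes[group], confs[group]))
        fused_confs.append(confs[best])
        # S6)-S7) delete the n grouped frames and loop until none remain
        boxes, confs = boxes[~group], confs[~group]

    # S8) keep only the fused frames above the second confidence threshold
    fused_boxes, fused_confs = np.array(fused_boxes), np.array(fused_confs)
    keep = fused_confs > conf_thresh2
    return fused_boxes[keep], fused_confs[keep]

Called on a detector's raw boxes and confidences, weighted_post_process(det_boxes, det_confs) returns one weighted frame per face cluster; the difference from NMS is that each output frame blends all frames of its cluster instead of surviving alone.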
Further, in step S3), calculating the generalized intersection-over-union between the prediction frame with the highest current confidence and each of the updated prediction frames comprises the following steps:
S31) recording the prediction frame with the highest current confidence as A, and recording the i-th prediction frame among the updated prediction frames as B_i;
S32) calculating the intersection area A ∩ B_i between the prediction frame A with the highest current confidence and the i-th prediction frame B_i;
S33) calculating the union area A ∪ B_i between the prediction frame A with the highest current confidence and the i-th prediction frame B_i;
S34) obtaining, from the intersection area A ∩ B_i and the union area A ∪ B_i, the generalized intersection-over-union between the prediction frame A with the highest current confidence and the i-th prediction frame B_i:

GIoU_i = (A ∩ B_i) / (A ∪ B_i) − (C − A ∪ B_i) / C

where C denotes the area of the minimum enclosing rectangle of the prediction frame A with the highest current confidence and the i-th prediction frame B_i.
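A minimal Python/NumPy sketch of this GIoU computation, vectorized over the updated prediction frames (the function name and box layout are the same illustrative assumptions as in the sketches above):

import numpy as np

def giou(box, boxes):
    # Generalized IoU between one box [x1, y1, x2, y2] and an (N, 4) array
    # of boxes, following steps S31)-S34).
    # S32) intersection area A ∩ B_i
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    # S33) union area A ∪ B_i
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area_a + area_b - inter
    # S34) C: area of the minimum enclosing rectangle of both boxes
    cx1 = np.minimum(box[0], boxes[:, 0])
    cy1 = np.minimum(box[1], boxes[:, 1])
    cx2 = np.maximum(box[2], boxes[:, 2])
    cy2 = np.maximum(box[3], boxes[:, 3])
    c = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c - union) / c

Unlike plain IoU, the enclosing-rectangle term penalizes pairs of frames that are far apart or badly aligned even when their intersection is the same, which is how GIoU distinguishes different alignment modes.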
Further, in step S5), obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence according to the position information and confidences of the n prediction frames, and storing the information of the weighted prediction frame (the information comprising the position information and the confidence of the weighted prediction frame), comprises the following steps:
S51) multiplying the position information of the n prediction frames

[X1_1 Y1_1 X2_1 Y2_1]
[X1_2 Y1_2 X2_2 Y2_2]
[        ...        ]
[X1_n Y1_n X2_n Y2_n]

row by row by the corresponding confidences of the n prediction frames, obtaining the confidence-weighted position information matrix

[conf_1*X1_1 conf_1*Y1_1 conf_1*X2_1 conf_1*Y2_1]
[conf_2*X1_2 conf_2*Y1_2 conf_2*X2_2 conf_2*Y2_2]
[                      ...                      ]
[conf_n*X1_n conf_n*Y1_n conf_n*X2_n conf_n*Y2_n]

wherein the position information of the n-th prediction frame is [X1_n Y1_n X2_n Y2_n]; X1_n, Y1_n are the x and y coordinates of the upper-left corner of the n-th prediction frame; X2_n, Y2_n are the x and y coordinates of the lower-right corner of the n-th prediction frame; and conf_n is the confidence of the n-th prediction frame;
S52) adding the rows of the confidence-weighted position information matrix of step S51) to obtain the row-summed position information [X1 Y1 X2 Y2], wherein X1 = conf_1*X1_1 + conf_2*X1_2 + ... + conf_n*X1_n, Y1 = conf_1*Y1_1 + conf_2*Y1_2 + ... + conf_n*Y1_n, X2 = conf_1*X2_1 + conf_2*X2_2 + ... + conf_n*X2_n, and Y2 = conf_1*Y2_1 + conf_2*Y2_2 + ... + conf_n*Y2_n;
S53) calculating the confidence sum of the n prediction frames, conf = conf_1 + conf_2 + ... + conf_n;
S54) obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence and storing its information: the position information of the weighted prediction frame is

[X1/conf Y1/conf X2/conf Y2/conf]

and the confidence Max_conf of the prediction frame with the highest current confidence is taken as the confidence of the weighted prediction frame.
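A minimal Python/NumPy sketch of steps S51) to S54), under the same illustrative assumptions:

import numpy as np

def weighted_fuse(boxes, confs):
    # Confidence-weighted fusion of the n grouped frames, steps S51)-S54).
    # boxes: (n, 4) array of [X1, Y1, X2, Y2]; confs: (n,) confidences.
    # S51)-S52) multiply each row by its confidence, then sum the rows
    row_sum = (confs[:, None] * boxes).sum(axis=0)  # [X1, Y1, X2, Y2]
    # S53) confidence sum conf = conf_1 + ... + conf_n
    conf = confs.sum()
    # S54) the weighted prediction frame [X1/conf, Y1/conf, X2/conf, Y2/conf]
    return row_sum / conf

Because every coordinate is weighted by its frame's confidence before dividing by the confidence sum, frames the detector is more certain about pull the fused position more strongly.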
Further, in step S8), the weighted post-processing result of the face detector consists of the weighted prediction frames whose confidence is greater than the second confidence threshold conf_thresh2, together with the position information and confidences of those weighted prediction frames.
The invention has the following beneficial effects: compared with the NMS algorithm, the method makes use of more of the face detector's effective prediction data. The confidence-weighted combination of all prediction frames whose overlap with the current highest-confidence prediction frame exceeds a set threshold is used as the final output corresponding to that prediction frame (namely the weighted prediction frame), instead of keeping only the single highest-confidence prediction frame and discarding the overlapping ones, so the position information of the deleted prediction frames is effectively utilized. At the same time, during the weighting the position information of each prediction frame is multiplied by its own confidence, so the confidence information of the prediction frames is also effectively utilized and the position of the output prediction frame is better corrected. In addition, the generalized intersection-over-union (GIoU), which evaluates the coincidence of two prediction frames more accurately, is introduced as the measure of overlap between the highest-confidence prediction frame and the other prediction frames, so the selected data are more accurate and the position accuracy of the face detector's final output is further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a weighting post-processing method applied to a face detection prediction frame according to a first embodiment of the present invention.
Fig. 2 is a schematic diagram of the minimum enclosing rectangle area provided in the first embodiment.
Fig. 3 is a diagram of test results comparing the first embodiment with other face detection algorithms.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, a weighted post-processing method applied to a face detection prediction frame, as shown in fig. 1, includes the following steps:
S1) collecting a picture to be detected, and performing face detection on the picture to be detected by using a face detector to obtain all prediction frames corresponding to the picture to be detected, wherein each prediction frame corresponds to a confidence;
S2) setting a first confidence threshold conf_thresh1, and deleting, from all the prediction frames, the prediction frames whose confidence is lower than conf_thresh1, obtaining a number of prediction frames whose confidence is higher than or equal to conf_thresh1;
S3) taking the prediction frames whose confidence is higher than or equal to conf_thresh1 as the updated prediction frames, sorting the updated prediction frames in descending order of confidence to obtain the prediction frame with the highest current confidence, whose confidence is Max_conf, and calculating the generalized intersection-over-union between the prediction frame with the highest current confidence and each of the updated prediction frames, which comprises the following steps:
S31) recording the prediction frame with the highest current confidence as A, and recording the i-th prediction frame among the updated prediction frames as B_i;
S32) calculating the intersection area A ∩ B_i between A and B_i;
S33) calculating the union area A ∪ B_i between A and B_i;
S34) obtaining, from the intersection area A ∩ B_i and the union area A ∪ B_i, the generalized intersection-over-union between A and B_i:

GIoU_i = (A ∩ B_i) / (A ∪ B_i) − (C − A ∪ B_i) / C

where C denotes the area of the minimum enclosing rectangle of A and B_i (see fig. 2).
S4) setting an intersection-over-union threshold iou_thresh, screening out, from the updated prediction frames, the n prediction frames whose generalized intersection-over-union is larger than iou_thresh, and obtaining the position information and confidences of the n prediction frames;
S5) obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence according to the position information and confidences of the n prediction frames, and storing the information of the weighted prediction frame (the information comprising the position information and the confidence of the weighted prediction frame), which comprises the following steps:
S51) multiplying the position information of the n prediction frames

[X1_1 Y1_1 X2_1 Y2_1]
[X1_2 Y1_2 X2_2 Y2_2]
[        ...        ]
[X1_n Y1_n X2_n Y2_n]

row by row by the corresponding confidences of the n prediction frames, obtaining the confidence-weighted position information matrix

[conf_1*X1_1 conf_1*Y1_1 conf_1*X2_1 conf_1*Y2_1]
[conf_2*X1_2 conf_2*Y1_2 conf_2*X2_2 conf_2*Y2_2]
[                      ...                      ]
[conf_n*X1_n conf_n*Y1_n conf_n*X2_n conf_n*Y2_n]

wherein the position information of the n-th prediction frame is [X1_n Y1_n X2_n Y2_n]; X1_n, Y1_n are the x and y coordinates of the upper-left corner of the n-th prediction frame; X2_n, Y2_n are the x and y coordinates of the lower-right corner of the n-th prediction frame; and conf_n is the confidence of the n-th prediction frame;
S52) adding the rows of the confidence-weighted position information matrix of step S51) to obtain the row-summed position information [X1 Y1 X2 Y2], wherein X1 = conf_1*X1_1 + conf_2*X1_2 + ... + conf_n*X1_n, Y1 = conf_1*Y1_1 + conf_2*Y1_2 + ... + conf_n*Y1_n, X2 = conf_1*X2_1 + conf_2*X2_2 + ... + conf_n*X2_n, and Y2 = conf_1*Y2_1 + conf_2*Y2_2 + ... + conf_n*Y2_n;
S53) calculating the confidence sum of the n prediction frames, conf = conf_1 + conf_2 + ... + conf_n;
S54) obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence and storing its information: the position information of the weighted prediction frame is

[X1/conf Y1/conf X2/conf Y2/conf]

and the confidence Max_conf of the prediction frame with the highest current confidence is taken as the confidence of the weighted prediction frame.
S6) deleting the n prediction frames from the prediction frames whose confidence is higher than or equal to the first confidence threshold conf_thresh1;
S7) determining whether all the prediction frames whose confidence is higher than or equal to conf_thresh1 have been deleted; if yes, proceeding to step S8); if not, returning to step S3);
S8) obtaining a number of weighted prediction frames, setting a second confidence threshold conf_thresh2, and obtaining the weighted post-processing result of the face detector according to conf_thresh2.
In step S8), the weighted post-processing result of the face detector consists of the weighted prediction frames whose confidence is greater than the second confidence threshold conf_thresh2, together with the position information and confidences of those weighted prediction frames.
Compared with other face detection post-processing algorithms, the technical scheme adopted by this embodiment achieves the best accuracy while causing only a small increase in running time; the specific test results are shown in fig. 3. The compared algorithms in fig. 3 include the NMS algorithm, the soft-NMS algorithm, soft-NMS variants based on GIoU and DIoU, the weighted-NMS (GIoU) algorithm, the weighted-NMS (DIoU) algorithm, the DIoU-NMS algorithm, and the GIoU-NMS algorithm. The intersection-over-union threshold iou_thresh of all compared algorithms except the soft-NMS algorithm is set to 0.5, and the confidence threshold 2 in fig. 3 is the second confidence threshold conf_thresh2. The AP (average precision) reflects the accuracy of the prediction frames: the larger the value, the more accurate. The technical scheme adopted by the invention is the weighted-NMS (GIoU) algorithm; its AP is the highest under the different confidence thresholds, with the largest improvement over the NMS algorithm.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
compared with the NMS algorithm, the method utilizes more effective prediction data of the face detector, uses all the weight of the prediction frames with the highest confidence coefficient and the coincidence degree of the prediction frames with the highest confidence coefficient higher than a certain confidence coefficient threshold value as the final output (namely the weighted prediction frame) corresponding to the prediction frame with the highest confidence coefficient, and does not only use one prediction frame with the highest confidence coefficient and delete other coincident prediction frames, so that the position information of the deleted prediction frames is effectively utilized; when the weight processing of the prediction frames is carried out at the same time, the position information of each prediction frame is multiplied by the confidence coefficient of the prediction frame, the confidence coefficient information of the prediction frames is effectively utilized, and the output position of the prediction frame is better corrected; and a generalized intersection ratio GIoU which can more accurately evaluate the contact ratio of the two prediction frames is introduced, the generalized intersection ratio GIoU is used as a calculation mode of the contact ratio of the prediction frame with the highest confidence coefficient and other prediction frames, the selected data is more accurate, and the position accuracy finally output by the face detector is further improved.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (3)

1. A weighted post-processing method applied to a face detection prediction frame is characterized by comprising the following steps:
S1) collecting a picture to be detected, and performing face detection on the picture to be detected by using a face detector to obtain all prediction frames corresponding to the picture to be detected, wherein each prediction frame corresponds to a confidence;
S2) setting a first confidence threshold conf_thresh1, and deleting, from all the prediction frames, the prediction frames whose confidence is lower than conf_thresh1, obtaining a number of prediction frames whose confidence is higher than or equal to conf_thresh1;
S3) taking the prediction frames whose confidence is higher than or equal to conf_thresh1 as the updated prediction frames, sorting the updated prediction frames in descending order of confidence to obtain the prediction frame with the highest current confidence, whose confidence is Max_conf, and calculating the generalized intersection-over-union between the prediction frame with the highest current confidence and each of the updated prediction frames;
S4) setting an intersection-over-union threshold iou_thresh, screening out, from the updated prediction frames, the n prediction frames whose generalized intersection-over-union is larger than iou_thresh, and obtaining the position information and confidences of the n prediction frames;
S5) obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence according to the position information and confidences of the n prediction frames, and storing the information of the weighted prediction frame, the information comprising the position information and the confidence of the weighted prediction frame; this comprises the following steps:
S51) multiplying the position information of the n prediction frames

[X1_1 Y1_1 X2_1 Y2_1]
[X1_2 Y1_2 X2_2 Y2_2]
[        ...        ]
[X1_n Y1_n X2_n Y2_n]

row by row by the corresponding confidences of the n prediction frames, obtaining the confidence-weighted position information matrix

[conf_1*X1_1 conf_1*Y1_1 conf_1*X2_1 conf_1*Y2_1]
[conf_2*X1_2 conf_2*Y1_2 conf_2*X2_2 conf_2*Y2_2]
[                      ...                      ]
[conf_n*X1_n conf_n*Y1_n conf_n*X2_n conf_n*Y2_n]

wherein the position information of the n-th prediction frame is [X1_n Y1_n X2_n Y2_n]; X1_n, Y1_n are the x and y coordinates of the upper-left corner of the n-th prediction frame; X2_n, Y2_n are the x and y coordinates of the lower-right corner of the n-th prediction frame; and conf_n is the confidence of the n-th prediction frame;
S52) adding the rows of the confidence-weighted position information matrix of step S51) to obtain the row-summed position information [X1 Y1 X2 Y2], wherein X1 = conf_1*X1_1 + conf_2*X1_2 + ... + conf_n*X1_n, Y1 = conf_1*Y1_1 + conf_2*Y1_2 + ... + conf_n*Y1_n, X2 = conf_1*X2_1 + conf_2*X2_2 + ... + conf_n*X2_n, and Y2 = conf_1*Y2_1 + conf_2*Y2_2 + ... + conf_n*Y2_n;
S53) calculating the confidence sum of the n prediction frames, conf = conf_1 + conf_2 + ... + conf_n;
S54) obtaining the weighted prediction frame corresponding to the prediction frame with the highest current confidence and storing its information: the position information of the weighted prediction frame is

[X1/conf Y1/conf X2/conf Y2/conf]

and the confidence Max_conf of the prediction frame with the highest current confidence is taken as the confidence of the weighted prediction frame;
S6) deleting the n prediction frames from the prediction frames whose confidence is higher than or equal to the first confidence threshold conf_thresh1;
S7) determining whether all the prediction frames whose confidence is higher than or equal to conf_thresh1 have been deleted; if yes, proceeding to step S8); if not, returning to step S3);
S8) obtaining a number of weighted prediction frames, setting a second confidence threshold conf_thresh2, and obtaining the weighted post-processing result of the face detector according to conf_thresh2.
2. The weighted post-processing method applied to a face detection prediction frame according to claim 1, wherein in step S3), calculating the generalized intersection-over-union between the prediction frame with the highest current confidence and each of the updated prediction frames comprises the following steps:
S31) recording the prediction frame with the highest current confidence as A, and recording the i-th prediction frame among the updated prediction frames as B_i;
S32) calculating the intersection area A ∩ B_i between A and B_i;
S33) calculating the union area A ∪ B_i between A and B_i;
S34) obtaining, from the intersection area A ∩ B_i and the union area A ∪ B_i, the generalized intersection-over-union between A and B_i:

GIoU_i = (A ∩ B_i) / (A ∪ B_i) − (C − A ∪ B_i) / C

where C denotes the area of the minimum enclosing rectangle of A and B_i.
3. The weighted post-processing method applied to a face detection prediction frame according to claim 1, wherein in step S8), the weighted post-processing result of the face detector consists of the weighted prediction frames whose confidence is greater than the second confidence threshold conf_thresh2, together with the position information and confidences of those weighted prediction frames.
CN202011092870.8A 2020-10-13 2020-10-13 Weighted post-processing method applied to face detection prediction frame Active CN112215163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011092870.8A 2020-10-13 2020-10-13 Weighted post-processing method applied to face detection prediction frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011092870.8A 2020-10-13 2020-10-13 Weighted post-processing method applied to face detection prediction frame

Publications (2)

Publication Number Publication Date
CN112215163A (en) 2021-01-12
CN112215163B (en) 2021-05-25

Family

ID=74053965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011092870.8A Active CN112215163B (en) 2020-10-13 2020-10-13 Weighted post-processing method applied to face detection prediction frame

Country Status (1)

Country Link
CN (1) CN112215163B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8824808B2 (en) * 2011-08-19 2014-09-02 Adobe Systems Incorporated Methods and apparatus for automated facial feature localization
CN106327469B (en) * 2015-06-29 2019-06-18 北京航空航天大学 A kind of video picture segmentation method of semantic label guidance
US9875429B2 (en) * 2015-10-06 2018-01-23 Adobe Systems Incorporated Font attributes for font recognition and similarity
CN111091551A (en) * 2019-12-12 2020-05-01 哈尔滨市科佳通用机电股份有限公司 Method for detecting loss fault of brake beam strut opening pin of railway wagon
CN111476817A (en) * 2020-02-27 2020-07-31 浙江工业大学 Multi-target pedestrian detection tracking method based on yolov3

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679469A (en) * 2017-09-22 2018-02-09 东南大学—无锡集成电路技术研究所 A kind of non-maxima suppression method based on deep learning
CN109886128A (en) * 2019-01-24 2019-06-14 南京航空航天大学 A kind of method for detecting human face under low resolution
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 Face key point detection method based on GIoU and weighted NMS improvement
CN110688994A (en) * 2019-12-10 2020-01-14 南京甄视智能科技有限公司 Human face detection method and device based on cross-over ratio and multi-model fusion and computer readable storage medium
CN111626990A (en) * 2020-05-06 2020-09-04 北京字节跳动网络技术有限公司 Target detection frame processing method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Improving Object Detection With One Line of Code; Navaneeth Bodla et al.; arXiv; 2017-08-08; pp. 1-9 *
Improved YOLOv3 algorithm for vehicle detection in haze weather; Huang Kaiqi et al.; Journal of Chongqing University; 2020-09-11; pp. 1-9 *

Also Published As

Publication number Publication date
CN112215163A (en) 2021-01-12

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant