CN112465799A

CN112465799A - Optimization of object detector and object detection

Info

Publication number: CN112465799A
Application number: CN202011449435.6A
Authority: CN
Inventors: 杨帆; 王瀚洋; 胡建国; 白立群; 陈凯琪
Original assignee: Nanjing Zhenshi Intelligent Technology Co Ltd
Current assignee: Nanjing Zhenshi Intelligent Technology Co Ltd
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2021-03-09

Abstract

The invention provides a method for optimizing a target detector and a target detection method, comprising: annotating target annotation boxes and ignore annotation boxes; determining positive samples and negative samples based on the intersection over union (IOU) of the target annotation boxes and the anchors; among the negative samples, setting a negative sample as an ignored sample when the area ratio of an ignore annotation box to the anchor is greater than or equal to a set threshold, where the area ratio is the area of the intersection of the ignore annotation box and the anchor divided by the area of the anchor; applying this rule to every anchor against every ignore annotation box to define all ignored samples, and removing them from the negative samples to obtain corrected negative samples; and performing classification training of the target detector based on the positive samples and the corrected negative samples. By removing poor-quality negative samples and training the classifier on the positive samples and the corrected negative samples, the invention reduces false detections and improves recall and detection precision.

Description

Optimization of object detector and object detection

Technical Field

The invention relates to the technical field of pedestrian recognition, in particular to optimization of a detector for a difficult target, and specifically relates to optimization of a target detector and a target detection method and system.

Background

In deep-learning-based image processing, the false detection rate of target detection is a key problem. A common remedy at present is to optimize detector training with online hard example mining. However, the training data set is labeled in advance, and labeling is to some degree subjective, so the data are not entirely clean, and the cleanliness of the training data directly affects the performance of the trained detector.

Some problems are unavoidable in actual labeling. For example, when people are labeled in a dense crowd, some pedestrians are heavily occluded, with only the head or the legs below the knee visible. If such a person is still labeled with a pedestrian box, the positive samples become more complicated; if the person is not labeled, the visible body parts become negative samples, which also harms training. In either case, whether these hard targets take part in training as positive or as negative samples, they negatively affect detector training.

The same problem arises in other scenes where hard targets are of poor quality or indistinguishable because of severe occlusion, motion blur, very small size, and the like.

Current solutions to this problem include: repeatedly and strictly cleaning the data set; labeling poor-quality target boxes and masking them out (mosaicking) in the original image, which deletes the information of the target box from the image; or ignoring these low-quality targets and leaving them unlabeled.

However, strict and repeated data cleaning costs considerable labor and time; masking destroys local information and image authenticity, and it presupposes that the regions to be masked are labeled correctly, which also requires manual work; and leaving hard targets unlabeled means they receive no special handling, which reduces detector precision.

Take anchor-based detection algorithms as an example. In detectors such as Faster R-CNN, SSD, and YOLO, the anchor layout is user-defined; when the intersection over union (IOU) of an anchor and an annotation box is greater than or equal to a set threshold, the anchor is a positive sample, and otherwise it is a negative sample. If some objects are left unlabeled during annotation, because high target density makes one-by-one labeling difficult or because of severe occlusion, motion blur, very small size, and the like, the anchors that match these objects become negative samples during training. These negative samples closely resemble positive samples, which directly lowers the precision of the trained detector. Thus poor-quality hard targets in an image affect classifier training whether they participate as positive or as negative samples.

Prior art documents:

patent document 1: CN111598175A is a detector training optimization method based on an online hard case mining mode.

Disclosure of Invention

The invention aims to provide a detector optimization method and system for hard targets, and a target detection method, in which poor-quality hard examples are discarded as ignored samples under certain conditions and take part in training neither as positive nor as negative samples, thereby improving the detection precision of the trained target detector and reducing false detections.

To achieve the above object, a first aspect of the present invention provides a method for optimizing a target detector for hard targets, comprising the following steps:

annotating target annotation boxes and ignore annotation boxes, both of which are rectangular boxes;

determining positive samples and negative samples based on the intersection over union (IOU) of the target annotation boxes and the anchors;

among the negative samples, setting a negative sample as an ignored sample when the area ratio of an ignore annotation box to the corresponding anchor is greater than or equal to a set threshold, where the area ratio is the area of the intersection of the ignore annotation box and the anchor divided by the area of the anchor;

applying this rule to every anchor against every ignore annotation box to define all ignored samples, and removing the ignored samples from the negative samples to obtain corrected negative samples; and

performing classification training of the target detector based on the positive samples and the corrected negative samples.
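The steps above can be summarized in set notation. This is only a reading aid: the symbols $\mathcal{A}$ (target annotation boxes), $\mathcal{B}$ (anchors), $\mathcal{C}$ (ignore annotation boxes), the thresholds $t_{\mathrm{IOU}}$ and $t_{\alpha}$, and the set names $P$, $N$, $I$, $N'$ are assumed here and do not appear in the original text.

```latex
\begin{aligned}
P &= \{\, b \in \mathcal{B} : \max_{a \in \mathcal{A}} \mathrm{IOU}(a, b) \ge t_{\mathrm{IOU}} \,\},
  \qquad N = \mathcal{B} \setminus P, \\
\alpha(c, b) &= \frac{\operatorname{area}(c \cap b)}{\operatorname{area}(b)}, \\
I &= \{\, b \in N : \exists\, c \in \mathcal{C},\ \alpha(c, b) \ge t_{\alpha} \,\},
  \qquad N' = N \setminus I .
\end{aligned}
```

Classification training then uses $P$ and $N'$ only. Because $I$ is drawn from $N$, the positive set $P$ is unchanged.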

Wherein the ignore annotation box contains a single target or a plurality of targets.

Wherein hard example mining of the negative samples uses the OHEM algorithm.

During classification training of the target detector, all positive samples and the hard negative samples mined from the corrected negative samples participate in classification training, and only the positive samples participate in regression training.

During classification training of the target detector, the classification loss is computed from the positive samples and the corrected negative samples, and the training parameters are adjusted by backpropagating the classification loss.

According to a second aspect of the present invention, there is also provided an optimization system of a target detector, comprising:

an annotation module for annotating target annotation boxes and ignore annotation boxes, both of which are rectangular boxes;

a positive and negative sample determination module for determining positive samples and negative samples based on the intersection over union (IOU) of the target annotation boxes and the anchors;

an ignored sample determination module for setting, among the negative samples, a negative sample as an ignored sample when the area ratio of an ignore annotation box to the corresponding anchor is greater than or equal to a set threshold, where the area ratio is the area of the intersection of the ignore annotation box and the anchor divided by the area of the anchor;

a negative sample correction module for applying this rule to every anchor against every ignore annotation box to define all ignored samples, and removing the ignored samples from the negative samples to obtain corrected negative samples; and

a target detector training module for performing classification training of the target detector based on the positive samples and the corrected negative samples.

According to a third aspect of the present invention, there is provided an optimization system for a target detector, comprising:

one or more processors;

a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising a procedure of an optimization method of the target detector.

According to a fourth aspect of the present invention, there is also provided a computer-readable medium storing software comprising instructions executable by one or more computers, the instructions causing the one or more computers to perform operations by such execution, the operations comprising a process of performing an optimization method of the target detector.

According to a fifth aspect of the present invention, there is also provided a target detection method, wherein the target detector is trained with the aforementioned optimization method, wherein during classification training of the target detector all positive samples and the hard negative samples mined from the corrected negative samples participate in training, and only the positive samples participate in regression training;

and performing target detection on the input scene picture by using a target detector obtained by training.

According to the embodiments of the invention, the proposed optimization scheme optimizes the detector training process for hard targets: when anchors are matched, ignored samples are added alongside the positive and negative samples. The ignored samples are selected from the negative samples according to a defined rule and take part in training neither as positive nor as negative samples. Poor-quality samples are thus discarded under certain conditions and excluded from training, which reduces false detections and improves recall and detection precision.

Meanwhile, positive and negative samples can still be determined in the existing way when the scheme is implemented; only negative samples that meet the ignore condition become ignored samples and are removed as low-quality hard examples. That is, selecting ignored samples does not affect the selection of positive samples.

It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent. In addition, all combinations of claimed subject matter are considered a part of the presently disclosed subject matter.

The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.

Drawings

The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow chart illustrating optimization of a target detector according to an exemplary embodiment of the present invention.

FIGS. 2-4 are schematic diagrams of pedestrian labeling in accordance with exemplary embodiments of the present invention.

FIG. 5 is a schematic diagram of an optimization system of a target detector of an exemplary embodiment of the invention.

Detailed Description

In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.

In this disclosure, aspects of the present invention are described with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. Embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be appreciated that the various concepts and embodiments described above, as well as those described in greater detail below, may be implemented in any of numerous ways, as the disclosed concepts and embodiments are not limited to any one implementation. In addition, some aspects of the present disclosure may be used alone, or in any suitable combination with other aspects of the present disclosure.

In line with the purpose of the invention, the following embodiments address the prior-art problems of training detectors on hard targets and of detecting such targets. They add ignored samples to conventional positive and negative sample training, in order to remedy the loss of detection precision and the high false detection rate that arise when the original detector is trained directly on anchor-based positive and negative samples.

According to the invention, poor-quality hard examples, such as targets that are of poor quality or indistinguishable because of severe occlusion, motion blur, or very small size, are removed from the negative samples under certain conditions, so that they harm detector training neither as positive nor as negative samples. During training of the target detector, the positive samples and the negative samples from which the ignored samples have been removed take part in training, while the low-quality hard targets do not, thereby reducing false detections and improving recall and detection precision.

The optimization procedure for a target detector in connection with the exemplary embodiment of the invention shown in fig. 1 generally includes the following processes:

annotating target annotation boxes and ignore annotation boxes, both of which are rectangular boxes; as shown in FIGS. 2-4, an ignore annotation box may contain a single target or multiple targets;

Referring to FIGS. 2-4 and taking pedestrian detection as an example (i.e., pedestrians are the detection target), the preset pedestrian category ID is 1 and the ignored-sample category ID is -1.

All pedestrians in the image are annotated. If a pedestrian is clear and less than half of the body is occluded, the pedestrian is boxed with ID 1; if more than half of the body is occluded, or the pedestrians are too dense or too small to label one by one, they are boxed individually or several together with ID -1. The specific annotation rule can be user-defined. The original images shown in FIGS. 2-4 come from the open-source data set CrowdHuman; on these images we annotate not only pedestrian annotation boxes (pedestrian boxes for short) but also ignore annotation boxes (ignore boxes for short). For visualization, the black boxes in the figures are pedestrian boxes and the white boxes are ignore boxes, each of which encloses one or more hard-to-label pedestrians.

Optionally, image preprocessing before training may use random cropping to increase data diversity. If only ignore boxes remain in the image after a random crop (that is, the number of regular, non-ignored pedestrian boxes is 0), the image is cropped again until the number of regular pedestrian boxes is greater than 0, and processing then continues with the next step.
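The re-cropping rule can be sketched as follows. This is a minimal illustration, not the patent's implementation: the box layout `(x1, y1, x2, y2, class_id)`, the function names, the `max_tries` cap, and the fall-back to the full image are all assumptions made here; only the IDs 1 (pedestrian) and -1 (ignore) come from the text.

```python
import random

def crop_boxes(boxes, crop):
    """Clip (x1, y1, x2, y2, class_id) boxes to a crop window, in crop coordinates."""
    cx1, cy1, cx2, cy2 = crop
    kept = []
    for x1, y1, x2, y2, cid in boxes:
        nx1, ny1 = max(x1, cx1), max(y1, cy1)
        nx2, ny2 = min(x2, cx2), min(y2, cy2)
        if nx2 > nx1 and ny2 > ny1:  # box keeps a positive area inside the crop
            kept.append((nx1 - cx1, ny1 - cy1, nx2 - cx1, ny2 - cy1, cid))
    return kept

def random_crop_with_targets(boxes, img_w, img_h, crop_w, crop_h, rng, max_tries=50):
    """Re-sample the crop until at least one regular pedestrian box (ID 1) survives."""
    for _ in range(max_tries):
        x0 = rng.randint(0, img_w - crop_w)
        y0 = rng.randint(0, img_h - crop_h)
        crop = (x0, y0, x0 + crop_w, y0 + crop_h)
        kept = crop_boxes(boxes, crop)
        if any(box[4] == 1 for box in kept):
            return crop, kept
    # Assumed fall-back: keep the whole image if no valid crop was found.
    return (0, 0, img_w, img_h), list(boxes)

boxes = [(10, 10, 30, 30, 1), (60, 60, 90, 90, -1)]
crop, kept = random_crop_with_targets(boxes, 100, 100, 50, 50, random.Random(0))
```

A crop that contains only the ID -1 box is rejected and re-drawn, so every training image handed to anchor matching has at least one regular target.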

With reference to the figure, next positive and negative samples are obtained by anchor matching. In embodiments of the present invention, positive and negative samples may be obtained based on existing means.

As an embodiment, the method specifically includes the following steps:

calculating the intersection over union (IOU) of each pedestrian annotation box A with every custom-distributed anchor B, where the IOU is calculated as:

$$\mathrm{IOU}(A, B) = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}$$

For each anchor, if its maximum IOU over all pedestrian annotation boxes is greater than or equal to a preset threshold, the anchor is matched to the pedestrian annotation box that gives this maximum and is a positive sample; otherwise, the anchor is a negative sample.
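A minimal sketch of this matching rule, in plain Python. The `(x1, y1, x2, y2)` box layout, the function names, and the default threshold of 0.5 are assumptions for illustration; the text only says that the threshold is preset.

```python
def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection(a, b):
    """Area of the overlap of two axis-aligned boxes (0 if they do not overlap)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return area((x1, y1, x2, y2))

def iou(a, b):
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match_anchors(anchors, target_boxes, iou_thresh=0.5):
    """Label each anchor 1 (positive) or 0 (negative) by its best IOU.

    Returns (labels, matched), where matched[i] is the index of the target
    box assigned to anchor i, or -1 for a negative anchor.
    """
    labels, matched = [], []
    for anchor in anchors:
        ious = [iou(t, anchor) for t in target_boxes]
        best = max(range(len(ious)), key=ious.__getitem__) if ious else -1
        if best >= 0 and ious[best] >= iou_thresh:
            labels.append(1)
            matched.append(best)
        else:
            labels.append(0)
            matched.append(-1)
    return labels, matched

anchors = [(0, 0, 10, 10), (50, 50, 60, 60)]
targets = [(0, 0, 10, 12)]
labels, matched = match_anchors(anchors, targets)
```

Here the first anchor overlaps the target with IOU 100/120 and becomes positive, while the second has no overlap and becomes negative.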

Then, the area ratio α of each ignore annotation box C to each anchor B is calculated as:

$$\alpha = \frac{\operatorname{area}(C \cap B)}{\operatorname{area}(B)}$$

Anchor B may be a custom-distributed anchor.

The area ratio α lies in the range [0, 1]. The larger α is, the more of the anchor is enclosed by the ignore box. When α = 1, the anchor is completely contained in the ignore box, i.e., the anchor lies entirely inside a white box in the figures above.

When α = 0, the anchor does not intersect the ignore box, i.e., the anchor lies outside the white box.

If the area ratio α of an anchor to any ignore box is greater than or equal to a preset threshold, the anchor meets the ignore condition; if the anchor is also a negative sample under the rule above, it is set as an ignored sample.

Ignored samples are thus added on top of the positive and negative samples without changing how positive and negative samples are defined: only negative samples that meet the ignore condition become ignored samples, so selecting ignored samples does not affect the selection of positive samples.

The negative samples are corrected in this way: every anchor is checked against every ignore box under the rule above to define all ignored samples, and the ignored samples are removed from the negative samples to give the corrected negative samples. Positive samples, ignored samples, and corrected negative samples are thereby defined, and the target detector undergoes classification training on the positive samples and the corrected negative samples.
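The ignore rule can be sketched in a few lines. The label values 1 and -1 follow the category IDs used in the text; the value 0 for negatives, the function names, and the default threshold of 0.5 are assumptions made for this example.

```python
POSITIVE, NEGATIVE, IGNORE = 1, 0, -1

def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def area_ratio(ignore_box, anchor):
    """alpha = area(C ∩ B) / area(B): fraction of the anchor covered by the ignore box."""
    x1, y1 = max(ignore_box[0], anchor[0]), max(ignore_box[1], anchor[1])
    x2, y2 = min(ignore_box[2], anchor[2]), min(ignore_box[3], anchor[3])
    anchor_area = box_area(anchor)
    return box_area((x1, y1, x2, y2)) / anchor_area if anchor_area > 0 else 0.0

def mark_ignored(labels, anchors, ignore_boxes, alpha_thresh=0.5):
    """Turn negative anchors into ignored ones; positives are never touched."""
    out = list(labels)
    for i, anchor in enumerate(anchors):
        if out[i] != NEGATIVE:
            continue
        if any(area_ratio(c, anchor) >= alpha_thresh for c in ignore_boxes):
            out[i] = IGNORE
    return out

anchors = [(0, 0, 10, 10), (0, 0, 10, 10), (100, 100, 110, 110)]
labels = [POSITIVE, NEGATIVE, NEGATIVE]
ignore_boxes = [(5, 0, 20, 10)]
corrected = mark_ignored(labels, anchors, ignore_boxes)
```

The first two anchors are both half covered by the ignore box (α = 0.5), but only the negative one is relabeled; the positive anchor keeps its label and the distant anchor stays negative.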

Particularly preferably, during classification training of the target detector, all positive samples and the hard negative samples mined from the corrected negative samples participate in training, and only the positive samples participate in regression training. Hard example mining of the negative samples can be implemented with OHEM.

In the training process of the present invention, classification and regression training can be implemented with existing algorithms, and the invention is not limited in this respect. Optionally, classification uses a softmax loss function and regression uses a smooth L1 loss function.

During classification training of the target detector, the classification loss is computed from the positive samples and the corrected negative samples, and the training parameters are adjusted by backpropagating the classification loss.
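The text does not spell out how the mining is configured. One common simplification, sketched below, keeps the highest-loss negatives up to a fixed negative-to-positive ratio, as in SSD-style hard negative mining. The 3:1 default, the function name, and the label values (1 positive, 0 corrected negative, -1 ignored) are assumptions, not details from the patent.

```python
def mine_hard_negatives(labels, cls_losses, neg_pos_ratio=3):
    """Select anchors for classification training.

    Returns (pos, hard_neg): all positive indices, plus the corrected
    negatives with the largest classification loss. Ignored anchors
    (label -1) are never candidates.
    """
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    num_neg = min(len(neg), neg_pos_ratio * max(len(pos), 1))
    hardest = sorted(neg, key=lambda i: cls_losses[i], reverse=True)[:num_neg]
    return pos, sorted(hardest)

labels = [1, 0, 0, 0, 0, -1, 0]
losses = [0.2, 0.1, 2.0, 0.5, 0.05, 9.0, 1.0]
pos, hard_neg = mine_hard_negatives(labels, losses, neg_pos_ratio=2)
```

Anchor 5 has by far the largest loss, which is exactly the kind of sample that plain mining would pick first; because it is ignored, it is skipped and anchors 2 and 6 are selected instead.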
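A minimal sketch of how the two losses could be restricted to the right anchors, using the optional softmax and smooth L1 losses named above. Averaging over the selected anchors, the `beta` parameter, the two-class logits layout, and all function names are assumptions for illustration; in practice the negatives passed in would be the mined hard negatives.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against class index `target`."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def smooth_l1(pred, target, beta=1.0):
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def detection_loss(labels, logits, box_preds, box_targets):
    """Classification over labels 0/1 only; regression over positives only."""
    cls_idx = [i for i, label in enumerate(labels) if label in (0, 1)]  # skip -1
    pos_idx = [i for i, label in enumerate(labels) if label == 1]
    cls = sum(softmax_cross_entropy(logits[i], labels[i]) for i in cls_idx)
    reg = sum(smooth_l1(box_preds[i], box_targets[i]) for i in pos_idx)
    return cls / max(len(cls_idx), 1), reg / max(len(pos_idx), 1)

labels = [1, 0, -1]
logits = [[0.0, 0.0], [0.0, 0.0], [100.0, -100.0]]
box_preds = [[0.5, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [9.0, 9.0, 9.0, 9.0]]
box_targets = [[0.0, 0.0, 0.0, 0.0]] * 3
cls_loss, reg_loss = detection_loss(labels, logits, box_preds, box_targets)
```

The ignored anchor contributes to neither term, whatever the network predicts for it, and the negative anchor contributes to classification but not to regression.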

In current end-to-end training of target detection algorithms, hard targets are left unlabeled and trained on directly. When positive and negative samples are matched, the negative samples then contain many hard samples that are actually positive and difficult to distinguish; such samples are very likely to be selected during hard example mining, and the unclean negative samples ultimately degrade the classification performance of the detector. This is one of the bottlenecks in controlling false detections and raising recall, especially in densely populated scenes.

The invention adds ignored samples at the sample matching stage and assigns hard samples to them, which addresses the above problem: hard samples harm detector training whether they are treated as positive or as negative samples, so ignoring them directly improves detection precision.

Next, the target detector optimization of the above embodiment is compared experimentally with the prior art.

The experiment uses 30000 training pictures containing 759363 pedestrian boxes and 11374 ignore boxes in total; the test set contains 5000 pictures with 190115 valid pedestrian boxes. Both methods use the same training set and test set. For classification training, the conventional training algorithm obtains positive and negative samples directly from IOU-based anchor matching and does not process ignore boxes.

During testing, low-quality targets with severe occlusion (only the head visible), excessive density, very small size, and the like are treated as ignored samples; whether or not an ignored sample is detected does not affect the final result. The false detection rate FPR in the table is calculated as follows:

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\text{total number of test pictures}}$$

where FP is the number of false detections and the total number of test pictures is 5000.

The recall rate TPR is calculated as:

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\text{total number of positive samples}}$$

where TP is the number of positive samples detected (recalled) and the total number of positive samples is 190115.
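The two metrics can be computed as below. The denominators come from the experiment description; the FP and TP counts are hypothetical values chosen only to show the arithmetic and are not results from the patent. Note that FPR as defined here is the average number of false detections per test picture.

```python
def false_detections_per_image(num_false_positives, num_test_images):
    """FPR as defined in the text: false detections divided by test pictures."""
    return num_false_positives / num_test_images

def recall(num_true_positives, num_positive_boxes):
    """TPR as defined in the text: recalled positives divided by all positives."""
    return num_true_positives / num_positive_boxes

NUM_TEST_IMAGES = 5000       # from the experiment description
NUM_POSITIVE_BOXES = 190115  # valid pedestrian boxes in the test set

# Hypothetical counts for illustration only; not measured results.
example_fp, example_tp = 2500, 152092

print(false_detections_per_image(example_fp, NUM_TEST_IMAGES))  # 0.5
print(round(recall(example_tp, NUM_POSITIVE_BOXES), 3))         # 0.8
```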

The experimental results show that the detection algorithm trained in the manner of the invention makes false detections more controllable. Removing hard samples from the negative samples through a reasonable mechanism helps train a more accurate classifier and reduces false detections.

Based on the optimization of the target detector of fig. 1 and the above embodiments, the invention can also be implemented in the following way.

Optimization system of target detector

Referring to fig. 5, an optimization system of a target detector according to an exemplary embodiment of the present invention includes:

Wherein the ignore annotation box contains a single target or a plurality of targets.

During classification training of the target detector, all positive samples and the hard negative samples mined from the corrected negative samples participate in training, and only the positive samples participate in regression training.

Optimization system of target detector

The optimization system of the target detector according to the embodiment of the invention comprises:

one or more processors;

a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising procedures to perform the optimization method of the target detector of any of the preceding embodiments.

Computer readable medium

A computer-readable medium storing software according to an embodiment of the present invention, the software including instructions executable by one or more computers, the instructions causing the one or more computers to perform operations including a process of performing the optimization method of the target detector of any one of the foregoing embodiments by such execution.

Target detection method

According to the target detection method of the embodiment of the invention, the target detector is trained with the optimization method of the target detector of the preceding embodiment, wherein all positive samples and the hard negative samples mined from the corrected negative samples participate in training, and only the positive samples participate in regression training;

During classification training of the target detector, hard example mining of the negative samples uses the OHEM algorithm.

Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims

1. A method for optimizing a target detector, comprising the steps of:

2. The method of claim 1, wherein the ignore mark box is a box containing a single target or a plurality of targets.

3. The method for optimizing a target detector according to claim 1, wherein during classification training of the target detector, all positive samples and the hard negative samples mined from the corrected negative samples participate in classification training, and only the positive samples participate in regression training.

4. The method of optimizing a target detector of claim 3, wherein the negative sample hard case mining employs an OHEM algorithm.

5. The method as claimed in claim 4, wherein during classification training of the target detector, a classification loss is computed from the positive samples and the corrected negative samples, and the training parameters are adjusted by backpropagating the classification loss.

6. An optimization system for a target detector, comprising:

7. The system of claim 6, wherein the ignore tag box is a box containing a single target or multiple targets.

8. The system of claim 6, wherein during classification training of the target detector, all positive samples and the hard negative samples mined from the corrected negative samples participate in training, and only the positive samples participate in regression training.

9. The system as claimed in claim 8, wherein during classification training of the target detector, a classification loss is computed from the positive samples and the corrected negative samples, and the training parameters are adjusted by backpropagating the computed classification loss.

10. An optimization system for a target detector, comprising:

one or more processors;

a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising procedures of the optimization method of the target detector of any one of claims 1-5.

11. A computer-readable medium storing software, the software comprising instructions executable by one or more computers, the instructions causing the one or more computers to perform operations by such execution, the operations comprising performing the procedures of the optimization method of an object detector of any one of claims 1-5.

12. A target detection method, characterized in that the target detector is trained by the optimization method of the target detector as claimed in claim 1, wherein during classification training of the target detector, all positive samples and the hard negative samples mined from the corrected negative samples participate in training, and only the positive samples participate in regression training;