CN110909794A - Target detection system suitable for embedded equipment - Google Patents

Target detection system suitable for embedded equipment

Info

Publication number
CN110909794A
CN110909794A (application number CN201911153078.6A)
Authority
CN
China
Prior art keywords
sample
network
model
module
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911153078.6A
Other languages
Chinese (zh)
Other versions
CN110909794B (en)
Inventor
叶杭杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Espressif Systems Shanghai Co Ltd
Original Assignee
Espressif Systems Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Espressif Systems Shanghai Co Ltd filed Critical Espressif Systems Shanghai Co Ltd
Priority to CN201911153078.6A priority Critical patent/CN110909794B/en
Publication of CN110909794A publication Critical patent/CN110909794A/en
Priority to US17/778,788 priority patent/US20220398835A1/en
Priority to PCT/CN2020/130499 priority patent/WO2021098831A1/en
Application granted granted Critical
Publication of CN110909794B publication Critical patent/CN110909794B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/255 - Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Stored Programmes (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

The invention provides a target detection system suitable for embedded devices, comprising an embedded device and a server. The target detection logic running on the embedded device is composed of a multi-layer structure of shared basic networks, private basic networks and detection modules; the parameters of each shared basic network come directly from the output of the layer above. An image is processed by the shared basic network and the private basic network to obtain a feature map, the feature map is processed by the detection module, and a result merging module merges and outputs the target detection result. The target detection system also comprises an online model self-calibration system: the embedded device uploads collected samples to the server from time to time, and the server labels the samples automatically or manually, trains the model, and updates the model on the embedded device. The target detection system achieves good performance on embedded devices and uses the large target detection models on the server to complete automatic labeling, reducing the labeling workload and completing model correction more efficiently.

Description

Target detection system suitable for embedded equipment
Technical Field
The invention relates to the fields of target detection and online model correction for embedded devices, and in particular to a target detection system suitable for embedded devices.
Background
The current mainstream methods for target detection are based on deep learning. Deep learning methods achieve better results than traditional methods, but they have drawbacks in practical applications:
1. The computational load is huge and needs to be accelerated by dedicated hardware (e.g. a GPU). This is particularly disadvantageous for mobile devices, especially embedded devices.
2. The models have a large number of parameters and occupy a large amount of storage space, which is extremely disadvantageous for resource-constrained embedded devices.
Therefore, such networks can only be deployed on a server, and a terminal device calls the server's interface over the network to perform target detection. Once the network is unavailable, none of these functions can be carried out.
To realize offline target detection on a terminal device and break free of network constraints, the simplest approach is to simplify the model into a small network model. Although a small network model shrinks the detection model and reduces the number of parameters and the amount of computation, making offline target detection on an embedded device feasible, its limited expressive power cannot adapt to all background conditions. For example, experiments show that the detection rate of a small network model drops noticeably when detecting targets in a dark environment.
In addition, after a small network model is trained, missed detections easily occur when the pictures captured by the camera differ from the training set (in color saturation, exposure, sharpness, etc.). The remedy is to learn from pictures actually collected by the camera. However, building an actual-data training set consumes a lot of manpower and material resources and takes a long time, and if the data set is too small the trained network does not generalize.
Disclosure of Invention
The object of the invention is to provide, for embedded devices, a target detection system that has good expressive power and can use an actual training set for effective model training and correction, thereby solving the problems in the prior art. To achieve this object, the technical solution adopted by the present invention is to provide a target detection system suitable for an embedded device, characterized by comprising an embedded device; local service logic and target detection logic run on the embedded device;
the target detection logic is composed of a multi-layer structure containing a plurality of branch modules and a result merging module; each branch module consists of a shared basic network, a private basic network and a detection module; the shared basic network of the first-layer branch module receives the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of every other branch module come directly from the output of the shared basic network of the layer above; the output of the shared basic network serves as the input of the private basic network; the private basic network outputs a feature map as the input of the detection module; the output of the detection module is the output of that layer's branch module; the result merging module merges the outputs of the branch modules of all layers and outputs the target detection result;
the local service logic takes the target detection result as input and uses it to complete the service.
Further, the shared basic network is formed by stacking a plurality of basic network blocks. In the shared basic network of the first-layer branch module, the first basic network block is a CNN network block and the remaining basic network blocks are MobileNet network blocks; in the shared basic networks of the branch modules of the other layers, all basic network blocks are MobileNet network blocks. In the shared basic network, the number of MobileNet network blocks is dynamically increased or decreased according to the difficulty of the target.
Further, the private basic network is formed by stacking a plurality of MobileNet network blocks, and the number of MobileNet network blocks is dynamically increased or decreased according to the required expressive power; the parameters of the private basic network are valid only for the current branch module.
Further, the detection module feeds the feature map into a first branch, a second branch and a third branch; the first branch consists of one MobileNet network block, the second branch consists of 2 MobileNet network blocks, and the third branch consists of 3 MobileNet network blocks;
after the feature map passes through the first branch and the third branch, the number of feature dimensions is unchanged; after the feature map passes through the second branch, the number of feature dimensions is doubled; the detection module combines the feature maps of the first branch, the second branch and the third branch, and obtains a score, a detection frame and key points through convolution as the output of the branch module of the current layer.
Further, the system also comprises a server and an online model self-calibration system; the online model self-calibration system comprises sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
after the sample collection logic collects samples, the samples are stored in a sample library, and the sample library is uploaded to the server from time to time;
the sample labeling module labels the images in the sample library to form a labeled sample library; the labeled sample library is then used by the model correction module to complete the calibration of the model network parameters, and the calibrated model network parameters are issued to and updated on the embedded device.
Further, the sample collection function of the sample collection logic is started by a timing trigger or a service trigger; the triggered sample collection logic performs the following steps:
step 1.1, setting the detection result queue to empty;
step 1.2, acquiring a new frame of image, carrying out target detection, and sending the image and its detection result into the detection result queue;
step 1.3, in the detection result queue, taking the last image whose detection result is "object detected" as the starting point and scanning towards the tail of the queue; if another image whose detection result is "object detected" is encountered, taking it as the end point and jumping to step 1.4; otherwise jumping to step 1.2;
step 1.4, counting the number Z of images whose detection result is "no object detected" in the interval between the starting point and the end point of step 1.3;
step 1.5, if Z > Z_threshold, returning to step 1.1; if Z ≤ Z_threshold, extracting one frame from the Z frames of images, storing it in the sample library, and terminating sample collection.
Further, the sample library of the sample collection logic has a limited capacity N; when the number of existing samples in the sample library reaches or exceeds the capacity N, a new sample replaces the oldest sample in the sample library;
after receiving the sample library uploaded by the embedded device, the server deletes duplicate images in the sample library by calculating the similarity between the images.
Further, the sample labeling work performed by the sample labeling module comprises the following steps:
step 2.1, extracting one image from the sample library and sending it simultaneously into a plurality of super-large networks for target identification to obtain target identification results;
step 2.2, calculating the difficulty coefficient λ of the image from the target identification results;
step 2.3, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample; for a second-level hard sample, removing the image from the sample library, integrating the target identification results of the plurality of super-large networks to complete automatic labeling, and putting the result into the labeled sample library;
step 2.4, if the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classifying the image as a first-level hard sample; for a first-level hard sample, removing the image from the sample library, storing it separately, and labeling it manually; after manual labeling, putting the image into the labeled sample library;
step 2.5, if unprocessed images remain in the sample library, returning to step 2.1; otherwise the sample labeling work is complete.
Further, step 2.2 specifically comprises the sub-steps of:
step 2.2.1, selecting the target identification result of one super-large network as the reference result;
step 2.2.2, calculating the IoU between the detection frames in the target identification results of the other super-large networks and the detection frames in the reference result;
step 2.2.3, for each super-large network, selecting among its output target identification results the one whose IoU is the largest and greater than the threshold C_threshold, and grouping it together with the corresponding reference result; target identification results that cannot be grouped each form their own group;
step 2.2.4, calculating the difficulty coefficient λ from the group sizes and the number of super-large networks (the formula is given only as an image in the original publication).
Step 2.3 is expanded into the steps:
step 2.3.1, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample;
step 2.3.2, removing the image from the sample library;
step 2.3.3, for the second-level hard sample, discarding the independently grouped target identification results, calculating the average of the detection frames in the target identification results of the non-independent groups, using the average as the final sample label, and completing the automatic labeling.
Further, the operation of the model correction module comprises the steps of:
step 3.1, dividing the labeled sample library into an actual training set and an actual verification set; using generic samples obtained from public sources as a public verification set;
step 3.2, calculating the LOSS values of the original model on the public verification set and the actual verification set respectively;
step 3.3, dividing the actual training set into a plurality of groups, and taking the original model as the pre-training model;
step 3.4, selecting one group of data from the actual training set;
step 3.5, performing model training on the pre-training model to obtain a trained model;
step 3.6, calculating the LOSS values of the trained model on the public verification set and the actual verification set respectively;
step 3.7, if the difference between the LOSS values of the original model and the trained model on the public verification set is greater than the threshold L_threshold and the difference between the LOSS values on the actual verification set is greater than the threshold I_threshold, jumping to step 3.8; otherwise entering step 3.9;
step 3.8, if the actual training set still has data that did not participate in training, setting the trained model as the new pre-training model and jumping to step 3.4; otherwise entering step 3.9;
step 3.9, stopping training; after training stops, taking the network parameters of the trained model as the output of the model correction module.
The invention reduces the overall number of network parameters and the amount of computation by sharing parameters between the layers of the shared basic networks and by making the shared basic networks and private basic networks dynamically adjustable.
The model correction system collects hard samples encountered by the embedded device in its current environment and submits them to the server from time to time; the server automatically labels the samples with its large target detection models, and the labeled samples are used to train and update the network model of the embedded device.
In view of the above technical features, the present invention has the following advantages:
1. It is not limited by the scarce resources and limited computing speed of embedded devices, and still achieves good performance on embedded devices.
2. The sample library is not uploaded in real time, which greatly reduces the embedded device's dependence on the network.
3. Automatic labeling by the large target detection models on the server reduces the manual labeling workload.
4. The embedded device can update its own model network parameters with the results of the large target detection models on the server, completing model upgrades more efficiently.
Drawings
FIG. 1 is a system block diagram of a preferred embodiment of the present invention;
FIG. 2 is a network architecture diagram of a deep learning network in accordance with a preferred embodiment of the present invention;
FIG. 3 is a diagram illustrating the architecture of a shared infrastructure network in accordance with a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a detection module according to a preferred embodiment of the present invention;
FIG. 5 is a flow chart of sample collection logic in a preferred embodiment of the present invention;
FIG. 6 is a flow chart of a sample annotation module in accordance with a preferred embodiment of the present invention;
FIG. 7 is a diagram of an example grouping of the calculation difficulty coefficients in a preferred embodiment of the present invention;
FIG. 8 is a flow chart of the model correction module in a preferred embodiment of the present invention.
In the figure: 1 - branch module, 1.1 - shared basic network, 1.2 - private basic network, 1.3 - detection module, 2 - result merging module, 3.1 - network block, 3.2 - optional network block, 4.1 - first branch, 4.2 - second branch, 4.3 - third branch, 5 - embedded device, 5.1 - target detection logic, 5.2 - local service logic, 5.3 - sample collection logic, 6 - server, 6.1 - sample labeling module, 6.2 - model correction module, 7 - sample library, 8 - network model parameters, 9 - Faster-RCNN network, 10 - SSD network.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Referring to fig. 1, a target detection system for an embedded device according to the present invention includes an embedded device 5 and a server 6. Remote service logic runs on the server 6; target detection logic 5.1 and local service logic 5.2 run on the embedded device 5. The target detection logic 5.1 comprises a deep learning network model.
The target detection system further comprises an online model self-calibration system, which addresses the reduced learning capability caused by cutting parameters to lower the computation of the small model. The online self-calibration system comprises sample collection logic 5.3 running on the embedded device 5, and a sample labeling module 6.1 and a model correction module 6.2 running on the server 6.
On the embedded device 5, every actually acquired image enters the target detection logic 5.1, and the detection results are sent both to the local service logic 5.2 and to the sample collection logic 5.3. The local service logic 5.2 completes the service-related work; the sample collection logic 5.3, as part of the online self-calibration system, places the samples it collects, in a controlled manner, into the sample library 7 in preparation for subsequent calibration.
The samples in the sample library 7 may be transmitted to the server 6 by a variety of means, such as Bluetooth, Wi-Fi, etc.
After the sample library 7 is uploaded to the server 6, duplicate pictures are deleted by calculating the similarity between the pictures, and the result enters the sample labeling module 6.1. The labeled samples are used as a training set and a test set and enter the model correction module 6.2, which trains new target detection network model parameters 8; the updated network model parameters 8 are then deployed on the embedded device 5.
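The patent does not specify how image similarity is computed. As a minimal sketch assuming a perceptual-hash comparison (the imagehash library, the distance threshold and the function name are illustrative choices, not part of the patent), duplicate removal on the server might look like this:

```python
from PIL import Image
import imagehash  # assumed third-party library; the patent names no specific similarity method

def deduplicate(image_paths, max_distance=5):
    """Keep only images whose perceptual hash differs enough from every image already kept."""
    kept, kept_hashes = [], []
    for path in image_paths:
        h = imagehash.phash(Image.open(path))
        # subtracting two ImageHash objects yields their Hamming distance
        if all(h - kh > max_distance for kh in kept_hashes):
            kept.append(path)
            kept_hashes.append(h)
    return kept
```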
Referring to fig. 2, the deep learning network model in the target detection logic is composed of a multi-layer structure including a plurality of branch modules 1 and a result merging module 2. The network consists of several branch modules 1: M1, M2, …, Mx. Each branch module 1 corresponds to one or more anchors. For example, the following design may be made: (1) the number of branch modules is 2, namely M1 and M2; (2) M1 corresponds to an anchor size of 16×16; (3) M2 corresponds to two anchors (32×32, 64×56); the resulting model can detect targets around these anchor sizes.
Each branching module 1 is in turn composed of three major components: a shared basic network 1.1, a private basic network 1.2 and a detection module 1.3.
1. The shared basic network 1.1 is formed by stacking MobileNet network blocks. MobileNet is a network architecture suited to mobile devices that greatly reduces the amount of computation and the number of parameters compared to a plain CNN, while retaining the "scaling" characteristic of a CNN. The shared basic network 1.1 of the first layer (backbone_1) is designed differently from the shared basic networks 1.1 of the other layers: to prevent MobileNet from losing too many features, the first-layer network uses a CNN.
The main function of the shared basic network 1.1 is to determine the scaling of the branch module through its stride. Taking the design of backbone_1 as an example, the cumulative stride is 8, i.e., the feature map obtained by the branch module is 1/8 of the original image in size. When the detected object is large, a large stride can be adopted so that the feature map shrinks quickly, reducing the parameter count and the computation.
The shallow shared basic network 1.1 shares parameters with the deep shared basic networks 1.1, reducing the overall network parameters and computation: the output of backbone_1 becomes the input of backbone_2, the output of backbone_2 becomes the input of backbone_3, and so on.
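As a minimal PyTorch-style sketch of this chaining (the block definitions, layer counts, channel widths, strides and head shapes below are illustrative assumptions, not the patent's actual configuration):

```python
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """MobileNet-style block: depthwise 3x3 conv followed by pointwise 1x1 conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class BranchModule(nn.Module):
    """Shared basic network + private basic network + detection head, as in Fig. 2."""
    def __init__(self, shared, private, head):
        super().__init__()
        self.shared, self.private, self.head = shared, private, head

    def forward(self, x):
        shared_out = self.shared(x)  # also becomes the input of the next branch module
        return shared_out, self.head(self.private(shared_out))

class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # backbone_1: first block is a plain CNN block (to avoid losing features),
        # the rest are MobileNet blocks; cumulative stride 8, so the feature map is 1/8 size.
        shared1 = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
            DepthwiseSeparableBlock(16, 32, stride=2),
            DepthwiseSeparableBlock(32, 64, stride=2),
        )
        shared2 = nn.Sequential(DepthwiseSeparableBlock(64, 128, stride=2))
        private1 = DepthwiseSeparableBlock(64, 64)    # parameters used by M1 only
        private2 = DepthwiseSeparableBlock(128, 128)  # parameters used by M2 only
        # 1x1 conv heads: 1 score + 4 box values per location (placeholder output layout)
        self.m1 = BranchModule(shared1, private1, nn.Conv2d(64, 5, 1))
        self.m2 = BranchModule(shared2, private2, nn.Conv2d(128, 5, 1))

    def forward(self, img):
        s1, out1 = self.m1(img)   # output of backbone_1 ...
        _, out2 = self.m2(s1)     # ... is the input of backbone_2
        return [out1, out2]       # per-branch outputs, merged later (e.g. by NMS)
```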
2. The private basic network 1.2 is likewise stacked from MobileNet blocks. Unlike the shared basic network 1.1, the parameters of the private basic network 1.2 are valid only for the current module and are not affected by other modules.
The private basic network 1.2 can also be enlarged or reduced according to the actual detection effect. When the expressive power is too poor, network layers can be added to improve it; when there is expressive power to spare, the network can be reduced to increase speed.
3. The detection module 1.3 improves the detection performance of the model by fusing feature maps with different receptive fields.
The result merging module 2 of the target detection logic gathers the detection frames predicted by all branch modules and eliminates redundant detection frames through NMS (non-maximum suppression) to obtain the final prediction result.
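The merging step is standard non-maximum suppression; a minimal plain-Python sketch (the IoU threshold value is a placeholder) follows:

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping boxes that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```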
Referring to fig. 3, the shared basic network is formed by stacking a plurality of network blocks 3.1, where the convolutions in dashed boxes are optional network blocks 3.2. Optional network blocks 3.2 can be added or removed according to the difficulty of the detected object: if the object is hard to detect or there are many false detections, optional network blocks 3.2 can be added; otherwise they can be removed.
Referring to fig. 4, the input feature map enters the detection module with C dimensions of information and is fed into a first branch 4.1, a second branch 4.2 and a third branch 4.3. After passing through the 2 MobileNet blocks on the second branch 4.2, the number of dimensions of the feature map increases from C to 2C. The receptive field of the second branch 4.2 lies between those of the other two branches, and its dimensionality is increased so that it carries the main feature information; the features of the first branch 4.1 and the third branch 4.3 serve as auxiliary information. Finally, the information of the three branches is concatenated into a new feature map. Different 1×1 convolutions on the new feature map produce the score and the detection frame respectively, and if key points are required, another 1×1 convolution is added to obtain the key points.
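A sketch of this head in the same PyTorch style, reusing the DepthwiseSeparableBlock class from the backbone sketch above (the channel counts and the number of key points are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Three branches with different receptive fields, fused and read out by 1x1 convs."""
    def __init__(self, c, num_keypoints=5):
        super().__init__()
        self.branch1 = DepthwiseSeparableBlock(c, c)                      # 1 block, keeps C dims
        self.branch2 = nn.Sequential(DepthwiseSeparableBlock(c, c),
                                     DepthwiseSeparableBlock(c, 2 * c))   # 2 blocks, doubles to 2C
        self.branch3 = nn.Sequential(DepthwiseSeparableBlock(c, c),
                                     DepthwiseSeparableBlock(c, c),
                                     DepthwiseSeparableBlock(c, c))       # 3 blocks, keeps C dims
        fused = c + 2 * c + c
        self.score = nn.Conv2d(fused, 1, 1)
        self.bbox = nn.Conv2d(fused, 4, 1)
        self.keypoints = nn.Conv2d(fused, 2 * num_keypoints, 1)           # only if key points are needed

    def forward(self, x):
        fused = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.score(fused), self.bbox(fused), self.keypoints(fused)
```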
Referring to fig. 5, the sample collection logic running on the embedded device is triggered by a user-defined condition. For example, it may be triggered periodically, e.g. started once every hour, or triggered by the service: if the device is performing face entry and a "no object detected" picture appears, a missed detection is highly likely, so the sample collection logic is started. The workflow of the sample collection logic comprises the following steps:
step 501, the sample collection logic is triggered.
Step 502, sending each frame's detection result into the detection result queue, and calculating the number Z of consecutive frames in which no object is detected, specifically including:
step 502.1, starting with the last object detection;
step 502.2, recording the number of frames without detecting the object;
and step 502.3, counting the total number of frames without the detected object after the object detection is finished next time.
Step 503, setting threshold value ZthresholdWhen Z is greater than ZthresholdJudging that the Z frame picture does not have an object, and ending the sample collection logic; when Z is less than ZthresholdIf yes, the Z-frame picture is judged to be the missed object, and the process proceeds to step 504.
And step 504, extracting 1 frame from the Z frames judged to contain missed detections.
And step 505, storing the frame picture into a sample library, and ending the sample collection logic.
The size of the sample library is limited; when the limit is exceeded, a new sample replaces the oldest sample. This keeps the sample data fresh (better reflecting recent environmental conditions) without occupying too much storage. A minimal sketch of this collection flow is given below.
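In the sketch, Z_threshold, the library capacity N and the choice of which frame to keep are placeholders; the patent does not fix them:

```python
from collections import deque
import random

Z_THRESHOLD = 10        # placeholder for Z_threshold
SAMPLE_CAPACITY = 100   # placeholder for the library capacity N

sample_library = deque(maxlen=SAMPLE_CAPACITY)  # when full, the oldest sample is dropped

def collect_samples(frames, detect):
    """frames: iterable of images; detect(img) -> True if an object was detected.

    Looks for a short run of "no object detected" frames bracketed by two
    "object detected" frames and stores one frame from that run as a hard sample.
    """
    miss_run, seen_detection = [], False
    for img in frames:
        if detect(img):
            if seen_detection and 0 < len(miss_run) <= Z_THRESHOLD:
                sample_library.append(random.choice(miss_run))  # probable missed detection
                return True          # sample collection terminates (step 505)
            miss_run, seen_detection = [], True   # reset: this frame is the new starting point
        elif seen_detection:
            miss_run.append(img)     # frame between two detections with no object detected
    return False
```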
Referring to fig. 6, a sample labeling module running on a server automatically labels or manually labels each frame of image in a collected sample library, and the specific steps are as follows:
601, enabling each frame of image in a sample library to enter a sample labeling module;
step 602, the image sample is sent to a plurality of super large networks, such as YOLO, SSD, fast-RCNN, etc.
Step 603, obtaining results L respectively1、L2To LX
Step 604, synthesize results (L) of multiple super large networks1、L2To LX) And calculating an image difficulty coefficient lambda.
Step 605, if the difficulty coefficient lambda is less than or equal to the difficulty threshold lambdathresholdStep 606 is entered; if the difficulty coefficient lambda is larger than the difficulty threshold lambdathresholdStep 608 is entered.
And 606, integrating the target recognition results of the plurality of the super-large networks to finish automatic annotation of the image.
And step 607, classifying the image into a second-level difficult sample, putting the second-level difficult sample into a labeled sample library, and entering step 610.
And 608, submitting the image for manual processing to complete its manual annotation.
And step 609, classifying the image into a first-class difficulty sample, and putting the first-class difficulty sample into a labeled sample library.
Step 610, forming a data set.
In this way, a hard-sample data set can be collected quickly while the correctness of the sample labels is guaranteed. The final data set contains both automatically labeled and manually labeled image samples.
In step 604, the sample difficulty coefficient is calculated by first grouping the detection results and then computing the coefficient from the grouping information. The grouping comprises the following steps:
and 701, obtaining a target identification result of each super large network.
Step 702, selecting the target identification result of one of the super-large networks as the reference group (i.e. each of its detection frames serves as the reference detection frame of one group), and marking the target identification results of the remaining super-large networks as to-be-classified.
And 703, selecting one super-large network to be classified, taking its target identification result, and calculating the IoU values between each of its detection frames and the reference detection frames.
Step 704, selecting, from the detection frames to be classified, the detection frame with the largest IoU value; if its IoU value is greater than the threshold C_threshold, the detection frame is placed into the group of the corresponding reference detection frame; detection frames that cannot be grouped each form their own group.
Step 705, if there is an unprocessed super-large network, going to step 703; otherwise, ending.
A specific grouping example is shown in fig. 7. In this example, the results of the Faster-RCNN network 9 are taken as the reference group. The IoU values between detection frame 1 of the SSD network 10 and detection frames 1 to 5 of the Faster-RCNN network 9 are calculated; the IoU with detection frame 2 of the Faster-RCNN network 9 is found to be the largest and greater than C_threshold, so detection frame 1 of the SSD network 10 is grouped with detection frame 2 of the Faster-RCNN network 9, and so on. Detection frame 5 of the SSD network 10 cannot be grouped and therefore forms its own group.
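A minimal sketch of this grouping procedure (boxes are (x, y, w, h) tuples; the C_threshold value and the function names are placeholders):

```python
C_THRESHOLD = 0.5   # placeholder for C_threshold

def iou(a, b):
    """IoU of two (x, y, w, h) boxes, as in the NMS sketch above."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def group_detections(reference_boxes, other_networks_boxes):
    """reference_boxes: boxes from the reference network (one group per box).
    other_networks_boxes: one list of boxes per remaining super-large network."""
    groups = [[box] for box in reference_boxes]
    for boxes in other_networks_boxes:          # step 703: one network at a time
        for box in boxes:
            best, best_iou = None, 0.0
            for i, ref in enumerate(reference_boxes):
                value = iou(box, ref)
                if value > best_iou:
                    best, best_iou = i, value
            if best is not None and best_iou > C_THRESHOLD:
                groups[best].append(box)        # step 704: join the reference group
            else:
                groups.append([box])            # ungrouped boxes form their own group
    return groups
```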
After grouping is completed, the number of detection frames in each group is counted and recorded as N_1 to N_k. The difficulty coefficient λ is then calculated from N_1 to N_k and the number of super-large networks (the formula is given only as an image in the original publication). Taking fig. 7 as an example, λ = 0.1.
In step 606, the automatic labeling of the image is performed by first discarding the independently grouped detection frames and then using the average of the detection frames in each non-independent group as the final label of the image sample.
That is, for a group containing N detection frames, the label is x = (1/N) Σ x_i, y = (1/N) Σ y_i, w = (1/N) Σ w_i, h = (1/N) Σ h_i, where x, y, w and h denote the abscissa and ordinate of the top-left corner of the detection frame and its width and height, respectively.
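A sketch of this automatic-labeling step, operating on the groups produced by the grouping sketch above (a group containing a single detection frame is treated as independently grouped and discarded):

```python
def auto_label(groups):
    """Average the (x, y, w, h) boxes of each non-independent group into one label."""
    labels = []
    for group in groups:
        if len(group) < 2:     # independently grouped detection frames are discarded
            continue
        n = len(group)
        labels.append(tuple(sum(box[k] for box in group) / n for k in range(4)))
    return labels
```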
Referring to fig. 8, the labeled samples are used to fine-tune the original model so that it adapts to the current environment. The data set generated from the labeled samples is divided into an actual training set and an actual verification set, and a public data set is taken as the public verification set. The training data are organized with a batch as the minimum unit.
The correction process comprises the steps of:
step 801, preparing an original model (the model after the last correction, if the model is the original model after the first correction), and calculating the Loss value L of the original model on the public verification set and the actual verification set0And I0
Step 802, prepare an actual training set of batch, and proceed to step 803. If all samples in the actual training set have been traversed, the training is stopped, and the process jumps to step 806.
Step 803, training is started.
And step 804, after each batch training, calculating the Loss values, L and I of the trained model on the public verification set and the actual verification set.
Step 805, if L0-L>LThresholdAnd I0-I>IThresholdConsidering as one-time effective training, updating the network parameters of the model, and jumping to step 801; otherwise, the iteration is stopped and step 806 is entered.
And step 806, finishing correction and generating new model network data.
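A minimal sketch of this correction loop (the threshold values are placeholders, and train_fn / loss_fn stand in for the unspecified training and Loss-evaluation routines):

```python
L_THRESHOLD = 0.01   # placeholder for L_threshold
I_THRESHOLD = 0.01   # placeholder for I_threshold

def correct_model(model, train_batches, public_val, actual_val, train_fn, loss_fn):
    """Fine-tune `model` batch by batch, keeping a batch only if both Loss values improve.

    train_fn(model, batch) -> trained candidate model
    loss_fn(model, dataset) -> Loss value on that verification set
    """
    l0, i0 = loss_fn(model, public_val), loss_fn(model, actual_val)      # step 801
    for batch in train_batches:                                          # step 802
        candidate = train_fn(model, batch)                               # step 803
        l = loss_fn(candidate, public_val)                               # step 804
        i = loss_fn(candidate, actual_val)
        if l0 - l > L_THRESHOLD and i0 - i > I_THRESHOLD:                # step 805
            model, l0, i0 = candidate, l, i     # effective training: keep the parameters
        else:
            break                               # otherwise stop iterating
    return model                                # step 806: new model network data
```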
On the embedded device, the initial model is first built from an open-source data set. Open-source data sets generally cover a wide variety of scenes and are rich, so a model trained on such data adapts reasonably well, on average, to each scene. This initial model is deployed to the device first. During service operation, the embedded device uses the online model self-calibration system to upload image samples to the server from time to time; the server sends the model network parameters corrected by the self-calibration system back to the embedded device via Bluetooth, Wi-Fi or other means, and the network parameters in the device are updated.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A target detection system suitable for embedded equipment is characterized by comprising the embedded equipment; running local service logic and target detection logic on the embedded equipment;
the target detection logic is composed of a multi-layer structure containing a plurality of branch modules and a result merging module; each branch module consists of a shared basic network, a private basic network and a detection module; the shared basic network of the first-layer branch module receives the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of every other branch module come directly from the output of the shared basic network of the layer above; the output of the shared basic network serves as the input of the private basic network; the private basic network outputs a feature map as the input of the detection module; the output of the detection module is the output of that layer's branch module; the result merging module merges the outputs of the branch modules of all layers and outputs a target detection result;
and the local service logic takes the target detection result as input and further completes service by utilizing the target detection result.
2. The object detection system of claim 1, wherein the shared underlying network is stacked from a plurality of underlying network blocks; in the shared basic network of the branch module of the first layer, the basic network block of the first layer is a CNN network block, and the rest basic network blocks are MobileNet network blocks; in the shared basic network of the branch module of other layers, all the basic network blocks are MobileNet network blocks; in the shared basic network, the number of the MobileNet network blocks is dynamically increased and decreased along with the target difficulty.
3. The object detection system of claim 1, wherein the private base network is formed by stacking a plurality of MobileNet network blocks, the number of MobileNet network blocks dynamically increasing or decreasing with expressiveness; the parameters of the private base network are valid only for the currently branching module.
4. The object detection system of claim 1, wherein the detection module feeds the feature map into a first branch, a second branch and a third branch; the first branch consists of one MobileNet network block, the second branch consists of 2 MobileNet network blocks, and the third branch consists of 3 MobileNet network blocks;
after the feature map passes through the first branch and the third branch, the number of feature dimensions is unchanged; after the feature map passes through the second branch, the number of feature dimensions is doubled; the detection module combines the feature maps of the first branch, the second branch and the third branch, and obtains a score, a detection frame and key points through convolution as the output of the branch module of the current layer.
5. The object detection system of claim 1, further comprising a server and a model online self-calibration system; the model online self-calibration system comprises sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
after the sample collection logic collects samples, the samples are stored in a sample library, and the sample library is uploaded to the server at variable time;
and the sample labeling module is used for labeling the images in the sample library to form a labeled sample library, then the labeled sample library is used for completing the calibration of model network parameters through the model correction module, and the calibrated model network parameters are issued and updated to the embedded equipment.
6. The object detection system of claim 5, wherein the sample collection function of the sample collection logic is initiated in the form of a timing trigger or a traffic trigger; the triggered sample collection logic performs the following steps:
step 1.1, setting a detection result queue as empty;
step 1.2, acquiring a new frame of image, carrying out target detection, and simultaneously sending the image and the detection result of the image into the detection result queue;
step 1.3, in the detection result queue, taking the last image whose detection result is "object detected" as the starting point and scanning towards the tail of the queue; if another image whose detection result is "object detected" is encountered, taking it as the end point and jumping to step 1.4; otherwise jumping to step 1.2;
step 1.4, counting the number Z of the images of which the detection result is 'no object detected' in the interval from the starting point to the end point in the step 1.3;
step 1.5, if Z > Z_threshold, returning to step 1.1; if Z ≤ Z_threshold, extracting one frame from the Z frames of images, storing it in the sample library, and terminating sample collection.
7. The object detection system of claim 5, wherein the sample collection logic has a defined capacity of the sample bank of N, and when the number of existing samples of the sample bank is greater than or equal to the defined capacity of N, a new sample replaces the oldest sample in the sample bank;
and after receiving the sample library uploaded by the embedded equipment, the server deletes the repeated images in the sample library by calculating the similarity of the images in the sample library.
8. The object detection system of claim 5, wherein the sample labeling module performs sample labeling operations comprising the steps of:
step 2.1, extracting one image from the sample library and sending it simultaneously into a plurality of super-large networks for target identification to obtain target identification results;
2.2, calculating a difficulty coefficient lambda of the image by using the target identification result;
step 2.3, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample; for a second-level hard sample, removing the image from the sample library, integrating the target identification results of the plurality of super-large networks to complete automatic labeling, and putting the result into the labeled sample library;
step 2.4, if the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classifying the image as a first-level hard sample; for a first-level hard sample, removing the image from the sample library, storing it separately, and labeling it manually; after manual labeling, putting the image into the labeled sample library;
and 2.5, returning to the step 2.1 if unprocessed images exist in the sample library, otherwise, completing the sample labeling work.
9. The object detection system of claim 8,
step 2.2 specifically comprises the substeps of:
step 2.2.1, selecting the target identification result of one oversized network as a reference result;
step 2.2.2, IoU of detection frames in the target identification results and detection frames in the reference results of other super-large networks is calculated;
step 2.2.3, for each super-large network, selecting among its output target identification results the one whose IoU is the largest and greater than the threshold C_threshold, and grouping it together with the corresponding reference result; target identification results that cannot be grouped each form their own group;
step 2.2.4, calculating the difficulty coefficient λ from the group sizes and the number of super-large networks (the formula is given only as an image in the original publication).
step 2.3 is expanded to the steps:
step 2.3.1, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample;
step 2.3.2, removing the image from the sample library;
and 2.3.3, discarding the corresponding independent grouped target identification results for the second-level difficult samples, calculating the average value of the detection frames in the target identification results of the non-independent groups, using the average value as a final sample label, and finishing automatic labeling.
10. The object detection system of claim 5, wherein the model correction module is operative to include the steps of:
step 3.1, dividing the labeled sample library into an actual training set and an actual verification set; using generic samples obtained from public sources as a public verification set;
step 3.2, calculating LOSS values of the original model in the public verification set and the actual verification set respectively;
3.3, dividing an actual training set into a plurality of groups, and taking the original model as a pre-training model;
step 3.4, selecting a group of data in the actual training set;
step 3.5, performing model training on the model before training to obtain a trained model;
step 3.6, calculating LOSS values of the trained model in the public verification set and the actual verification set respectively;
step 3.7, if the difference between the LOSS values of the original model and the trained model on the public verification set is greater than the threshold L_threshold and the difference between the LOSS values on the actual verification set is greater than the threshold I_threshold, jumping to step 3.8; otherwise entering step 3.9;
3.8, if the actual training set has data which do not participate in training, setting the model after training as a new model before training, and skipping to the step 3.4, otherwise, entering the step 3.9;
step 3.9, stopping training; and after the training is stopped, taking the network parameters of the trained model as the output of the model correction module.
CN201911153078.6A 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment Active CN110909794B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201911153078.6A CN110909794B (en) 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment
US17/778,788 US20220398835A1 (en) 2019-11-22 2020-11-20 Target detection system suitable for embedded device
PCT/CN2020/130499 WO2021098831A1 (en) 2019-11-22 2020-11-20 Target detection system suitable for embedded device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911153078.6A CN110909794B (en) 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment

Publications (2)

Publication Number Publication Date
CN110909794A true CN110909794A (en) 2020-03-24
CN110909794B CN110909794B (en) 2022-09-13

Family

ID=69818851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911153078.6A Active CN110909794B (en) 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment

Country Status (3)

Country Link
US (1) US20220398835A1 (en)
CN (1) CN110909794B (en)
WO (1) WO2021098831A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112118366A (en) * 2020-07-31 2020-12-22 中标慧安信息技术股份有限公司 Method and device for transmitting face picture data
CN112183558A (en) * 2020-09-30 2021-01-05 北京理工大学 Target detection and feature extraction integrated network based on YOLOv3
WO2021098831A1 (en) * 2019-11-22 2021-05-27 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded device
CN114913419A (en) * 2022-05-10 2022-08-16 西南石油大学 Intelligent parking target detection method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780358A (en) * 2021-08-16 2021-12-10 华北电力大学(保定) Real-time hardware fitting detection method based on anchor-free network
CN116188767B (en) * 2023-01-13 2023-09-08 湖北普罗格科技股份有限公司 Neural network-based stacked wood board counting method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073869A (en) * 2016-11-18 2018-05-25 法乐第(北京)网络科技有限公司 A kind of system of scene cut and detection of obstacles
CN108549852A (en) * 2018-03-28 2018-09-18 中山大学 Pedestrian detector's Auto-learning Method under special scenes based on the enhancing of depth network
CN108573238A (en) * 2018-04-23 2018-09-25 济南浪潮高新科技投资发展有限公司 A kind of vehicle checking method based on dual network structure
CN109145798A (en) * 2018-08-13 2019-01-04 浙江零跑科技有限公司 A kind of Driving Scene target identification and travelable region segmentation integrated approach
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110047069A (en) * 2019-04-22 2019-07-23 北京青燕祥云科技有限公司 A kind of image detection device
US10423860B1 (en) * 2019-01-22 2019-09-24 StradVision, Inc. Learning method and learning device for object detector based on CNN to be used for multi-camera or surround view monitoring using image concatenation and target object merging network, and testing method and testing device using the same

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330439B (en) * 2017-07-14 2022-11-04 腾讯科技(深圳)有限公司 Method for determining posture of object in image, client and server
CN108710897A (en) * 2018-04-24 2018-10-26 江苏科海智能***有限公司 A kind of online general target detecting system in distal end based on SSD-T
CN109801265B (en) * 2018-12-25 2020-11-20 国网河北省电力有限公司电力科学研究院 Real-time transmission equipment foreign matter detection system based on convolutional neural network
CN110909794B (en) * 2019-11-22 2022-09-13 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073869A (en) * 2016-11-18 2018-05-25 法乐第(北京)网络科技有限公司 A kind of system of scene cut and detection of obstacles
CN108549852A (en) * 2018-03-28 2018-09-18 中山大学 Pedestrian detector's Auto-learning Method under special scenes based on the enhancing of depth network
CN108573238A (en) * 2018-04-23 2018-09-25 济南浪潮高新科技投资发展有限公司 A kind of vehicle checking method based on dual network structure
CN109145798A (en) * 2018-08-13 2019-01-04 浙江零跑科技有限公司 A kind of Driving Scene target identification and travelable region segmentation integrated approach
US10423860B1 (en) * 2019-01-22 2019-09-24 StradVision, Inc. Learning method and learning device for object detector based on CNN to be used for multi-camera or surround view monitoring using image concatenation and target object merging network, and testing method and testing device using the same
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110047069A (en) * 2019-04-22 2019-07-23 北京青燕祥云科技有限公司 A kind of image detection device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FUKAI ZHANG, FENG YANG, CE LI, GUAN YUAN: "CMNet: A Connect-and-Merge Convolutional Neural Network for Fast Vehicle Detection in Urban Traffic Surveillance", IEEE Access *
HAN KAI: "Research on Target Detection Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021098831A1 (en) * 2019-11-22 2021-05-27 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded device
CN112118366A (en) * 2020-07-31 2020-12-22 中标慧安信息技术股份有限公司 Method and device for transmitting face picture data
CN112183558A (en) * 2020-09-30 2021-01-05 北京理工大学 Target detection and feature extraction integrated network based on YOLOv3
CN114913419A (en) * 2022-05-10 2022-08-16 西南石油大学 Intelligent parking target detection method and system

Also Published As

Publication number Publication date
CN110909794B (en) 2022-09-13
US20220398835A1 (en) 2022-12-15
WO2021098831A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110909794B (en) Target detection system suitable for embedded equipment
CN111126472B (en) SSD (solid State disk) -based improved target detection method
WO2021238262A1 (en) Vehicle recognition method and apparatus, device, and storage medium
CN112150821B (en) Lightweight vehicle detection model construction method, system and device
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN109118519A (en) Target Re-ID method, system, terminal and the storage medium of Case-based Reasoning segmentation
CN110991311A (en) Target detection method based on dense connection deep network
CN114937151A (en) Lightweight target detection method based on multi-receptive-field and attention feature pyramid
CN112613375B (en) Tire damage detection and identification method and equipment
CN109657715B (en) Semantic segmentation method, device, equipment and medium
CN112329881B (en) License plate recognition model training method, license plate recognition method and device
CN113177560A (en) Universal lightweight deep learning vehicle detection method
CN112862093A (en) Graph neural network training method and device
CN113361645A (en) Target detection model construction method and system based on meta-learning and knowledge memory
CN116089883B (en) Training method for improving classification degree of new and old categories in existing category increment learning
CN110458047A (en) A kind of country scene recognition method and system based on deep learning
CN110751191A (en) Image classification method and system
CN110163169A (en) Face identification method, device, electronic equipment and storage medium
CN114783021A (en) Intelligent detection method, device, equipment and medium for wearing of mask
CN109858349A (en) A kind of traffic sign recognition method and its device based on improvement YOLO model
CN116665092A (en) Method and system for identifying sewage suspended matters based on IA-YOLOV7
CN111462090A (en) Multi-scale image target detection method
WO2021258955A1 (en) Method and apparatus for marking object outline in target image, and storage medium and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant