CN110909794A - Target detection system suitable for embedded equipment - Google Patents

Target detection system suitable for embedded equipment

Info

Publication number
CN110909794A
CN110909794A (application number CN201911153078.6A)
Authority
CN
China
Prior art keywords
sample
network
model
module
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911153078.6A
Other languages
Chinese (zh)
Other versions
CN110909794B (en)
Inventor
叶杭杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Espressif Systems Shanghai Co Ltd
Original Assignee
Espressif Systems Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Espressif Systems Shanghai Co Ltd filed Critical Espressif Systems Shanghai Co Ltd
Priority to CN201911153078.6A priority Critical patent/CN110909794B/en
Publication of CN110909794A publication Critical patent/CN110909794A/en
Priority to US17/778,788 priority patent/US20220398835A1/en
Priority to PCT/CN2020/130499 priority patent/WO2021098831A1/en
Application granted granted Critical
Publication of CN110909794B publication Critical patent/CN110909794B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/255 - Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Stored Programmes (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

The invention provides a target detection system suitable for embedded devices, comprising an embedded device and a server. The target detection logic running on the embedded device is composed of a multi-layer structure of shared basic networks, private basic networks and detection modules; the parameters of each shared basic network come directly from the output of the layer above. An image is processed by the shared basic network and the private basic network to obtain a feature map, the feature map is processed by the detection module, and a result merging module merges and outputs the target detection result. The target detection system also comprises an online model self-calibration system: the embedded device uploads collected samples to the server from time to time, and the server labels the samples automatically or manually, trains the model, and updates the model on the embedded device. The target detection system achieves good performance on embedded devices and uses the large target detection models on the server to complete automatic labeling, reducing the labeling workload and completing model correction more efficiently.

Description

Target detection system suitable for embedded equipment
Technical Field
The invention relates to the fields of target detection and online model correction for embedded devices, and in particular to a target detection system suitable for embedded devices.
Background
The current mainstream methods for target detection are based on deep learning. Deep learning methods achieve better results than traditional methods, but they have drawbacks in practical applications:
1. The computational load is huge and needs to be accelerated by dedicated hardware (e.g. a GPU). This is particularly disadvantageous for mobile devices, especially embedded devices.
2. The models have a large number of parameters and occupy a large amount of storage space, which is extremely disadvantageous for resource-constrained embedded devices.
Therefore, such networks can only be deployed on a server, and a terminal device calls the server's interface over the network to perform target detection. Once the network is unavailable, none of these functions can be carried out.
To realize offline target detection on a terminal device and break free of network constraints, the simplest approach is to simplify the model into a small network model. Although a small network model shrinks the detection model and reduces the number of parameters and the amount of computation, making offline target detection on an embedded device feasible, its limited expressive power cannot adapt to all background conditions. For example, experiments show that the detection rate of a small network model drops noticeably when detecting targets in a dark environment.
In addition, after a small network model is trained, missed detections easily occur when the pictures captured by the camera differ from the training set (in color saturation, exposure, sharpness, etc.). The remedy is to learn from pictures actually collected by the camera. However, building an actual-data training set consumes a lot of manpower and material resources and takes a long time, and if the data set is too small the trained network does not generalize.
Disclosure of Invention
The object of the invention is to provide, for embedded devices, a target detection system that has good expressive power and can use an actual training set for effective model training and correction, thereby solving the problems in the prior art. To achieve this object, the technical solution adopted by the present invention is to provide a target detection system suitable for an embedded device, characterized by comprising an embedded device; local service logic and target detection logic run on the embedded device;
the target detection logic is composed of a multi-layer structure containing a plurality of branch modules and a result merging module; each branch module consists of a shared basic network, a private basic network and a detection module; the shared basic network of the first-layer branch module receives the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of every other branch module come directly from the output of the shared basic network of the layer above; the output of the shared basic network serves as the input of the private basic network; the private basic network outputs a feature map as the input of the detection module; the output of the detection module is the output of that layer's branch module; the result merging module merges the outputs of the branch modules of all layers and outputs the target detection result;
the local service logic takes the target detection result as input and uses it to complete the service.
Further, the shared basic network is formed by stacking a plurality of basic network blocks. In the shared basic network of the first-layer branch module, the first basic network block is a CNN network block and the remaining basic network blocks are MobileNet network blocks; in the shared basic networks of the branch modules of the other layers, all basic network blocks are MobileNet network blocks. In the shared basic network, the number of MobileNet network blocks is dynamically increased or decreased according to the difficulty of the target.
Further, the private basic network is formed by stacking a plurality of MobileNet network blocks, and the number of MobileNet network blocks is dynamically increased or decreased according to the required expressive power; the parameters of the private basic network are valid only for the current branch module.
Further, the detection module feeds the feature map into a first branch, a second branch and a third branch; the first branch consists of one MobileNet network block, the second branch consists of 2 MobileNet network blocks, and the third branch consists of 3 MobileNet network blocks;
after the feature map passes through the first branch and the third branch, the number of feature dimensions is unchanged; after the feature map passes through the second branch, the number of feature dimensions is doubled; the detection module combines the feature maps of the first branch, the second branch and the third branch, and obtains a score, a detection frame and key points through convolution as the output of the branch module of the current layer.
Further, the system also comprises a server and an online model self-calibration system; the online model self-calibration system comprises sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
after the sample collection logic collects samples, the samples are stored in a sample library, and the sample library is uploaded to the server from time to time;
the sample labeling module labels the images in the sample library to form a labeled sample library; the labeled sample library is then used by the model correction module to complete the calibration of the model network parameters, and the calibrated model network parameters are issued to and updated on the embedded device.
Further, the sample collection function of the sample collection logic is started by a timing trigger or a service trigger; the triggered sample collection logic performs the following steps:
step 1.1, setting the detection result queue to empty;
step 1.2, acquiring a new frame of image, carrying out target detection, and sending the image and its detection result into the detection result queue;
step 1.3, in the detection result queue, taking the last image whose detection result is "object detected" as the starting point and scanning towards the tail of the queue; if another image whose detection result is "object detected" is encountered, taking it as the end point and jumping to step 1.4; otherwise jumping to step 1.2;
step 1.4, counting the number Z of images whose detection result is "no object detected" in the interval between the starting point and the end point of step 1.3;
step 1.5, if Z > Z_threshold, returning to step 1.1; if Z ≤ Z_threshold, extracting one frame from the Z frames of images, storing it in the sample library, and terminating sample collection.
Further, the sample library of the sample collection logic has a limited capacity N; when the number of existing samples in the sample library reaches or exceeds the capacity N, a new sample replaces the oldest sample in the sample library;
after receiving the sample library uploaded by the embedded device, the server deletes duplicate images in the sample library by calculating the similarity between the images.
Further, the sample labeling work performed by the sample labeling module comprises the following steps:
step 2.1, extracting one image from the sample library and sending it simultaneously into a plurality of super-large networks for target identification to obtain target identification results;
step 2.2, calculating the difficulty coefficient λ of the image from the target identification results;
step 2.3, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample; for a second-level hard sample, removing the image from the sample library, integrating the target identification results of the plurality of super-large networks to complete automatic labeling, and putting the result into the labeled sample library;
step 2.4, if the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classifying the image as a first-level hard sample; for a first-level hard sample, removing the image from the sample library, storing it separately, and labeling it manually; after manual labeling, putting the image into the labeled sample library;
step 2.5, if unprocessed images remain in the sample library, returning to step 2.1; otherwise the sample labeling work is complete.
Further, step 2.2 specifically comprises the sub-steps of:
step 2.2.1, selecting the target identification result of one super-large network as the reference result;
step 2.2.2, calculating the IoU between the detection frames in the target identification results of the other super-large networks and the detection frames in the reference result;
step 2.2.3, for each super-large network, selecting among its output target identification results the one whose IoU is the largest and greater than the threshold C_threshold, and grouping it together with the corresponding reference result; target identification results that cannot be grouped each form their own group;
step 2.2.4, calculating the difficulty coefficient λ from the group sizes and the number of super-large networks (the formula is given only as an image in the original publication).
Step 2.3 is expanded into the steps:
step 2.3.1, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample;
step 2.3.2, removing the image from the sample library;
step 2.3.3, for the second-level hard sample, discarding the independently grouped target identification results, calculating the average of the detection frames in the target identification results of the non-independent groups, using the average as the final sample label, and completing the automatic labeling.
Further, the operation of the model correction module comprises the steps of:
step 3.1, dividing the labeled sample library into an actual training set and an actual verification set; using generic samples obtained from public sources as a public verification set;
step 3.2, calculating the LOSS values of the original model on the public verification set and the actual verification set respectively;
step 3.3, dividing the actual training set into a plurality of groups, and taking the original model as the pre-training model;
step 3.4, selecting one group of data from the actual training set;
step 3.5, performing model training on the pre-training model to obtain a trained model;
step 3.6, calculating the LOSS values of the trained model on the public verification set and the actual verification set respectively;
step 3.7, if the difference between the LOSS values of the original model and the trained model on the public verification set is greater than the threshold L_threshold and the difference between the LOSS values on the actual verification set is greater than the threshold I_threshold, jumping to step 3.8; otherwise entering step 3.9;
step 3.8, if the actual training set still has data that did not participate in training, setting the trained model as the new pre-training model and jumping to step 3.4; otherwise entering step 3.9;
step 3.9, stopping training; after training stops, taking the network parameters of the trained model as the output of the model correction module.
The invention reduces the overall number of network parameters and the amount of computation by sharing parameters between the layers of the shared basic networks and by making the shared basic networks and private basic networks dynamically adjustable.
The model correction system collects hard samples encountered by the embedded device in its current environment and submits them to the server from time to time; the server automatically labels the samples with its large target detection models, and the labeled samples are used to train and update the network model of the embedded device.
In view of the above technical features, the present invention has the following advantages:
1. It is not limited by the scarce resources and limited computing speed of embedded devices, and still achieves good performance on embedded devices.
2. The sample library is not uploaded in real time, which greatly reduces the embedded device's dependence on the network.
3. Automatic labeling by the large target detection models on the server reduces the manual labeling workload.
4. The embedded device can update its own model network parameters with the results of the large target detection models on the server, completing model upgrades more efficiently.
Drawings
FIG. 1 is a system block diagram of a preferred embodiment of the present invention;
FIG. 2 is a network architecture diagram of a deep learning network in accordance with a preferred embodiment of the present invention;
FIG. 3 is a diagram illustrating the architecture of a shared infrastructure network in accordance with a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a detection module according to a preferred embodiment of the present invention;
FIG. 5 is a flow chart of sample collection logic in a preferred embodiment of the present invention;
FIG. 6 is a flow chart of a sample annotation module in accordance with a preferred embodiment of the present invention;
FIG. 7 is a diagram of an example grouping of the calculation difficulty coefficients in a preferred embodiment of the present invention;
FIG. 8 is a flow chart of the model correction module in a preferred embodiment of the present invention.
In the figure: 1 - branch module, 1.1 - shared basic network, 1.2 - private basic network, 1.3 - detection module, 2 - result merging module, 3.1 - network block, 3.2 - optional network block, 4.1 - first branch, 4.2 - second branch, 4.3 - third branch, 5 - embedded device, 5.1 - target detection logic, 5.2 - local service logic, 5.3 - sample collection logic, 6 - server, 6.1 - sample labeling module, 6.2 - model correction module, 7 - sample library, 8 - network model parameters, 9 - Faster-RCNN network, 10 - SSD network.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Referring to fig. 1, a target detection system for an embedded device according to the present invention includes an embedded device 5 and a server 6. Remote service logic runs on the server 6; target detection logic 5.1 and local service logic 5.2 run on the embedded device 5. The target detection logic 5.1 comprises a deep learning network model.
The target detection system further comprises an online model self-calibration system, which addresses the reduced learning capability caused by cutting parameters to lower the computation of the small model. The online self-calibration system comprises sample collection logic 5.3 running on the embedded device 5, and a sample labeling module 6.1 and a model correction module 6.2 running on the server 6.
On the embedded device 5, every actually acquired image enters the target detection logic 5.1, and the detection results are sent both to the local service logic 5.2 and to the sample collection logic 5.3. The local service logic 5.2 completes the service-related work; the sample collection logic 5.3, as part of the online self-calibration system, places the samples it collects, in a controlled manner, into the sample library 7 in preparation for subsequent calibration.
The samples in the sample library 7 may be transmitted to the server 6 by a variety of means, such as Bluetooth, Wi-Fi, etc.
After the sample library 7 is uploaded to the server 6, duplicate pictures are deleted by calculating the similarity between the pictures, and the result enters the sample labeling module 6.1. The labeled samples are used as a training set and a test set and enter the model correction module 6.2, which trains new target detection network model parameters 8; the updated network model parameters 8 are then deployed on the embedded device 5.
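The patent does not specify how image similarity is computed. As a minimal sketch assuming a perceptual-hash comparison (the imagehash library, the distance threshold and the function name are illustrative choices, not part of the patent), duplicate removal on the server might look like this:

```python
from PIL import Image
import imagehash  # assumed third-party library; the patent names no specific similarity method

def deduplicate(image_paths, max_distance=5):
    """Keep only images whose perceptual hash differs enough from every image already kept."""
    kept, kept_hashes = [], []
    for path in image_paths:
        h = imagehash.phash(Image.open(path))
        # subtracting two ImageHash objects yields their Hamming distance
        if all(h - kh > max_distance for kh in kept_hashes):
            kept.append(path)
            kept_hashes.append(h)
    return kept
```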
Referring to fig. 2, the deep learning network model in the target detection logic is composed of a multi-layer structure including a plurality of branch modules 1 and a result merging module 2. The network consists of several branch modules 1: M1, M2, …, Mx. Each branch module 1 corresponds to one or more anchors. For example, the following design may be made: (1) the number of branch modules is 2, namely M1 and M2; (2) M1 corresponds to an anchor size of 16×16; (3) M2 corresponds to two anchors (32×32, 64×56); the resulting model can detect targets around these anchor sizes.
Each branching module 1 is in turn composed of three major components: a shared basic network 1.1, a private basic network 1.2 and a detection module 1.3.
1. The shared basic network 1.1 is formed by stacking MobileNet network blocks. MobileNet is a network architecture suited to mobile devices that greatly reduces the amount of computation and the number of parameters compared to a plain CNN, while retaining the "scaling" characteristic of a CNN. The shared basic network 1.1 of the first layer (backbone_1) is designed differently from the shared basic networks 1.1 of the other layers: to prevent MobileNet from losing too many features, the first-layer network uses a CNN.
The main function of the shared basic network 1.1 is to determine the scaling of the branch module through its stride. Taking the design of backbone_1 as an example, the cumulative stride is 8, i.e., the feature map obtained by the branch module is 1/8 of the original image in size. When the detected object is large, a large stride can be adopted so that the feature map shrinks quickly, reducing the parameter count and the computation.
The shallow shared basic network 1.1 shares parameters with the deep shared basic networks 1.1, reducing the overall network parameters and computation: the output of backbone_1 becomes the input of backbone_2, the output of backbone_2 becomes the input of backbone_3, and so on.
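As a minimal PyTorch-style sketch of this chaining (the block definitions, layer counts, channel widths, strides and head shapes below are illustrative assumptions, not the patent's actual configuration):

```python
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """MobileNet-style block: depthwise 3x3 conv followed by pointwise 1x1 conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class BranchModule(nn.Module):
    """Shared basic network + private basic network + detection head, as in Fig. 2."""
    def __init__(self, shared, private, head):
        super().__init__()
        self.shared, self.private, self.head = shared, private, head

    def forward(self, x):
        shared_out = self.shared(x)  # also becomes the input of the next branch module
        return shared_out, self.head(self.private(shared_out))

class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # backbone_1: first block is a plain CNN block (to avoid losing features),
        # the rest are MobileNet blocks; cumulative stride 8, so the feature map is 1/8 size.
        shared1 = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
            DepthwiseSeparableBlock(16, 32, stride=2),
            DepthwiseSeparableBlock(32, 64, stride=2),
        )
        shared2 = nn.Sequential(DepthwiseSeparableBlock(64, 128, stride=2))
        private1 = DepthwiseSeparableBlock(64, 64)    # parameters used by M1 only
        private2 = DepthwiseSeparableBlock(128, 128)  # parameters used by M2 only
        # 1x1 conv heads: 1 score + 4 box values per location (placeholder output layout)
        self.m1 = BranchModule(shared1, private1, nn.Conv2d(64, 5, 1))
        self.m2 = BranchModule(shared2, private2, nn.Conv2d(128, 5, 1))

    def forward(self, img):
        s1, out1 = self.m1(img)   # output of backbone_1 ...
        _, out2 = self.m2(s1)     # ... is the input of backbone_2
        return [out1, out2]       # per-branch outputs, merged later (e.g. by NMS)
```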
2. The private basic network 1.2 is likewise stacked from MobileNet blocks. Unlike the shared basic network 1.1, the parameters of the private basic network 1.2 are valid only for the current module and are not affected by other modules.
The private basic network 1.2 can also be enlarged or reduced according to the actual detection effect. When the expressive power is too poor, network layers can be added to improve it; when there is expressive power to spare, the network can be reduced to increase speed.
3. The detection module 1.3 improves the detection performance of the model by fusing feature maps with different receptive fields.
The result merging module 2 of the target detection logic gathers the detection frames predicted by all branch modules and eliminates redundant detection frames through NMS (non-maximum suppression) to obtain the final prediction result.
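The merging step is standard non-maximum suppression; a minimal plain-Python sketch (the IoU threshold value is a placeholder) follows:

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping boxes that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```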
Referring to fig. 3, the shared basic network is formed by stacking a plurality of network blocks 3.1, where the convolutions in dashed boxes are optional network blocks 3.2. Optional network blocks 3.2 can be added or removed according to the difficulty of the detected object: if the object is hard to detect or there are many false detections, optional network blocks 3.2 can be added; otherwise they can be removed.
Referring to fig. 4, the input feature map enters the detection module with C dimensions of information and is fed into a first branch 4.1, a second branch 4.2 and a third branch 4.3. After passing through the 2 MobileNet blocks on the second branch 4.2, the number of dimensions of the feature map increases from C to 2C. The receptive field of the second branch 4.2 lies between those of the other two branches, and its dimensionality is increased so that it carries the main feature information; the features of the first branch 4.1 and the third branch 4.3 serve as auxiliary information. Finally, the information of the three branches is concatenated into a new feature map. Different 1×1 convolutions on the new feature map produce the score and the detection frame respectively, and if key points are required, another 1×1 convolution is added to obtain the key points.
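A sketch of this head in the same PyTorch style, reusing the DepthwiseSeparableBlock class from the backbone sketch above (the channel counts and the number of key points are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Three branches with different receptive fields, fused and read out by 1x1 convs."""
    def __init__(self, c, num_keypoints=5):
        super().__init__()
        self.branch1 = DepthwiseSeparableBlock(c, c)                      # 1 block, keeps C dims
        self.branch2 = nn.Sequential(DepthwiseSeparableBlock(c, c),
                                     DepthwiseSeparableBlock(c, 2 * c))   # 2 blocks, doubles to 2C
        self.branch3 = nn.Sequential(DepthwiseSeparableBlock(c, c),
                                     DepthwiseSeparableBlock(c, c),
                                     DepthwiseSeparableBlock(c, c))       # 3 blocks, keeps C dims
        fused = c + 2 * c + c
        self.score = nn.Conv2d(fused, 1, 1)
        self.bbox = nn.Conv2d(fused, 4, 1)
        self.keypoints = nn.Conv2d(fused, 2 * num_keypoints, 1)           # only if key points are needed

    def forward(self, x):
        fused = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.score(fused), self.bbox(fused), self.keypoints(fused)
```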
Referring to fig. 5, the sample collection logic running on the embedded device is triggered by a user-defined condition. For example, it may be triggered periodically, e.g. started once every hour, or triggered by the service: if the device is performing face entry and a "no object detected" picture appears, a missed detection is highly likely, so the sample collection logic is started. The workflow of the sample collection logic comprises the following steps:
step 501, the sample collection logic is triggered.
Step 502, sending each frame's detection result into the detection result queue, and calculating the number Z of consecutive frames in which no object is detected, specifically including:
step 502.1, starting with the last object detection;
step 502.2, recording the number of frames without detecting the object;
and step 502.3, counting the total number of frames without the detected object after the object detection is finished next time.
Step 503, setting threshold value ZthresholdWhen Z is greater than ZthresholdJudging that the Z frame picture does not have an object, and ending the sample collection logic; when Z is less than ZthresholdIf yes, the Z-frame picture is judged to be the missed object, and the process proceeds to step 504.
And step 504, extracting 1 frame from the Z frames judged to contain missed detections.
And step 505, storing the frame picture into a sample library, and ending the sample collection logic.
The size of the sample library is limited; when the limit is exceeded, a new sample replaces the oldest sample. This keeps the sample data fresh (better reflecting recent environmental conditions) without occupying too much storage. A minimal sketch of this collection flow is given below.
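In the sketch, Z_threshold, the library capacity N and the choice of which frame to keep are placeholders; the patent does not fix them:

```python
from collections import deque
import random

Z_THRESHOLD = 10        # placeholder for Z_threshold
SAMPLE_CAPACITY = 100   # placeholder for the library capacity N

sample_library = deque(maxlen=SAMPLE_CAPACITY)  # when full, the oldest sample is dropped

def collect_samples(frames, detect):
    """frames: iterable of images; detect(img) -> True if an object was detected.

    Looks for a short run of "no object detected" frames bracketed by two
    "object detected" frames and stores one frame from that run as a hard sample.
    """
    miss_run, seen_detection = [], False
    for img in frames:
        if detect(img):
            if seen_detection and 0 < len(miss_run) <= Z_THRESHOLD:
                sample_library.append(random.choice(miss_run))  # probable missed detection
                return True          # sample collection terminates (step 505)
            miss_run, seen_detection = [], True   # reset: this frame is the new starting point
        elif seen_detection:
            miss_run.append(img)     # frame between two detections with no object detected
    return False
```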
Referring to fig. 6, a sample labeling module running on a server automatically labels or manually labels each frame of image in a collected sample library, and the specific steps are as follows:
601, enabling each frame of image in a sample library to enter a sample labeling module;
step 602, the image sample is sent to a plurality of super large networks, such as YOLO, SSD, fast-RCNN, etc.
Step 603, obtaining results L respectively1、L2To LX
Step 604, synthesize results (L) of multiple super large networks1、L2To LX) And calculating an image difficulty coefficient lambda.
Step 605, if the difficulty coefficient lambda is less than or equal to the difficulty threshold lambdathresholdStep 606 is entered; if the difficulty coefficient lambda is larger than the difficulty threshold lambdathresholdStep 608 is entered.
And 606, integrating the target recognition results of the plurality of the super-large networks to finish automatic annotation of the image.
And step 607, classifying the image into a second-level difficult sample, putting the second-level difficult sample into a labeled sample library, and entering step 610.
And 608, submitting the image for manual processing to complete its manual annotation.
And step 609, classifying the image into a first-class difficulty sample, and putting the first-class difficulty sample into a labeled sample library.
Step 610, forming a data set.
In this way, a hard-sample data set can be collected quickly while the correctness of the sample labels is guaranteed. The final data set contains both automatically labeled and manually labeled image samples.
In step 604, the sample difficulty coefficient is calculated by first grouping the detection results and then computing the coefficient from the grouping information. The grouping comprises the following steps:
and 701, obtaining a target identification result of each super large network.
Step 702, selecting the target identification result of one of the super-large networks as the reference group (i.e. each of its detection frames serves as the reference detection frame of one group), and marking the target identification results of the remaining super-large networks as to-be-classified.
And 703, selecting one super-large network to be classified, taking its target identification result, and calculating the IoU values between each of its detection frames and the reference detection frames.
Step 704, selecting, from the detection frames to be classified, the detection frame with the largest IoU value; if its IoU value is greater than the threshold C_threshold, the detection frame is placed into the group of the corresponding reference detection frame; detection frames that cannot be grouped each form their own group.
Step 705, if there is an unprocessed super-large network, going to step 703; otherwise, ending.
A specific grouping example is shown in fig. 7. In this example, the results of the Faster-RCNN network 9 are taken as the reference group. The IoU values between detection frame 1 of the SSD network 10 and detection frames 1 to 5 of the Faster-RCNN network 9 are calculated; the IoU with detection frame 2 of the Faster-RCNN network 9 is found to be the largest and greater than C_threshold, so detection frame 1 of the SSD network 10 is grouped with detection frame 2 of the Faster-RCNN network 9, and so on. Detection frame 5 of the SSD network 10 cannot be grouped and therefore forms its own group.
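A minimal sketch of this grouping procedure (boxes are (x, y, w, h) tuples; the C_threshold value and the function names are placeholders):

```python
C_THRESHOLD = 0.5   # placeholder for C_threshold

def iou(a, b):
    """IoU of two (x, y, w, h) boxes, as in the NMS sketch above."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def group_detections(reference_boxes, other_networks_boxes):
    """reference_boxes: boxes from the reference network (one group per box).
    other_networks_boxes: one list of boxes per remaining super-large network."""
    groups = [[box] for box in reference_boxes]
    for boxes in other_networks_boxes:          # step 703: one network at a time
        for box in boxes:
            best, best_iou = None, 0.0
            for i, ref in enumerate(reference_boxes):
                value = iou(box, ref)
                if value > best_iou:
                    best, best_iou = i, value
            if best is not None and best_iou > C_THRESHOLD:
                groups[best].append(box)        # step 704: join the reference group
            else:
                groups.append([box])            # ungrouped boxes form their own group
    return groups
```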
After grouping is completed, the number of detection frames in each group is counted and recorded as N_1 to N_k. The difficulty coefficient λ is then calculated from N_1 to N_k and the number of super-large networks (the formula is given only as an image in the original publication). Taking fig. 7 as an example, λ = 0.1.
In step 606, the automatic labeling of the image is performed by first discarding the independently grouped detection frames and then using the average of the detection frames in each non-independent group as the final label of the image sample.
That is, for a group containing N detection frames, the label is x = (1/N) Σ x_i, y = (1/N) Σ y_i, w = (1/N) Σ w_i, h = (1/N) Σ h_i, where x, y, w and h denote the abscissa and ordinate of the top-left corner of the detection frame and its width and height, respectively.
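A sketch of this automatic-labeling step, operating on the groups produced by the grouping sketch above (a group containing a single detection frame is treated as independently grouped and discarded):

```python
def auto_label(groups):
    """Average the (x, y, w, h) boxes of each non-independent group into one label."""
    labels = []
    for group in groups:
        if len(group) < 2:     # independently grouped detection frames are discarded
            continue
        n = len(group)
        labels.append(tuple(sum(box[k] for box in group) / n for k in range(4)))
    return labels
```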
Referring to fig. 8, the labeled samples are used to fine-tune the original model so that it adapts to the current environment. The data set generated from the labeled samples is divided into an actual training set and an actual verification set, and a public data set is taken as the public verification set. The training data are organized with a batch as the minimum unit.
The correction process comprises the steps of:
step 801, preparing an original model (the model after the last correction, if the model is the original model after the first correction), and calculating the Loss value L of the original model on the public verification set and the actual verification set0And I0
Step 802, prepare an actual training set of batch, and proceed to step 803. If all samples in the actual training set have been traversed, the training is stopped, and the process jumps to step 806.
Step 803, training is started.
And step 804, after each batch training, calculating the Loss values, L and I of the trained model on the public verification set and the actual verification set.
Step 805, if L0-L>LThresholdAnd I0-I>IThresholdConsidering as one-time effective training, updating the network parameters of the model, and jumping to step 801; otherwise, the iteration is stopped and step 806 is entered.
And step 806, finishing correction and generating new model network data.
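A minimal sketch of this correction loop (the threshold values are placeholders, and train_fn / loss_fn stand in for the unspecified training and Loss-evaluation routines):

```python
L_THRESHOLD = 0.01   # placeholder for L_threshold
I_THRESHOLD = 0.01   # placeholder for I_threshold

def correct_model(model, train_batches, public_val, actual_val, train_fn, loss_fn):
    """Fine-tune `model` batch by batch, keeping a batch only if both Loss values improve.

    train_fn(model, batch) -> trained candidate model
    loss_fn(model, dataset) -> Loss value on that verification set
    """
    l0, i0 = loss_fn(model, public_val), loss_fn(model, actual_val)      # step 801
    for batch in train_batches:                                          # step 802
        candidate = train_fn(model, batch)                               # step 803
        l = loss_fn(candidate, public_val)                               # step 804
        i = loss_fn(candidate, actual_val)
        if l0 - l > L_THRESHOLD and i0 - i > I_THRESHOLD:                # step 805
            model, l0, i0 = candidate, l, i     # effective training: keep the parameters
        else:
            break                               # otherwise stop iterating
    return model                                # step 806: new model network data
```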
On the embedded device, the initial model is first built from an open-source data set. Open-source data sets generally cover a wide variety of scenes and are rich, so a model trained on such data adapts reasonably well, on average, to each scene. This initial model is deployed to the device first. During service operation, the embedded device uses the online model self-calibration system to upload image samples to the server from time to time; the server sends the model network parameters corrected by the self-calibration system back to the embedded device via Bluetooth, Wi-Fi or other means, and the network parameters in the device are updated.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A target detection system suitable for embedded equipment is characterized by comprising the embedded equipment; running local service logic and target detection logic on the embedded equipment;
the target detection logic is composed of a multi-layer structure containing a plurality of branch modules and a result merging module; each branch module consists of a shared basic network, a private basic network and a detection module; the shared basic network of the first-layer branch module receives the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of every other branch module come directly from the output of the shared basic network of the layer above; the output of the shared basic network serves as the input of the private basic network; the private basic network outputs a feature map as the input of the detection module; the output of the detection module is the output of that layer's branch module; the result merging module merges the outputs of the branch modules of all layers and outputs a target detection result;
and the local service logic takes the target detection result as input and further completes service by utilizing the target detection result.
2. The object detection system of claim 1, wherein the shared underlying network is stacked from a plurality of underlying network blocks; in the shared basic network of the branch module of the first layer, the basic network block of the first layer is a CNN network block, and the rest basic network blocks are MobileNet network blocks; in the shared basic network of the branch module of other layers, all the basic network blocks are MobileNet network blocks; in the shared basic network, the number of the MobileNet network blocks is dynamically increased and decreased along with the target difficulty.
3. The object detection system of claim 1, wherein the private base network is formed by stacking a plurality of MobileNet network blocks, the number of MobileNet network blocks dynamically increasing or decreasing with expressiveness; the parameters of the private base network are valid only for the currently branching module.
4. The object detection system of claim 1, wherein the detection module feeds the feature map into a first branch, a second branch and a third branch; the first branch consists of one MobileNet network block, the second branch consists of 2 MobileNet network blocks, and the third branch consists of 3 MobileNet network blocks;
after the feature map passes through the first branch and the third branch, the number of feature dimensions is unchanged; after the feature map passes through the second branch, the number of feature dimensions is doubled; the detection module combines the feature maps of the first branch, the second branch and the third branch, and obtains a score, a detection frame and key points through convolution as the output of the branch module of the current layer.
5. The object detection system of claim 1, further comprising a server and a model online self-calibration system; the model online self-calibration system comprises sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
after the sample collection logic collects samples, the samples are stored in a sample library, and the sample library is uploaded to the server at variable time;
and the sample labeling module is used for labeling the images in the sample library to form a labeled sample library, then the labeled sample library is used for completing the calibration of model network parameters through the model correction module, and the calibrated model network parameters are issued and updated to the embedded equipment.
6. The object detection system of claim 5, wherein the sample collection function of the sample collection logic is initiated in the form of a timing trigger or a traffic trigger; the triggered sample collection logic performs the following steps:
step 1.1, setting a detection result queue as empty;
step 1.2, acquiring a new frame of image, carrying out target detection, and simultaneously sending the image and the detection result of the image into the detection result queue;
step 1.3, in the detection result queue, taking the last image whose detection result is "object detected" as the starting point and scanning towards the tail of the queue; if another image whose detection result is "object detected" is encountered, taking it as the end point and jumping to step 1.4; otherwise jumping to step 1.2;
step 1.4, counting the number Z of the images of which the detection result is 'no object detected' in the interval from the starting point to the end point in the step 1.3;
step 1.5, if Z > Z_threshold, returning to step 1.1; if Z ≤ Z_threshold, extracting one frame from the Z frames of images, storing it in the sample library, and terminating sample collection.
7. The object detection system of claim 5, wherein the sample collection logic has a defined capacity of the sample bank of N, and when the number of existing samples of the sample bank is greater than or equal to the defined capacity of N, a new sample replaces the oldest sample in the sample bank;
and after receiving the sample library uploaded by the embedded equipment, the server deletes the repeated images in the sample library by calculating the similarity of the images in the sample library.
8. The object detection system of claim 5, wherein the sample labeling module performs sample labeling operations comprising the steps of:
step 2.1, extracting one image from the sample library and sending it simultaneously into a plurality of super-large networks for target identification to obtain target identification results;
2.2, calculating a difficulty coefficient lambda of the image by using the target identification result;
step 2.3, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample; for a second-level hard sample, removing the image from the sample library, integrating the target identification results of the plurality of super-large networks to complete automatic labeling, and putting the result into the labeled sample library;
step 2.4, if the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classifying the image as a first-level hard sample; for a first-level hard sample, removing the image from the sample library, storing it separately, and labeling it manually; after manual labeling, putting the image into the labeled sample library;
and 2.5, returning to the step 2.1 if unprocessed images exist in the sample library, otherwise, completing the sample labeling work.
9. The object detection system of claim 8,
step 2.2 specifically comprises the substeps of:
step 2.2.1, selecting the target identification result of one oversized network as a reference result;
step 2.2.2, IoU of detection frames in the target identification results and detection frames in the reference results of other super-large networks is calculated;
step 2.2.3, for each super-large network, selecting among its output target identification results the one whose IoU is the largest and greater than the threshold C_threshold, and grouping it together with the corresponding reference result; target identification results that cannot be grouped each form their own group;
step 2.2.4, calculating the difficulty coefficient λ from the group sizes and the number of super-large networks (the formula is given only as an image in the original publication).
step 2.3 is expanded to the steps:
step 2.3.1, if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classifying the image as a second-level hard sample;
step 2.3.2, removing the image from the sample library;
and 2.3.3, discarding the corresponding independent grouped target identification results for the second-level difficult samples, calculating the average value of the detection frames in the target identification results of the non-independent groups, using the average value as a final sample label, and finishing automatic labeling.
10. The object detection system of claim 5, wherein the model correction module is operative to include the steps of:
step 3.1, dividing the labeled sample library into an actual training set and an actual verification set; using generic samples obtained from public sources as a public verification set;
step 3.2, calculating LOSS values of the original model in the public verification set and the actual verification set respectively;
3.3, dividing an actual training set into a plurality of groups, and taking the original model as a pre-training model;
step 3.4, selecting a group of data in the actual training set;
step 3.5, performing model training on the model before training to obtain a trained model;
step 3.6, calculating LOSS values of the trained model in the public verification set and the actual verification set respectively;
step 3.7, if the difference between the LOSS values of the original model and the trained model on the public verification set is greater than the threshold L_threshold and the difference between the LOSS values on the actual verification set is greater than the threshold I_threshold, jumping to step 3.8; otherwise entering step 3.9;
3.8, if the actual training set has data which do not participate in training, setting the model after training as a new model before training, and skipping to the step 3.4, otherwise, entering the step 3.9;
step 3.9, stopping training; and after the training is stopped, taking the network parameters of the trained model as the output of the model correction module.
CN201911153078.6A 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment Active CN110909794B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201911153078.6A CN110909794B (en) 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment
US17/778,788 US20220398835A1 (en) 2019-11-22 2020-11-20 Target detection system suitable for embedded device
PCT/CN2020/130499 WO2021098831A1 (en) 2019-11-22 2020-11-20 Target detection system suitable for embedded device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911153078.6A CN110909794B (en) 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment

Publications (2)

Publication Number Publication Date
CN110909794A true CN110909794A (en) 2020-03-24
CN110909794B CN110909794B (en) 2022-09-13

Family

ID=69818851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911153078.6A Active CN110909794B (en) 2019-11-22 2019-11-22 Target detection system suitable for embedded equipment

Country Status (3)

Country Link
US (1) US20220398835A1 (en)
CN (1) CN110909794B (en)
WO (1) WO2021098831A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112118366A (en) * 2020-07-31 2020-12-22 中标慧安信息技术股份有限公司 Method and device for transmitting face picture data
CN112183558A (en) * 2020-09-30 2021-01-05 北京理工大学 Target detection and feature extraction integrated network based on YOLOv3
WO2021098831A1 (en) * 2019-11-22 2021-05-27 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded device
CN114913419A (en) * 2022-05-10 2022-08-16 西南石油大学 Intelligent parking target detection method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780358A (en) * 2021-08-16 2021-12-10 华北电力大学(保定) Real-time hardware fitting detection method based on anchor-free network
CN116188767B (en) * 2023-01-13 2023-09-08 湖北普罗格科技股份有限公司 Neural network-based stacked wood board counting method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073869A (en) * 2016-11-18 2018-05-25 法乐第(北京)网络科技有限公司 A kind of system of scene cut and detection of obstacles
CN108549852A (en) * 2018-03-28 2018-09-18 中山大学 Pedestrian detector's Auto-learning Method under special scenes based on the enhancing of depth network
CN108573238A (en) * 2018-04-23 2018-09-25 济南浪潮高新科技投资发展有限公司 A kind of vehicle checking method based on dual network structure
CN109145798A (en) * 2018-08-13 2019-01-04 浙江零跑科技有限公司 A kind of Driving Scene target identification and travelable region segmentation integrated approach
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110047069A (en) * 2019-04-22 2019-07-23 北京青燕祥云科技有限公司 A kind of image detection device
US10423860B1 (en) * 2019-01-22 2019-09-24 StradVision, Inc. Learning method and learning device for object detector based on CNN to be used for multi-camera or surround view monitoring using image concatenation and target object merging network, and testing method and testing device using the same

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330439B (en) * 2017-07-14 2022-11-04 腾讯科技(深圳)有限公司 Method for determining posture of object in image, client and server
CN108710897A (en) * 2018-04-24 2018-10-26 江苏科海智能***有限公司 A kind of online general target detecting system in distal end based on SSD-T
CN109801265B (en) * 2018-12-25 2020-11-20 国网河北省电力有限公司电力科学研究院 Real-time transmission equipment foreign matter detection system based on convolutional neural network
CN110909794B (en) * 2019-11-22 2022-09-13 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073869A (en) * 2016-11-18 2018-05-25 法乐第(北京)网络科技有限公司 A kind of system of scene cut and detection of obstacles
CN108549852A (en) * 2018-03-28 2018-09-18 中山大学 Pedestrian detector's Auto-learning Method under special scenes based on the enhancing of depth network
CN108573238A (en) * 2018-04-23 2018-09-25 济南浪潮高新科技投资发展有限公司 A kind of vehicle checking method based on dual network structure
CN109145798A (en) * 2018-08-13 2019-01-04 浙江零跑科技有限公司 A kind of Driving Scene target identification and travelable region segmentation integrated approach
US10423860B1 (en) * 2019-01-22 2019-09-24 StradVision, Inc. Learning method and learning device for object detector based on CNN to be used for multi-camera or surround view monitoring using image concatenation and target object merging network, and testing method and testing device using the same
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110047069A (en) * 2019-04-22 2019-07-23 北京青燕祥云科技有限公司 A kind of image detection device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FUKAI ZHANG, FENG YANG, CE LI, GUAN YUAN: "CMNet: A Connect-and-Merge Convolutional Neural Network for Fast Vehicle Detection in Urban Traffic Surveillance", IEEE Access *
HAN KAI: "Research on Target Detection Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021098831A1 (en) * 2019-11-22 2021-05-27 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded device
CN112118366A (en) * 2020-07-31 2020-12-22 中标慧安信息技术股份有限公司 Method and device for transmitting face picture data
CN112183558A (en) * 2020-09-30 2021-01-05 北京理工大学 Target detection and feature extraction integrated network based on YOLOv3
CN114913419A (en) * 2022-05-10 2022-08-16 西南石油大学 Intelligent parking target detection method and system

Also Published As

Publication number Publication date
CN110909794B (en) 2022-09-13
US20220398835A1 (en) 2022-12-15
WO2021098831A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110909794B (en) Target detection system suitable for embedded equipment
CN111126472B (en) SSD (solid State disk) -based improved target detection method
WO2021238262A1 (en) Vehicle recognition method and apparatus, device, and storage medium
CN112150821B (en) Lightweight vehicle detection model construction method, system and device
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN109118519A (en) Target Re-ID method, system, terminal and the storage medium of Case-based Reasoning segmentation
CN110991311A (en) Target detection method based on dense connection deep network
CN114937151A (en) Lightweight target detection method based on multi-receptive-field and attention feature pyramid
CN112613375B (en) Tire damage detection and identification method and equipment
CN109657715B (en) Semantic segmentation method, device, equipment and medium
CN112329881B (en) License plate recognition model training method, license plate recognition method and device
CN113177560A (en) Universal lightweight deep learning vehicle detection method
CN112862093A (en) Graph neural network training method and device
CN113361645A (en) Target detection model construction method and system based on meta-learning and knowledge memory
CN116089883B (en) Training method for improving classification degree of new and old categories in existing category increment learning
CN110458047A (en) A kind of country scene recognition method and system based on deep learning
CN110751191A (en) Image classification method and system
CN110163169A (en) Face identification method, device, electronic equipment and storage medium
CN114783021A (en) Intelligent detection method, device, equipment and medium for wearing of mask
CN109858349A (en) A kind of traffic sign recognition method and its device based on improvement YOLO model
CN116665092A (en) Method and system for identifying sewage suspended matters based on IA-YOLOV7
CN111462090A (en) Multi-scale image target detection method
WO2021258955A1 (en) Method and apparatus for marking object outline in target image, and storage medium and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant