CN112052826A - Intelligent enforcement multi-scale target detection method, device and system based on YOLOv4 algorithm and storage medium - Google Patents
Info
- Publication number
- CN112052826A (application CN202010989852.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- scale
- model
- yolov4
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/40 — Scenes; scene-specific elements in video content
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/23213 — Non-hierarchical clustering using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
- G06F18/24 — Classification techniques
- G06F18/253 — Fusion techniques of extracted features
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V2201/07 — Target detection
- G06V2201/08 — Detecting or categorising vehicles
Abstract
The invention discloses an intelligent law enforcement multi-scale target detection method, detection device, detection system and storage medium based on the YOLOv4 algorithm, for detecting and raising alarms on multi-scale target objects in intelligent law enforcement scenarios. The method comprises a data collection step, a data integration step, a data annotation step, a data division step, a multi-scale feature map allocation step, a YOLOv4 model training step, a YOLOv4 model verification step and a target detection step. The method is fast and effective, can process multi-scale and large-scale data, supports multiple languages and user-defined loss functions, and is a strong alternative for multi-scale target monitoring in intelligent law enforcement.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a multi-scale object target detection method, detection device, detection system and storage medium.
Background
With economic development, the number of mobile or unfixed illegal street-vendor motor vehicles has increased rapidly. To manage these unlicensed, mobile, unfixed vendor vehicles, a video monitoring system must be built at key locations within a jurisdiction for real-time image monitoring. Urban-management law enforcement vehicles are also equipped with mobile video monitoring, so that a command center can track and manage the positions of law enforcement vehicles through GPS, monitor vehicle interior and exterior states in real time through video, and thereby achieve mobile monitoring, monitoring without a long-term fixed site, and law enforcement team management. Intelligent early warning of a fortified area means placing key defenses over the intelligent law enforcement management area and raising intelligent early warnings for various abnormal conditions through the video monitoring system, guaranteeing the safety and stability of the law enforcement management process.
On one hand, multi-scale object detection mainly relies on either traditional image processing or deep learning. Traditional detection methods include HOG, HOG+SVM, SURF, SIFT and the like. With these methods, in a long-focal-length monitoring scene the near and far targets differ greatly, so targets of multiple scales coexist in the same scene; when selecting candidate regions with a sliding window, the window size and aspect ratio cannot be set effectively, so the exhaustive sliding-window search is slow and highly redundant. Deep-learning detection methods include R-CNN, Fast R-CNN, Faster R-CNN and the like. Most of them regress against fixed anchors, which cannot adapt to large multi-scale size differences among targets, so the detection network may fail to converge or train poorly, easily causing missed and false detections.
On the other hand, in the multi-scale target detection scenario of intelligent law enforcement monitoring management, recognition is still largely manual. Because the monitored environment involves urban road traffic, manual judgment requires long observation periods, so the labor cost of monitoring a fortified area is high, and visual fatigue easily leads to erroneous and missed judgments.
Disclosure of Invention
In order to overcome the defects of the prior art, one of the objectives of the present invention is to provide a method for detecting an intelligent law enforcement multi-scale target based on the YOLOv4 algorithm, which has the advantages of high speed, good effect, capability of processing multi-scale and large-scale data, supporting multiple languages, supporting custom loss functions, etc., and is a better alternative for monitoring the intelligent law enforcement multi-scale target.
The second objective of the present invention is to provide an intelligent law enforcement multi-scale target detection device based on the YOLOv4 algorithm.
The invention further aims to provide an intelligent law enforcement multi-scale target detection system based on the YOLOv4 algorithm.
It is a further object of the present invention to provide a computer readable storage medium.
One of the purposes of the invention is realized by adopting the following technical scheme:
a method for detecting an intelligent law enforcement multi-scale target based on the YOLOv4 algorithm comprises the following steps:
a data collection step: collecting video image data of different time points and different angles of a multi-scale object target scene;
a data integration step: integrating the collected video image data;
data labeling: marking the integrated video image data and forming source data;
a data dividing step: dividing the source data into a training data set and a verification data set according to a preset proportion;
a multi-scale feature map allocation step: for the picture data set, prior box sizes are obtained with a K-means clustering algorithm, whose flow is as follows: (1) randomly select 9 prior box center points from the data set as centroids; (2) compute the Euclidean distance from each prior box center point to each centroid, assigning each point to the set of its nearest centroid; (3) once grouped, the points fall into sets (3 per scale), and the centroid of each set is recalculated; (4) set thresholds of different sizes for the large, medium and small resolutions; if the distance between each new centroid and the original centroid is smaller than the set threshold, terminate the algorithm, otherwise iterate steps (2)-(4); finally, prior boxes of 9 sizes are clustered across the different scales;
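The clustering flow above can be sketched as follows. This is a minimal illustration under stated assumptions: `boxes` is a hypothetical list of (width, height) pairs, and the distance is plain Euclidean as the description says (many YOLO implementations use 1 − IoU as the distance instead):

```python
import random

def kmeans_anchors(boxes, k=9, threshold=1e-4, seed=0):
    """Cluster (w, h) pairs into k prior-box sizes.

    boxes: list of (w, h) tuples. Distance is squared Euclidean,
    following the description above.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)          # step 1: random initial centroids
    while True:
        clusters = [[] for _ in range(k)]
        for w, h in boxes:                    # step 2: assign to nearest centroid
            d = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centroids]
            clusters[d.index(min(d))].append((w, h))
        new_centroids = []                    # step 3: recompute each centroid
        for i, cl in enumerate(clusters):
            if not cl:                        # keep an empty cluster unchanged
                new_centroids.append(centroids[i])
                continue
            new_centroids.append((sum(w for w, _ in cl) / len(cl),
                                  sum(h for _, h in cl) / len(cl)))
        shift = max((nw - cw) ** 2 + (nh - ch) ** 2
                    for (nw, nh), (cw, ch) in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < threshold:                 # step 4: stop when centroids settle
            return sorted(centroids, key=lambda c: c[0] * c[1])
```

Sorting the result by area lets the 9 boxes split naturally into small/medium/large triples for the three scales.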
YOLOv4 model training step: training and learning are performed on the training data set using YOLOv4, as follows: (1) input the extracted multi-size feature maps into the CSPDarknet53 backbone network; CSPDarknet53 adds a CSPNet (Cross Stage Partial network) to the Darknet53 backbone of YOLOv3, and by integrating gradient changes into the feature map end to end it reduces computation while preserving accuracy, yielding a feature pyramid at multiple scales. (2) Input the multi-scale features into SPPNet (spatial pyramid pooling network), which enlarges the network's receptive field (i.e., the recognition area of the target); alternating convolution and pooling layers reduce the feature dimensionality, yielding dimension-reduced features. (3) Accelerate the fusion of shallow and deep features through the PANet path aggregation network, obtaining fused features at different scales. (4) Finally output the training result through the fully connected layer, including bounding box regression coordinates, target classification results and confidences. (5) Compute the loss function value from the corresponding results; the loss function comprises three parts: bounding box regression loss, classification loss and confidence loss. The parameters are set to a maximum of 50000 iterations, an initial learning rate of 0.01, a batch size of 32, a decay rate of 0.0005 and a momentum of 0.9; the learning rate and batch size are adjusted appropriately according to the downward trend of the loss value. Training stops when the loss function value output on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, and the trained network model is obtained and recorded as the prediction model;
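The stopping rule and hyperparameters above can be expressed as a small sketch. `train_one_iteration` is a hypothetical callback standing in for one YOLOv4 training iteration, not part of any real YOLOv4 API; the numeric defaults are the ones stated in the description:

```python
def train(train_one_iteration, loss_threshold=0.5,
          max_iters=50000, lr=0.01, batch_size=32,
          decay=0.0005, momentum=0.9):
    """Run training until the loss drops to the threshold or the
    iteration budget (50000 by default) is exhausted."""
    for it in range(1, max_iters + 1):
        loss = train_one_iteration(lr=lr, batch_size=batch_size,
                                   decay=decay, momentum=momentum)
        if loss <= loss_threshold:        # early stop once the loss converges
            return it, loss
    return max_iters, loss                # budget exhausted
```

The `loss_threshold` value here is a placeholder; the patent leaves the actual threshold unspecified.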
YOLOv4 model verification step: verify the prediction model on the verification data set to obtain a model score, evaluate the model, screen out the model with the best prediction performance through model evaluation, and record it as the final model;
and a target detection step: and monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored.
Further, in the data collection step, the multi-scale object target scene is a traffic law enforcement monitoring management area. The different time points cover at least 6 conditions: morning congested, morning unobstructed, afternoon congested, afternoon unobstructed, evening congested and evening unobstructed. Congested and unobstructed conditions are distinguished by the number of vehicle targets on the road: a scene is classed as a congested section when more than 50 vehicles appear in a monitoring-camera snapshot, which typically occurs during city-center rush hours such as 8:00-10:00, 16:00-18:00 and 18:00-20:00, while other times and suburban locations are generally unobstructed. The different time points preferably also cover different weather and different seasons at the location of the multi-scale object target scene. Objects in the video image data include any combination of: car, police car, taxi, minibus, passenger coach, single-person electric vehicle, express electric vehicle, truck, sanitation vehicle, tank truck, engineering vehicle, fire truck, ambulance, police motorcycle, other non-motor vehicles and pedestrians.
Further, in the data integration step, the collected video image data are placed in the same folder; in the YOLOv4 model verification step, model evaluation is performed by three indexes, including: recall, accuracy and average accuracy.
Further, in the data annotation step, the integrated video image data are annotated for depth model training to form the source data. The annotation scope includes: the image path, the image name, the image width and height, the image dimension, the annotated object name and the xy coordinate values of the bbox. The annotated object names include any combination of: car, police car, taxi, minibus, passenger coach, single-person electric vehicle, express electric vehicle, truck, sanitation vehicle, tank truck, engineering vehicle, fire truck, ambulance, police motorcycle, other non-motor vehicles and pedestrians. Preferably, the annotation tool LabelImg may be used; its output contains essentially all the information of an object detection scene, including the picture name, picture size, picture storage path, target position coordinates and target category name. Other labeling tools, such as Labelme, may of course be used instead.
Further, in the data partitioning step, the ratio of the training data set to the validation data set is 3:1, 7:3, 8:2 or 98:2. For a small sample set (e.g., 10000 images) the split is generally 7:3 or 8:2; for a large sample set (e.g., 1000000 images) the validation proportion can be correspondingly smaller, e.g., 98:2.
Further, in the data dividing step, samples are randomly drawn from each video, with a consistent number of random samples per scene to achieve a uniform data distribution; the train/test sampling ratio for each video is 3:1.
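The per-video split described above can be sketched as follows. A minimal illustration under stated assumptions: `frames_by_video` is a hypothetical mapping from video id to its extracted frames, and every video is split at the same 3:1 ratio so each scene contributes uniformly:

```python
import random

def split_per_video(frames_by_video, train_ratio=0.75, seed=0):
    """Randomly split each video's frames into train/validation sets
    at the same ratio, so no scene is over- or under-represented."""
    rng = random.Random(seed)
    train, val = [], []
    for video, frames in frames_by_video.items():
        frames = list(frames)
        rng.shuffle(frames)                      # random sampling within the video
        cut = int(len(frames) * train_ratio)     # 3:1 split by default
        train += [(video, f) for f in frames[:cut]]
        val += [(video, f) for f in frames[cut:]]
    return train, val
```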
Further, in the multi-scale feature map allocation step, large, medium and small multi-scale feature maps are dynamically allocated for different vehicle types.
Further, in the multi-scale feature map allocation step, the prior box sizes are obtained by K-means clustering; 3 prior boxes are set for each downsampling scale, so 9 prior box sizes are clustered in total. On the COCO dataset these 9 prior boxes are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198) and (373x326). In dynamic assignment, the 13x13 feature map applies prior boxes (116x90), (156x198), (373x326); the 26x26 feature map applies (30x61), (62x45), (59x119); the 52x52 feature map applies (10x13), (16x30), (33x23).
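The assignment above amounts to a fixed lookup from feature-map size to an anchor triple; a minimal sketch (the function name is illustrative, the box values are the COCO priors listed above):

```python
# COCO prior boxes from the description, ordered small -> large by area
ANCHORS = [(10, 13), (16, 30), (33, 23),       # 52x52 map: small objects
           (30, 61), (62, 45), (59, 119),      # 26x26 map: medium objects
           (116, 90), (156, 198), (373, 326)]  # 13x13 map: large objects

def anchors_for_feature_map(size):
    """Return the three prior boxes applied on a feature map of the
    given spatial size (13, 26 or 52)."""
    triples = {52: ANCHORS[0:3], 26: ANCHORS[3:6], 13: ANCHORS[6:9]}
    return triples[size]
```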
Further, in the target detection step, the traffic law enforcement monitoring management area is monitored by using the final model, a camera is used as the input of the model, and when target objects with different sizes are identified, alarm information is generated.
Further, in the YOLOv4 model prediction step, the model takes a picture, a video or a camera IP address as input and outputs a target detection result. The input modes differ, but the principle is the same: process and analyze pictures. If the input is a video or camera IP address, one frame is read from the video or camera and fed to the model, which outputs a target detection result; after the analysis finishes, the next frame is read and analyzed.
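The frame-by-frame loop described above can be sketched independently of the capture library. In practice the frame source would come from a capture API such as OpenCV's `cv2.VideoCapture` opened on a file path or camera stream URL; here `frame_source`, `detect`, `on_alarm` and `watched_classes` are all hypothetical stand-ins, not part of the patent's system:

```python
def run_detection(frame_source, detect, on_alarm, watched_classes):
    """Read frames one at a time, run the model on each, and raise an
    alarm whenever a watched target class is detected."""
    results = []
    for frame in frame_source:          # one frame per iteration
        detections = detect(frame)      # [(class_name, confidence, bbox), ...]
        for cls, conf, bbox in detections:
            if cls in watched_classes:  # specific target object -> alarm
                on_alarm(frame, cls, conf, bbox)
        results.append(detections)
    return results
```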
Further, in the YOLOv4 model verification step, model evaluation is performed through three indexes, including: recall, accuracy and average accuracy.
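The three evaluation indexes can be computed from true-positive, false-positive and false-negative counts; a minimal sketch, with the usual definitions (the patent does not spell out its AP formula — the version here averages precision at each true-positive rank over a confidence-sorted detection list, a common simplification):

```python
def precision_recall(tp, fp, fn):
    """Accuracy (precision) and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(ranked_hits, n_gt):
    """AP over detections sorted by descending confidence:
    ranked_hits[i] is True if the i-th detection is a true positive,
    n_gt is the number of ground-truth objects."""
    tp = 0
    precisions = []
    for i, hit in enumerate(ranked_hits, 1):
        if hit:
            tp += 1
            precisions.append(tp / i)   # precision at this recall point
    return sum(precisions) / n_gt if n_gt else 0.0
```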
The second purpose of the invention is realized by adopting the following technical scheme:
an intelligent law enforcement multi-scale target detection device based on a YOLOv4 algorithm, comprising: one or more processors, and memory for storing one or more computer programs which, when executed by the one or more processors, perform the object detection step of one of the purposes: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
the process of establishing the final model comprises the data collection, data integration, data annotation, data division, multi-scale feature map allocation, YOLOv4 model training and YOLOv4 model verification steps as described in the first objective.
The third purpose of the invention is realized by adopting the following technical scheme:
an intelligent law enforcement multi-scale target detection system based on a YOLOv4 algorithm, comprising:
an image acquisition device for acquiring image data to be analyzed;
a computing device coupled with the image acquisition device and comprising: one or more processors, and memory for storing one or more computer programs which, when executed by the one or more processors, perform the object detection step of one of the purposes: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
an alert device coupled with the computing device and configured to alert the alert information;
the process of establishing the final model comprises the data collection, data integration, data annotation, data division, multi-scale feature map allocation, YOLOv4 model training and YOLOv4 model verification steps as described in the first objective.
The fourth purpose of the invention is realized by adopting the following technical scheme:
a computer readable storage medium having one or more computer programs stored thereon, wherein the one or more computer programs, when executed by one or more processors, perform the object detection step of one of the purposes: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
the process of establishing the final model comprises the data collection, data integration, data annotation, data division, multi-scale feature map allocation, YOLOv4 model training and YOLOv4 model verification steps as described in the first objective.
Compared with the prior art, the invention has the beneficial effects that:
(1) The intelligent law enforcement multi-scale target detection method based on the YOLOv4 algorithm is suitable for complex scenes and can be applied to the field of intelligent law enforcement monitoring management, as a multi-scale object detection and alarm method for such scenarios. The invention employs the prior-detection system of YOLOv4, reusing classifiers or locators to perform the detection task, with the model applied dynamically at multiple locations and scales of the image. Unlike other object detection methods, a single neural network is applied to the entire image: the network divides the image into regions, predicts the bounding boxes and probabilities for each region, and weights the bounding boxes by the predicted probabilities. This model has advantages over classifier-based systems. Unlike R-CNN, which requires thousands of evaluations for a single image, YOLOv4 predicts with a single network evaluation, making it very fast — typically 1000 times faster than R-CNN and 100 times faster than Fast R-CNN.
(2) The method for detecting the intelligent law enforcement multi-scale target based on the YOLOv4 algorithm has the advantages of high speed, good effect, capability of processing multi-scale and large-scale data, supporting multiple languages, supporting custom loss functions and the like, and is a better alternative scheme for monitoring the intelligent law enforcement multi-scale target. The specific analysis is as follows:
High speed: softmax is discarded, and multi-scale prediction is performed with anchor bboxes;
Cross-platform: suitable for Windows, Linux, macOS and a number of cloud platforms;
Multilingual: supports C++, Python, R, Java, Scala, Julia, etc.;
Good effect: has won many data science and machine learning challenges and is used in production by many companies.
(3) The intelligent law enforcement multi-scale target detection method based on the YOLOv4 algorithm further has the following advantages:
dynamically allocating anchors: and acquiring training data, performing data fitting on a training target, dynamically analyzing the characteristics of the anchor in different scales through big data fitting, and dynamically setting the value of the anchor.
Designed network structure YOLOv4: the multi-scale detection branches in YOLOv4 are designed so that multi-scale feature input mitigates missed and false detections, effectively improving detection accuracy and the overall quality of target recognition. Relative to YOLOv3, its backbone network meets the following requirements: first, a higher input resolution, improving small-object detection accuracy; second, more layers, enlarging the receptive field to match the larger input; third, more parameters, improving the ability to detect multi-size targets in a single image. Overall, accuracy improves by nearly 10 points with a modest speed improvement.
Drawings
Fig. 1 is a flowchart of the intelligent law enforcement multi-scale target detection method based on the YOLOv4 algorithm according to embodiment 1 of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description, and it should be noted that any combination of the embodiments or technical features described below can be used to form a new embodiment without conflict.
Example 1
As shown in fig. 1, a multi-scale object target detection method oriented to intelligent law enforcement and based on the YOLOv4 algorithm includes the following steps:
a data collection step: collecting video image data of the intelligent law enforcement monitoring scene at different time points and different angles; the specific operation mode is as follows:
First, collection time points: the scenes are divided into 6 conditions — morning congested, morning unobstructed, afternoon congested, afternoon unobstructed, evening congested and evening unobstructed. Congested and unobstructed conditions are distinguished by the number of vehicle targets on the road: a scene is classed as a congested section when more than 50 vehicles appear in a monitoring-camera snapshot, which typically occurs during city-center rush hours such as 8:00-10:00, 16:00-18:00 and 18:00-20:00, while other times and suburban locations are generally unobstructed.
Collection places: near construction areas, the city center, stations and other overpasses (e.g., areas near fire departments, environmental protection departments and hospitals), because the shooting range there best matches the height of an intelligent traffic law enforcement camera.
Collection mode and quantity: 30-second videos are shot from the overpasses at each place, covering the one-way, two-way and lateral angles of the road. Since the frame rate of video is generally about 30 frames per second, the frame-grabbing frequency is set to 3 frames per second; with at least 100 sample videos, this yields at least about 10000 samples. The data set thus contains about 10000 video-frame pictures from at least 100 sample videos.
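Grabbing 3 frames per second from a ~30 fps video means keeping every 10th frame; the sample-count arithmetic above can be checked with a short sketch (the function name is illustrative):

```python
def sampled_frame_indices(duration_s=30, fps=30, grabs_per_second=3):
    """Indices of the frames kept when grabbing 3 frames per second
    from a ~30 fps video, i.e. every 10th frame."""
    step = fps // grabs_per_second       # 30 // 3 == 10
    return list(range(0, duration_s * fps, step))

# One 30 s video yields 90 frames, so 100 videos yield about 9000
# frames, matching the "at least about 10000 samples" estimate above.
```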
Objects acquired in the video images include any combination of: car, police car, taxi, minibus, passenger coach, single-person electric vehicle, express electric vehicle, truck, sanitation vehicle, tank truck, engineering vehicle, fire truck, ambulance, police motorcycle, other non-motor vehicles and pedestrians.
A data integration step: integrate the collected video image data. A new VOC2007 folder is created under the working directory, and three folders — Annotations, ImageSets and JPEGImages — are created under VOC2007. A Main folder is then created under ImageSets, and the collected data set pictures are copied into the JPEGImages directory.
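The folder layout above can be created with a few lines of standard-library code (a sketch; the root path is whatever working directory the data set lives in):

```python
import os

def make_voc_layout(root):
    """Create the VOC2007 directory tree described above:
    Annotations, ImageSets/Main and JPEGImages."""
    for sub in ("Annotations", "ImageSets/Main", "JPEGImages"):
        os.makedirs(os.path.join(root, "VOC2007", sub), exist_ok=True)
```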
Data labeling: carrying out depth model training and labeling on the integrated video image data and forming source data, wherein the specific operation mode is as follows:
The tool: the labeling tool used is LabelImg, which generates xml annotation files;
Data set numbering: to organize the data and reduce the chance of errors, the roughly 10000 video-frame pictures are randomly numbered and given sequential identifiers, such as 000001-000999;
Data labeling: the data are labeled with LabelImg; each picture corresponds to an xml annotation file of the same name, e.g., picture 000001.jpg has annotation file 000001.xml. The annotation scope includes: the image path, the image name (e.g., 000001.jpg), the image width and height, the image dimension, the annotated object name and the xy coordinate values of the bbox. The annotated object names (class labels) are: 1. car; 2. police-car; 3. taxi; 4. van; 5. bus; 6. minibus; 7. coach (passenger transport); 8. electric-vehicle (single person); 9. express-vehicle; 10. truck; 11. sanitation-truck; 12. tanker-truck; 13. engineering-truck; 14. fire-truck; 15. ambulance; 16. police-motorcycle; 17. others (other non-motor vehicles); 18. pedestrian.
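The annotation fields listed above map directly onto a Pascal VOC-style XML file, which is the format LabelImg writes; a minimal standard-library sketch (field set reduced to the ones the description names):

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, depth, objects):
    """Build a Pascal VOC-style annotation: image name, size, and per
    object its class name plus bbox xy coordinates."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    for tag, val in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(val)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", xmin), ("ymin", ymin),
                         ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(val)
    return ET.tostring(ann, encoding="unicode")
```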
A data dividing step: the source data are divided according to a preset ratio, with the training data set accounting for 75% and the validation data set for 25%;
Multi-scale feature map allocation step: K-means clustering is applied to the vehicle picture data set to obtain the prior-box sizes; 3 prior boxes are set for each downsampling scale, giving 9 clustered prior-box sizes in total. On the COCO dataset these 9 prior boxes are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198), (373x326). In dynamic allocation, the larger prior boxes (116x90), (156x198), (373x326) are applied on the smallest 13x13 feature map (largest receptive field), suitable for detecting larger objects; the medium prior boxes (30x61), (62x45), (59x119) are applied on the medium 26x26 feature map (medium receptive field), suitable for detecting medium-sized objects; and the smaller prior boxes (10x13), (16x30), (33x23) are applied on the larger 52x52 feature map (smaller receptive field), suitable for detecting smaller objects.
Table 1. Multi-scale allocation of feature maps

Feature map | Receptive field | Prior boxes | Suitable for
---|---|---|---
13x13 | largest | (116x90), (156x198), (373x326) | larger objects
26x26 | medium | (30x61), (62x45), (59x119) | medium objects
52x52 | smaller | (10x13), (16x30), (33x23) | smaller objects
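The scale allocation can be sketched by sorting the 9 clustered boxes by area and handing 3 to each detection scale; the sort-by-area rule is an assumption that reproduces the assignment described above.

```python
# The 9 COCO prior boxes from the text. Sorting by area puts the smallest
# anchors on the 52x52 map and the largest on the 13x13 map, per Table 1.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]

def assign_anchors(anchors):
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    return {"52x52": ordered[0:3], "26x26": ordered[3:6], "13x13": ordered[6:9]}

for scale, boxes in assign_anchors(ANCHORS).items():
    print(scale, boxes)
```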
YOLOv4 model training step: YOLOv4 is trained on the training data set as follows: ① the extracted multi-size feature maps are input into the CSPDarknet53 backbone network; CSPDarknet53 adds a CSPNet (Cross Stage Partial network) to the Darknet53 backbone of YOLOv3 and, by integrating the gradient changes into the feature map from end to end, reduces computation while maintaining accuracy, yielding a feature pyramid at multiple scales; ② the multi-scale features are input into the SPPNet (Spatial Pyramid Pooling network), which enlarges the network's receptive field (i.e., the identification area of the target) and reduces feature dimensionality by alternating convolution and pooling layers; ③ information fusion between shallow and deep features is accelerated through the PANet path-aggregation network to obtain fused features at different scales; ④ the training result, comprising bounding-box regression coordinates, the target classification result and confidence scores, is finally output through the fully connected layer; ⑤ the loss function value is calculated from the corresponding result; the loss function comprises three parts: bounding-box regression loss, classification loss and confidence loss. The parameters are set to a maximum of 50,000 iterations, an initial learning rate of 0.01, a batch size of 32, a weight decay of 0.0005 and a momentum of 0.9; the learning rate and batch size are adjusted appropriately according to the downward trend of the loss value. Training stops when the loss function value output on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, and the trained network model is recorded as the prediction model;
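The stopping rule above (stop when the loss reaches the threshold or the iteration cap, lowering the learning rate when the loss stops trending down) can be sketched as follows; the simulated loss curve and the halving schedule are illustrative assumptions, not the patent's exact training procedure.

```python
def train_loop(loss_at, threshold=0.05, max_iters=50000, lr=0.01):
    """Iterate until loss <= threshold or max_iters is hit; halve the
    learning rate when the loss has plateaued for 100 iterations."""
    prev, stalled, loss = float("inf"), 0, float("inf")
    for it in range(1, max_iters + 1):
        loss = loss_at(it, lr)
        if loss <= threshold:
            return it, lr, loss
        stalled = stalled + 1 if loss >= prev * 0.999 else 0
        if stalled >= 100:            # loss no longer trending down
            lr, stalled = lr * 0.5, 0
        prev = loss
    return max_iters, lr, loss

# Simulated exponentially decaying loss stands in for real training.
it, lr, loss = train_loop(lambda it, lr: 10.0 * 0.999 ** it)
print(it, loss <= 0.05)
```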
YOLOv4 model verification step: the prediction model is verified on the validation data set to obtain a model score, and the model with the best predictive performance is selected through model evaluation. Three indexes are used to verify the quality of the model:
recall (R): i.e., how many of the positive samples are predicted correctly;
precision (P): i.e., how many of the samples predicted as positive are truly positive;
mean average precision (mAP): the mean of the AP (average precision) over all classes, measuring the average detection quality across all classes.
Their calculation formulas are respectively as follows:
R = TP/(TP+FN); P = TP/(TP+FP); AP = ∫P(R)dR; mAP = (1/N)·ΣAP_i over the N classes
where:
TP (True Positives): true positive samples (positive samples correctly predicted as positive);
TN (True Negatives): true negative samples (negative samples correctly predicted as negative);
FP (False Positives): false positive samples (negative samples incorrectly predicted as positive);
FN (False Negatives): false negative samples (positive samples incorrectly predicted as negative);
P: the precision;
R: the recall;
AP: the area under the PR curve (PR curve: Precision-Recall curve); AP measures the detection quality for one class, while mAP measures it across multiple classes.
The AP formula is: AP = ∫P(R)dR, i.e., the area under the precision-recall curve.
in this example, the accuracy was 94.23%, the recall was 93.82%, and the average accuracy value was 89.35%.
A target detection step: the final model is used to monitor multi-scale targets for intelligent law enforcement. A camera used for intelligent-enforcement monitoring serves as the input of the model; when target objects of different scales are identified among the various motor and non-motor vehicles (e.g., car, police car, taxi, van, bus, minibus, passenger coach, single-person electric vehicle, express electric vehicle, truck, sanitation vehicle, tank truck, engineering vehicle, fire truck, ambulance, police motorcycle and other non-motor vehicles), warning information is pushed, achieving the effect of monitoring multi-scale targets for intelligent law enforcement.
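The warning-push logic can be sketched as a filter over model detections; the detection tuples, class set and confidence threshold below are illustrative assumptions, not the patent's actual interface.

```python
# Hypothetical target classes and confidence threshold for the alert filter.
TARGET_CLASSES = {"car", "truck", "fire-truck", "ambulance", "pedestrian"}

def make_alerts(detections, conf_threshold=0.5):
    """Keep detections of monitored classes above the confidence threshold
    and turn each into a warning record to be pushed."""
    return [f"ALERT: {cls} ({conf:.2f})"
            for cls, conf in detections
            if cls in TARGET_CLASSES and conf >= conf_threshold]

dets = [("car", 0.92), ("pedestrian", 0.41), ("truck", 0.78)]
print(make_alerts(dets))  # the low-confidence pedestrian is filtered out
```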
According to the above YOLOv4-based multi-scale object target detection method for intelligent law enforcement, when the multi-scale feature maps are allocated, the prior boxes are sized by unsupervised k-means clustering of the vehicle pictures in the sample scene; target sizes are divided into large, medium and small, corresponding respectively to the three prior-box sizes per scale, so the method can detect targets at both near and far distances in multiple scenes. By contrast, a conventional target detection network generally detects only targets of similar sizes and easily misses small objects in a large scene.
Example 2
An intelligent law enforcement multi-scale target detection device based on the YOLOv4 algorithm, comprising: one or more processors, and a memory for storing one or more computer programs which, when executed by the one or more processors, perform the target detection step of: monitoring a multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
wherein, the establishment process of the final model is as follows:
a data collection step: collecting video image data of different time points and different angles of a multi-scale object target scene;
a data integration step: integrating the collected video image data;
data labeling: marking the integrated video image data and forming source data;
a data dividing step: dividing the source data into a training data set and a verification data set according to a preset proportion;
Multi-scale feature map allocation step: for the picture data set, the prior-box sizes are obtained with the K-means clustering algorithm, whose flow is as follows: ① randomly select 9 prior-box center points from the data set as centroids; ② calculate the Euclidean distance from each sample box to each centroid and assign each box to the set of its nearest centroid; ③ after the sets are grouped, recalculate the centroid of each set; ④ set thresholds of different sizes for the large, medium and small resolutions; if the distance between each new centroid and the original centroid is smaller than the set threshold, terminate the algorithm, otherwise iterate steps ②-④; finally, 9 prior-box sizes are clustered according to the different scales.
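Steps ①-④ above can be sketched as a plain k-means over (width, height) pairs; this is a simplified sketch using Euclidean distance as the text states (anchor clustering is often done with an IoU-based distance instead), and the synthetic box sizes and k=3 are illustrative only.

```python
import random

def kmeans_boxes(sizes, k=9, iters=50, seed=0):
    """K-means on (w, h) box sizes: random init (step 1), nearest-centroid
    assignment (step 2), centroid update (step 3), convergence check (step 4)."""
    rnd = random.Random(seed)
    centroids = rnd.sample(sizes, k)                                   # step 1
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in sizes:                                             # step 2
            i = min(range(k), key=lambda j: (w - centroids[j][0]) ** 2
                                            + (h - centroids[j][1]) ** 2)
            groups[i].append((w, h))
        new = [(sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
               if g else centroids[i] for i, g in enumerate(groups)]   # step 3
        if all(abs(a - c) < 1e-6 and abs(b - d) < 1e-6
               for (a, b), (c, d) in zip(new, centroids)):             # step 4
            break
        centroids = new
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])

# Synthetic box sizes around three scales, clustered with k=3 for brevity.
boxes = [(10 + i, 14 + i) for i in range(5)] + \
        [(60 + i, 45 + i) for i in range(5)] + \
        [(150 + i, 200 + i) for i in range(5)]
print(kmeans_boxes(boxes, k=3))
```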
YOLOv4 model training step: YOLOv4 is trained on the training data set as follows: ① the extracted multi-size feature maps are input into the CSPDarknet53 backbone network; CSPDarknet53 adds a CSPNet (Cross Stage Partial network) to the Darknet53 backbone of YOLOv3 and, by integrating the gradient changes into the feature map from end to end, reduces computation while maintaining accuracy, yielding a feature pyramid at multiple scales; ② the multi-scale features are input into the SPPNet (Spatial Pyramid Pooling network), which enlarges the network's receptive field (i.e., the identification area of the target) and reduces feature dimensionality by alternating convolution and pooling layers; ③ information fusion between shallow and deep features is accelerated through the PANet path-aggregation network to obtain fused features at different scales; ④ the training result, comprising bounding-box regression coordinates, the target classification result and confidence scores, is finally output through the fully connected layer; ⑤ the loss function value is calculated from the corresponding result; the loss function comprises three parts: bounding-box regression loss, classification loss and confidence loss. The parameters are set to a maximum of 50,000 iterations, an initial learning rate of 0.01, a batch size of 32, a weight decay of 0.0005 and a momentum of 0.9; the learning rate and batch size are adjusted appropriately according to the downward trend of the loss value. Training stops when the loss function value output on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, and the trained network model is recorded as the prediction model;
YOLOv4 model verification step: and verifying the prediction model through a verification data set, screening out a model with the optimal prediction performance through model evaluation, and marking as a final model.
In some embodiments, the processor may include various processing circuitry, such as, but not limited to, one or more of a central processor or a communications processor. The processor may perform control of at least one other component of the multi-scale object target detection apparatus, and/or perform operations or data processing related to communications. The memory may include volatile and/or non-volatile memory. The multi-scale object detection device may include, for example, a smart phone, a tablet computer, a desktop computer, an e-book reader, an MP3 player, an electronic bracelet, a smart watch, and the like.
Example 3
An intelligent law enforcement multi-scale target detection system based on a YOLOv4 algorithm, comprising:
an image acquisition device for acquiring image data to be analyzed;
a computing device coupled with the image acquisition device and comprising: one or more processors, and a memory for storing one or more computer programs which, when executed by the one or more processors, perform the target detection step of: monitoring a multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
a warning device coupled to the computing device and configured to warn of the warning information.
Wherein, the establishment process of the final model is as follows:
a data collection step: collecting video image data of different time points and different angles of a multi-scale object target scene;
a data integration step: integrating the collected video image data;
data labeling: marking the integrated video image data and forming source data;
a data dividing step: dividing the source data into a training data set and a verification data set according to a preset proportion;
Multi-scale feature map allocation step: for the picture data set, the prior-box sizes are obtained with the K-means clustering algorithm, whose flow is as follows: ① randomly select 9 prior-box center points from the data set as centroids; ② calculate the Euclidean distance from each sample box to each centroid and assign each box to the set of its nearest centroid; ③ after the sets are grouped, recalculate the centroid of each set; ④ set thresholds of different sizes for the large, medium and small resolutions; if the distance between each new centroid and the original centroid is smaller than the set threshold, terminate the algorithm, otherwise iterate steps ②-④; finally, 9 prior-box sizes are clustered according to the different scales.
YOLOv4 model training step: YOLOv4 is trained on the training data set as follows: ① the extracted multi-size feature maps are input into the CSPDarknet53 backbone network; CSPDarknet53 adds a CSPNet (Cross Stage Partial network) to the Darknet53 backbone of YOLOv3 and, by integrating the gradient changes into the feature map from end to end, reduces computation while maintaining accuracy, yielding a feature pyramid at multiple scales; ② the multi-scale features are input into the SPPNet (Spatial Pyramid Pooling network), which enlarges the network's receptive field (i.e., the identification area of the target) and reduces feature dimensionality by alternating convolution and pooling layers; ③ information fusion between shallow and deep features is accelerated through the PANet path-aggregation network to obtain fused features at different scales; ④ the training result, comprising bounding-box regression coordinates, the target classification result and confidence scores, is finally output through the fully connected layer; ⑤ the loss function value is calculated from the corresponding result; the loss function comprises three parts: bounding-box regression loss, classification loss and confidence loss. The parameters are set to a maximum of 50,000 iterations, an initial learning rate of 0.01, a batch size of 32, a weight decay of 0.0005 and a momentum of 0.9; the learning rate and batch size are adjusted appropriately according to the downward trend of the loss value. Training stops when the loss function value output on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, and the trained network model is recorded as the prediction model;
YOLOv4 model verification step: and verifying the prediction model through a verification data set, screening out a model with the optimal prediction performance through model evaluation, and marking as a final model.
In some embodiments, the computing device may be in wired or wireless connection with the image acquisition device. The warning device may be integrated with the computing device, or the warning device and the computing device may be two separate components. The computing device may include, for example, a smart phone, a tablet computer, a desktop computer, an electronic book reader, an MP3 player, an electronic bracelet, a smart watch, and the like.
Example 4
A computer readable storage medium having one or more computer programs stored thereon, wherein the one or more computer programs, when executed by one or more processors, implement the target detection step of: monitoring a multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored.
Wherein, the establishment process of the final model is as follows:
a data collection step: collecting video image data of different time points and different angles of a multi-scale object target scene;
a data integration step: integrating the collected video image data;
data labeling: marking the integrated video image data and forming source data;
a data dividing step: dividing the source data into a training data set and a verification data set according to a preset proportion;
Multi-scale feature map allocation step: for the picture data set, the prior-box sizes are obtained with the K-means clustering algorithm, whose flow is as follows: ① randomly select 9 prior-box center points from the data set as centroids; ② calculate the Euclidean distance from each sample box to each centroid and assign each box to the set of its nearest centroid; ③ after the sets are grouped, recalculate the centroid of each set; ④ set thresholds of different sizes for the large, medium and small resolutions; if the distance between each new centroid and the original centroid is smaller than the set threshold, terminate the algorithm, otherwise iterate steps ②-④; finally, 9 prior-box sizes are clustered according to the different scales.
YOLOv4 model training step: YOLOv4 is trained on the training data set as follows: ① the extracted multi-size feature maps are input into the CSPDarknet53 backbone network; CSPDarknet53 adds a CSPNet (Cross Stage Partial network) to the Darknet53 backbone of YOLOv3 and, by integrating the gradient changes into the feature map from end to end, reduces computation while maintaining accuracy, yielding a feature pyramid at multiple scales; ② the multi-scale features are input into the SPPNet (Spatial Pyramid Pooling network), which enlarges the network's receptive field (i.e., the identification area of the target) and reduces feature dimensionality by alternating convolution and pooling layers; ③ information fusion between shallow and deep features is accelerated through the PANet path-aggregation network to obtain fused features at different scales; ④ the training result, comprising bounding-box regression coordinates, the target classification result and confidence scores, is finally output through the fully connected layer; ⑤ the loss function value is calculated from the corresponding result; the loss function comprises three parts: bounding-box regression loss, classification loss and confidence loss. The parameters are set to a maximum of 50,000 iterations, an initial learning rate of 0.01, a batch size of 32, a weight decay of 0.0005 and a momentum of 0.9; the learning rate and batch size are adjusted appropriately according to the downward trend of the loss value. Training stops when the loss function value output on the training data set is less than or equal to the threshold or the set maximum number of iterations is reached, and the trained network model is recorded as the prediction model;
YOLOv4 model verification step: and verifying the prediction model through a verification data set, screening out a model with the optimal prediction performance through model evaluation, and marking as a final model.
In some embodiments, the computer readable medium may include, for example, a hard disk, a floppy disk, a magnetic medium, an optical recording medium, a DVD, a magneto-optical medium, and the like.
The above embodiments are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby, and any insubstantial changes and substitutions made by those skilled in the art based on the present invention are within the protection scope of the present invention.
Claims (10)
1. A method for detecting an intelligent law-enforcement multi-scale target based on a YOLOv4 algorithm is characterized by comprising the following steps:
a data collection step: collecting video image data of different time points and different angles of a multi-scale object target scene;
a data integration step: integrating the collected video image data;
data labeling: marking the integrated video image data and forming source data;
a data dividing step: dividing the source data into a training data set and a verification data set according to a preset proportion;
a multi-scale feature map allocation step: adopting a K-means clustering algorithm to obtain the sizes of the prior boxes for the picture data set, and clustering prior boxes of 9 sizes according to the different scales;
a YOLOv4 model training step: YOLOv4 is trained on the training data set as follows: ① the extracted feature maps of the 9 sizes are input into the CSPDarknet53 backbone network, wherein CSPDarknet53 adds a CSPNet to the Darknet53 backbone of YOLOv3; ② the multi-scale features are input into the SPPNet; ③ information fusion between shallow and deep features is accelerated through the PANet path-aggregation network to obtain fused features at different scales; ④ the training result is finally output through the fully connected layer; ⑤ the loss function value is calculated from the corresponding result, and the learning rate and batch size are adjusted according to the downward trend of the loss function value; training stops when the loss function value output on the training data set is less than or equal to a threshold or the set maximum number of iterations is reached, and the trained network model is recorded as the prediction model;
YOLOv4 model verification step: verifying the prediction model through the verification data set, screening out a model with optimal prediction performance through model evaluation, and marking as a final model;
a target detection step: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored.
2. The method for intelligent law enforcement multi-scale target detection based on the YOLOv4 algorithm according to claim 1, wherein in the data collecting step, the multi-scale object scene is a traffic enforcement monitoring management area; the different time points at least comprise 6 time points of morning congestion, morning unobstructed, afternoon congestion, afternoon unobstructed, evening congestion and evening unobstructed; the objects in the video image data include: any combination of a car, police car, taxi, minibus, passenger coach, single-person electric vehicle, express electric vehicle, truck, sanitation vehicle, tank truck, engineering vehicle, fire truck, ambulance, police motorcycle, other non-motor vehicles and pedestrians.
3. The method of claim 1, wherein in the data integration step, the collected video image data are placed in the same folder; in the YOLOv4 model verification step, model evaluation is performed by three indexes, including: recall, accuracy and average accuracy.
4. The method as claimed in claim 2, wherein in the step of data annotation, the integrated video image data are annotated and source data are formed, and the annotation range includes: the position of the image, the image name, the image width and height, the image dimension, the annotated object name and the xy coordinate values of the bbox; the annotated object name comprises: any combination of a car, police car, taxi, minibus, passenger coach, single-person electric vehicle, express electric vehicle, truck, sanitation vehicle, tank truck, engineering vehicle, fire truck, ambulance, police motorcycle, other non-motor vehicles and pedestrians.
5. The method of claim 1, wherein in the step of data partitioning, the ratio of the training data set to the validation data set is 3:1, 7:3, 8:2 or 98: 2.
6. The method for intelligent law enforcement multi-scale target detection based on the YOLOv4 algorithm of claim 1, wherein in the multi-scale feature map allocation step, K-means clustering is used to obtain the sizes of the prior boxes, 3 prior boxes are set for each downsampling scale, and 9 prior boxes are clustered in total; on the COCO dataset these 9 prior boxes are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198) and (373x326); in dynamic allocation, the 13x13 feature map applies prior boxes (116x90), (156x198), (373x326); the 26x26 feature map applies prior boxes (30x61), (62x45), (59x119); and the 52x52 feature map applies prior boxes (10x13), (16x30), (33x23).
7. The method of claim 1, wherein in the step of detecting the target, a traffic enforcement monitoring management area is monitored by using the final model, a camera is used as an input of the model, and when various target objects with different sizes are identified, warning information is generated.
8. An intelligent law enforcement multi-scale target detection device based on a YOLOv4 algorithm, comprising: one or more processors, and memory for storing one or more computer programs which, when executed by the one or more processors, perform the object detection steps of claim 1: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
the process of establishing the final model comprises the steps of data collection, data integration, data annotation, data division, multi-scale feature map allocation, Yolov4 model training and Yolov4 model verification as claimed in claim 1.
9. An intelligent law enforcement multi-scale target detection system based on a YOLOv4 algorithm, comprising:
an image acquisition device for acquiring image data to be analyzed;
a computing device coupled with the image acquisition device and comprising: one or more processors, and memory for storing one or more computer programs which, when executed by the one or more processors, perform the object detection steps of claim 1: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
an alert device coupled with the computing device and configured to alert the alert information;
the process of establishing the final model comprises the steps of data collection, data integration, data annotation, data division, multi-scale feature map allocation, Yolov4 model training and Yolov4 model verification as claimed in claim 1.
10. A computer readable storage medium, having one or more computer programs stored thereon, wherein the one or more computer programs, when executed by one or more processors, implement the object detection step of claim 1: monitoring the multi-scale object target scene by using the final model, and generating alarm information when a specific target object is monitored;
the process of establishing the final model comprises the steps of data collection, data integration, data annotation, data division, multi-scale feature map allocation, Yolov4 model training and Yolov4 model verification as claimed in claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010989852.3A CN112052826A (en) | 2020-09-18 | 2020-09-18 | Intelligent enforcement multi-scale target detection method, device and system based on YOLOv4 algorithm and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010989852.3A CN112052826A (en) | 2020-09-18 | 2020-09-18 | Intelligent enforcement multi-scale target detection method, device and system based on YOLOv4 algorithm and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112052826A true CN112052826A (en) | 2020-12-08 |
Family
ID=73604140
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010989852.3A Pending CN112052826A (en) | 2020-09-18 | 2020-09-18 | Intelligent enforcement multi-scale target detection method, device and system based on YOLOv4 algorithm and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112052826A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308054A (en) * | 2020-12-29 | 2021-02-02 | 广东科凯达智能机器人有限公司 | Automatic reading method of multifunctional digital meter based on target detection algorithm |
CN112465072A (en) * | 2020-12-22 | 2021-03-09 | 浙江工业大学 | Excavator image identification method based on YOLOv4 model |
CN112560816A (en) * | 2021-02-20 | 2021-03-26 | 北京蒙帕信创科技有限公司 | Equipment indicator lamp identification method and system based on YOLOv4 |
CN112785557A (en) * | 2020-12-31 | 2021-05-11 | 神华黄骅港务有限责任公司 | Belt material flow detection method and device and belt material flow detection system |
CN112802302A (en) * | 2020-12-31 | 2021-05-14 | 国网浙江省电力有限公司双创中心 | Electronic fence method and system based on multi-source algorithm |
CN112989606A (en) * | 2021-03-16 | 2021-06-18 | 上海哥瑞利软件股份有限公司 | Data algorithm model checking method, system and computer storage medium |
CN112990131A (en) * | 2021-04-27 | 2021-06-18 | 广东科凯达智能机器人有限公司 | Method, device, equipment and medium for acquiring working gear of voltage change-over switch |
CN113033604A (en) * | 2021-02-03 | 2021-06-25 | 淮阴工学院 | Vehicle detection method, system and storage medium based on SF-YOLOv4 network model |
CN113158962A (en) * | 2021-05-06 | 2021-07-23 | 北京工业大学 | Swimming pool drowning detection method based on YOLOv4 |
CN113221646A (en) * | 2021-04-07 | 2021-08-06 | 山东捷讯通信技术有限公司 | Method for detecting abnormal objects of urban underground comprehensive pipe gallery based on Scaled-YOLOv4 |
CN113298032A (en) * | 2021-06-16 | 2021-08-24 | 武汉卓目科技有限公司 | Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning |
CN113420607A (en) * | 2021-05-31 | 2021-09-21 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Multi-scale target detection and identification method for unmanned aerial vehicle |
CN113516643A (en) * | 2021-07-13 | 2021-10-19 | 重庆大学 | Method for detecting retinal vessel bifurcation and intersection points in OCTA image |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109615868A (en) * | 2018-12-20 | 2019-04-12 | 北京以萨技术股份有限公司 | A kind of video frequency vehicle based on deep learning is separated to stop detection method |
CN110059554A (en) * | 2019-03-13 | 2019-07-26 | 重庆邮电大学 | A kind of multiple branch circuit object detection method based on traffic scene |
CN110136449A (en) * | 2019-06-17 | 2019-08-16 | 珠海华园信息技术有限公司 | Traffic video frequency vehicle based on deep learning disobeys the method for stopping automatic identification candid photograph |
CN110889324A (en) * | 2019-10-12 | 2020-03-17 | 南京航空航天大学 | Thermal infrared image target identification method based on YOLO V3 terminal-oriented guidance |
Application Events
- 2020-09-18: Application CN202010989852.3A filed in China (CN); published as CN112052826A; status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109615868A (en) * | 2018-12-20 | 2019-04-12 | 北京以萨技术股份有限公司 | Deep-learning-based method for detecting illegally parked vehicles in video |
CN110059554A (en) * | 2019-03-13 | 2019-07-26 | 重庆邮电大学 | Multi-branch object detection method for traffic scenes |
CN110136449A (en) * | 2019-06-17 | 2019-08-16 | 珠海华园信息技术有限公司 | Deep-learning-based method for automatic identification and snapshot capture of illegally parked vehicles in traffic video |
CN110889324A (en) * | 2019-10-12 | 2020-03-17 | 南京航空航天大学 | Thermal infrared image target recognition method based on YOLO V3 for terminal guidance |
Non-Patent Citations (1)
Title |
---|
Alexey Bochkovskiy et al.: "YOLOv4: Optimal Speed and Accuracy of Object Detection", arXiv:2004.10934v1 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112465072A (en) * | 2020-12-22 | 2021-03-09 | 浙江工业大学 | Excavator image identification method based on YOLOv4 model |
CN112465072B (en) * | 2020-12-22 | 2024-02-13 | 浙江工业大学 | Excavator image recognition method based on YOLOv4 model |
CN112308054A (en) * | 2020-12-29 | 2021-02-02 | 广东科凯达智能机器人有限公司 | Automatic reading method of multifunctional digital meter based on target detection algorithm |
CN112785557A (en) * | 2020-12-31 | 2021-05-11 | 神华黄骅港务有限责任公司 | Belt material flow detection method and device and belt material flow detection system |
CN112802302A (en) * | 2020-12-31 | 2021-05-14 | 国网浙江省电力有限公司双创中心 | Electronic fence method and system based on multi-source algorithm |
CN113033604A (en) * | 2021-02-03 | 2021-06-25 | 淮阴工学院 | Vehicle detection method, system and storage medium based on SF-YOLOv4 network model |
CN113033604B (en) * | 2021-02-03 | 2022-11-15 | 淮阴工学院 | Vehicle detection method, system and storage medium based on SF-YOLOv4 network model |
CN112560816A (en) * | 2021-02-20 | 2021-03-26 | 北京蒙帕信创科技有限公司 | Equipment indicator lamp identification method and system based on YOLOv4 |
CN112989606A (en) * | 2021-03-16 | 2021-06-18 | 上海哥瑞利软件股份有限公司 | Data algorithm model checking method, system and computer storage medium |
CN113221646A (en) * | 2021-04-07 | 2021-08-06 | 山东捷讯通信技术有限公司 | Method for detecting abnormal objects of urban underground comprehensive pipe gallery based on Scaled-YOLOv4 |
CN112990131A (en) * | 2021-04-27 | 2021-06-18 | 广东科凯达智能机器人有限公司 | Method, device, equipment and medium for acquiring working gear of voltage change-over switch |
CN113158962A (en) * | 2021-05-06 | 2021-07-23 | 北京工业大学 | Swimming pool drowning detection method based on YOLOv4 |
CN113420607A (en) * | 2021-05-31 | 2021-09-21 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Multi-scale target detection and identification method for unmanned aerial vehicle |
CN113298032A (en) * | 2021-06-16 | 2021-08-24 | 武汉卓目科技有限公司 | Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning |
CN113516643A (en) * | 2021-07-13 | 2021-10-19 | 重庆大学 | Method for detecting retinal vessel bifurcation and intersection points in OCTA image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112052826A (en) | Intelligent enforcement multi-scale target detection method, device and system based on YOLOv4 algorithm and storage medium | |
CN109816024B (en) | Real-time vehicle logo detection method based on multi-scale feature fusion and DCNN | |
Al-qaness et al. | An improved YOLO-based road traffic monitoring system | |
Kenk et al. | DAWN: vehicle detection in adverse weather nature dataset | |
CN109241349B (en) | Monitoring video multi-target classification retrieval method and system based on deep learning | |
CN110738857B (en) | Vehicle violation evidence obtaining method, device and equipment | |
CN113033604B (en) | Vehicle detection method, system and storage medium based on SF-YOLOv4 network model | |
CN111507989A (en) | Training generation method of semantic segmentation model, and vehicle appearance detection method and device | |
CN113822247B (en) | Method and system for identifying illegal building based on aerial image | |
CN106446150A (en) | Method and device for precise vehicle retrieval | |
CN104615986A (en) | Pedestrian detection method for scene-changing video images using multiple detectors |
CN107862072B (en) | Method for analyzing fake-plate vehicle city-entry crimes based on big data technology |
CN115424217A (en) | AI vision-based intelligent vehicle identification method and device and electronic equipment | |
CN112785610B (en) | Lane line semantic segmentation method integrating low-level features | |
Kiew et al. | Vehicle route tracking system based on vehicle registration number recognition using template matching algorithm | |
EP3244344A1 (en) | Ground object tracking system | |
CN116413740B (en) | Laser radar point cloud ground detection method and device | |
CN110765900B (en) | Method and system for automatic detection of illegal buildings based on DSSD |
Kamenetsky et al. | Aerial car detection and urban understanding | |
CN111832463A (en) | Deep learning-based traffic sign detection method | |
Amala Ruby Florence et al. | Accident Detection System Using Deep Learning | |
Caballo et al. | YOLO-based Tricycle Detection from Traffic Video | |
Hou et al. | Application of YOLO V2 in construction vehicle detection | |
CN109800685A (en) | Method and device for determining objects in a video |
CN112052824A (en) | Gas pipeline specific object target detection alarm method, device and system based on YOLOv3 algorithm and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201208 |
|
RJ01 | Rejection of invention patent application after publication |