CN112633086A - Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet - Google Patents
- Publication number
- CN112633086A (application CN202011427301.4A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- efficientdet
- detection
- multitask
- infrared
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The invention belongs to the technical field of near-infrared image pedestrian detection and discloses a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet. The pedestrian activity area distribution under different scenes is obtained from a near-infrared image pedestrian detection data set. EfficientDet-D0 is adopted as the basic detection network, and a semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, with segmentation performance enhanced by an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM). Based on the multitask pedestrian detection model, the pedestrian target detection result is post-processed with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result. The method achieves higher detection performance, reduces false positive samples in the result, and is of significance for monitoring pedestrian activity in night-time surveillance scenes.
Description
Technical Field
The invention belongs to the technical field of near-infrared image pedestrian detection, and particularly relates to a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet.
Background
Research on pedestrian detection began in the 1990s. It has progressed from early methods based on hand-crafted features, such as HOG features, Haar wavelet features and edgelet features, to current methods based on deep learning feature extraction, such as those using ResNet, VGG and other convolutional neural network models. With the development of science and technology, pedestrian detection technology has advanced rapidly. Pedestrian detection is a widely applied branch of computer vision and plays an important role in intelligent security, automatic driving, robotics and other fields.
Near-infrared pedestrian detection is a technology that uses near-infrared images to obtain the positions of pedestrians, and it is widely applied in intelligent security and automatic driving. In night scenes, visible light imaging cannot produce high-quality images. Compared with traditional visible light detection, infrared pedestrian detection has the following advantages: (1) under weak light the imaging quality remains good, so pedestrian features can be separated from the background easily; (2) infrared imaging reduces interference from background color. By virtue of these advantages, infrared pedestrian detection has stood out in recent years in fields such as intelligent video monitoring, vehicle-mounted driver assistance and military early warning. Compared with visible light imaging, however, infrared imaging also has disadvantages: texture and contour features are less rich and the color information is monotonous, which makes recognition and detection more difficult. These flaws do not outweigh the merits: infrared pedestrian detection is of great significance while also posing serious challenges. Designing an efficient and robust infrared pedestrian detection algorithm for the existing problems is therefore of far-reaching significance.
Pedestrian target detection in near-infrared images is essentially a classification and regression problem: bounding-box prediction is treated as a regression task, and separating pedestrians from the background is treated as a classification task. Over the past decades, scholars at home and abroad have contributed greatly to research on pedestrian detection. By their means of feature extraction, the methods fall into two categories, and the field has gradually moved from methods based on hand-crafted features to methods based on deep learning features.
Early pedestrian detection methods extract features with manually designed feature extraction operators. The most typical is the Histogram of Oriented Gradients (HOG) feature proposed by Dalal in 2005, a landmark in the history of pedestrian detection that laid a foundation for feature extraction and achieved good detection results when combined with a Support Vector Machine (SVM). Other methods based on hand-crafted features include those based on Haar wavelet features and those based on edgelet features.
Traditional infrared pedestrian detection methods based on hand-crafted feature extraction mainly have the following defects: (1) hand-crafted features are difficult to design and their effectiveness cannot be guaranteed; (2) hand-crafted features are shallow, which makes it hard to detect pedestrians against complex backgrounds, so detection performance is low.
With the rapid development of deep learning theory and technology, researchers have attempted to solve pedestrian detection on infrared images with deep learning. Xingguo Zhang et al. analyzed the target detection performance of R-CNN/Faster R-CNN on visible light images and constructed network models for visible-spectrum and infrared images; using the same network model as Faster R-CNN, the miss rate on the VS and NIR data sets was far lower than that of RPN_BF and HOG + SVM. Wang Dongwei et al. proposed an improved YOLOv3 pedestrian detection algorithm for infrared video images to address the low accuracy and high miss rate of YOLOv3 on such images. Their results show that the improved YOLOv3 algorithm reaches an accuracy of 90.63% in infrared pedestrian detection, clearly outperforming the Faster R-CNN and YOLOv3 algorithms, and that the improved network detects more targets and reduces the miss rate.
Although deep learning methods for pedestrian detection in near-infrared images achieve higher detection accuracy, they have the following defects: (1) because they lack understanding of the overall semantics of the image and are affected by interference such as noise, the detection results contain false detections that contradict common sense; (2) because the image information is monotonous, effective pedestrian features may not be extracted.
In summary, the problems of the prior art are as follows: end-to-end convolutional neural networks lack understanding of overall semantic information, so many false detections that contradict common sense can occur. Because near-infrared images have monotonous color and blurred contours, how to effectively extract pedestrian target features from them is also one of the main open problems.
The difficulty of solving the technical problems is as follows: how to reduce semantic errors in detection results and how to effectively extract pedestrian features from near-infrared images without reducing detection performance.
The significance of solving the technical problems is as follows: the pedestrian detection algorithm of the near-infrared image is widely applied to the fields of intelligent security, automatic driving and robots, and how to obtain a high-precision detection result in as short a time as possible has important significance for intelligent security, automatic driving and the like.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet.
The invention is implemented as a near-infrared pedestrian monitoring method based on multitask EfficientDet, which comprises:
acquiring the pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection training data set, and using it to train the segmentation branch of the model;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, sharing bottom-layer features between target detection and semantic segmentation, improving the generalization capability of the model through multitask learning, and further post-processing the detection results with the predicted pedestrian activity areas to improve performance;
and, based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result with the predicted pedestrian activity area to reduce false positives (FP) in the prediction, and obtaining the final pedestrian detection result and pedestrian activity area result.
Further, the near-infrared pedestrian monitoring method based on multitask EfficientDet specifically comprises the following steps:
step one, acquiring the pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection training data set;
step two, adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and adding an atrous spatial pyramid pooling (ASPP) module and an attention module to enhance segmentation performance;
step three, training the improved multitask EfficientDet-D0 model, and improving the generalization capability of the model for pedestrian targets through multitask learning;
step four, inputting k images and testing to obtain the pedestrian activity area prediction of the scene, inputting near-infrared images for testing to obtain pedestrian target detection results, and filtering false positive samples out of the pedestrian target detections using the pedestrian activity area prediction.
Further, the first step comprises:
a) the used image training set is a single-channel near-infrared image data set in a monitoring scene;
b) traversing all images in each scene in the training data set, obtaining the coordinates of the central points of the pedestrian targets through the labeled data, obtaining mask images of pedestrian areas through the coordinates of the central points of all the pedestrian targets in the scene, and executing the operation on each scene to obtain the pedestrian distribution area of each scene.
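The mask construction in step b) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the text does not say how the center points are turned into a region, so the sketch simply marks a small square around every center. The box layout `(x_min, y_min, x_max, y_max)`, the `radius` parameter and both function names are hypothetical.

```python
def box_centers(boxes):
    """Return the integer center (cx, cy) of each (x_min, y_min, x_max, y_max) box."""
    return [((x0 + x1) // 2, (y0 + y1) // 2) for x0, y0, x1, y1 in boxes]


def activity_mask(boxes_per_image, width, height, radius=2):
    """Build a binary activity mask (list of rows) for one scene.

    boxes_per_image: one list of annotated boxes per image of the scene.
    radius: half-size of the square marked around each center (assumed value).
    """
    mask = [[0] * width for _ in range(height)]
    for boxes in boxes_per_image:
        for cx, cy in box_centers(boxes):
            # Mark a (2 * radius + 1)-wide square, clipped to the image borders.
            for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
                for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                    mask[y][x] = 1
    return mask


# Two images of the same scene, one annotated pedestrian in each.
scene_boxes = [[(2, 2, 6, 10)], [(10, 4, 14, 12)]]
mask = activity_mask(scene_boxes, width=16, height=16, radius=1)
```

Running the same function once per scene yields one mask per scene, which can then serve as the segmentation label for every image of that scene.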
Further, the semantic segmentation branch of the second step is built as follows. EfficientDet-D0 is adopted as the basic detection network, and the semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network, whose feature channel numbers are (24, 40, 112, 320) respectively. P5 is passed through the atrous spatial pyramid pooling (ASPP) module, upsampled by a factor of 2, and concatenated with P4 along the channel dimension; the result passes through a convolutional block attention module (CBAM), then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 128, and then through a CBAM again to give feature map O1. O1 is upsampled by a factor of 2 and concatenated with P3 along the channel dimension; the result passes through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 64, and then through a CBAM again to give feature map O2. O2 is upsampled by a factor of 2 and concatenated with P2 along the channel dimension; the result passes through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 32, and then through a CBAM again to give feature map O3. O3 passes through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 1, giving the output O4. The model loss is calculated directly on O4; in the test stage, O4 is upsampled by a factor of 4 to obtain a segmentation result at the original image size.
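To make the channel and resolution bookkeeping of the branch easier to follow, the sketch below traces tensor shapes only; it builds no network. The channel counts (24, 40, 112, 320) and the reduced widths 128, 64, 32 and 1 come from the text. The strides 4, 8, 16 and 32 are inferred from the final 4× upsampling, and the ASPP output width `aspp_out=256` is purely an assumption, since the text does not give it. The function name `trace_branch` is hypothetical.

```python
CHANNELS = {"P2": 24, "P3": 40, "P4": 112, "P5": 320}  # from the text
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}        # inferred, see lead-in


def trace_branch(width, height, aspp_out=256):
    """Return (name, channels_in, channels_out, width, height) for each stage."""
    rows = []
    channels, stride = aspp_out, STRIDES["P5"]  # P5 after the ASPP module
    for name, skip, reduced in (("O1", "P4", 128), ("O2", "P3", 64), ("O3", "P2", 32)):
        stride //= 2  # 2x upsampling
        if stride != STRIDES[skip]:
            raise ValueError("upsampled map does not match " + skip)
        concat = channels + CHANNELS[skip]  # channel concatenation with the skip
        channels = reduced                  # conv + BN + activation reduce channels
        rows.append((name, concat, channels, width // stride, height // stride))
    # O3 -> O4: reduce to a single channel at the same resolution.
    rows.append(("O4", channels, 1, width // stride, height // stride))
    # Test stage: 4x upsampling of O4 back to the input size.
    rows.append(("output", 1, 1, (width // stride) * 4, (height // stride) * 4))
    return rows


for row in trace_branch(768, 512):  # assuming 768 is the width, 512 the height
    print(row)
```

With a 768 × 512 input, O4 is a single-channel 192 × 128 map, and the 4× upsampling at test time restores 768 × 512, which matches the statement that the result has the original image size.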
Further, the third step includes:
a) embedding the segmentation branch into the detection network so that it shares bottom-layer features with the target detection network;
b) adopting an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM) in the segmentation branch to improve the segmentation performance of the model, and adopting different learning rates for the segmentation branch and the detection branch to reduce overfitting of the model;
c) using Dice Loss as the loss function of the segmentation branch, defined as:
$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where $X$ is the prediction mask, $Y$ is the annotated mask of the pedestrian region, $|X \cap Y|$ is the intersection of $X$ and $Y$, and $|X| + |Y|$ is the sum of the numbers of elements of $X$ and $Y$;
d) using Focal Loss as the loss function for classification, defined as:
$$L_{fl} = \begin{cases} -\alpha\,(1 - y')^{\gamma}\,\log y', & y = 1 \\ -(1 - \alpha)\,y'^{\gamma}\,\log(1 - y'), & y = 0 \end{cases}$$
where $y$ is the label, $y'$ is the model prediction, $\gamma$ is a hyperparameter that adjusts the weight of hard and easy samples and is set to 2.0, and $\alpha$ is a hyperparameter that adjusts the proportion of positive and negative samples and is set to 0.05;
e) using SmoothL1 Loss as the loss function for bounding-box regression:
$$L_{SmoothL1} = \begin{cases} 0.5\,(y - y')^{2}, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$
where $y$ is the label and $y'$ is the model prediction;
f) setting the training parameters:
learning rate: 3e-4;
learning rate decay: cosine decay;
batch size: 16;
input image size: 768 × 512;
optimizer: AdamW, for fast convergence of the network;
because the pedestrian region segmentation branch has less data than the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the ratio of classification loss, regression loss and segmentation loss is set to 10:10:1.
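The three loss functions and their combination can be illustrated with a small pure-Python sketch. It uses the standard textbook forms of Dice, focal and Smooth L1 loss with the hyperparameters stated above (γ = 2.0, α = 0.05). Several points are assumptions rather than details from the text: inputs are flat lists, losses are averaged over elements, a small `eps` is added for numerical safety, and the 10:10:1 ratio is read as multiplicative weights on the three losses.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """Dice loss; pred holds probabilities in [0, 1], target holds 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, target))  # soft |X ∩ Y|
    total = sum(pred) + sum(target)                   # |X| + |Y|
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def focal_loss(pred, target, gamma=2.0, alpha=0.05, eps=1e-7):
    """Mean binary focal loss; pred holds probabilities, target holds 0/1 labels."""
    loss = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        if t == 1:
            loss += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            loss += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return loss / len(pred)


def smooth_l1_loss(pred, target):
    """Mean Smooth L1 loss with the transition point at |y - y'| = 1."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)


def total_loss(cls, reg, seg, weights=(10.0, 10.0, 1.0)):
    """Weighted sum of classification, regression and segmentation losses."""
    return weights[0] * cls + weights[1] * reg + weights[2] * seg


seg = dice_loss([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
cls = focal_loss([0.7, 0.2], [1, 0])
reg = smooth_l1_loss([0.5, 3.0], [0.0, 0.0])
print(total_loss(cls, reg, seg))
```

In a real training setup these would be tensor operations in a deep learning framework, with the segmentation branch placed in its own optimizer parameter group so that it can use the 0.01× learning rate.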
Further, the fourth step includes: first, the pedestrian activity area is predicted from k images. The k images are input, and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, where the sigmoid function is defined as:
$$S(x) = \frac{1}{1 + e^{-x}}$$
The output is processed as:
$$D = S(D')$$
where $D'$ is the output of the segmentation branch and $D$ is the resulting probability map. Then a threshold $T$ is used to binarize the prediction:
$$D(x, y) = \begin{cases} 1, & D(x, y) > T \\ 0, & \text{otherwise} \end{cases}$$
The activity area predicted by the segmentation branch from the k images is used as prior information, and detections whose bounding-box center point does not lie in the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples:
$$score(x, y) = D(x, y) \cdot score(x, y)$$
where $score(x, y)$ is the confidence score of the target at point $(x, y)$.
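The post-processing of the fourth step can be sketched as below. This is an illustration under assumptions, not the patented implementation: the text does not say how the k predictions are combined, so the sketch averages their sigmoid probabilities; the threshold default `T = 0.5`, the detection layout `(x_min, y_min, x_max, y_max, score)` and the function names are hypothetical, and the maps are assumed to be already upsampled to the image size.

```python
import math


def sigmoid(x):
    """S(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def activity_region(logit_maps, threshold=0.5):
    """Average the sigmoid maps of k images, then binarize with threshold T."""
    k = len(logit_maps)
    height, width = len(logit_maps[0]), len(logit_maps[0][0])
    region = []
    for y in range(height):
        row = []
        for x in range(width):
            prob = sum(sigmoid(m[y][x]) for m in logit_maps) / k
            row.append(1 if prob > threshold else 0)
        region.append(row)
    return region


def filter_detections(detections, region):
    """Multiply each score by D at the box center; drop boxes whose score becomes 0."""
    height, width = len(region), len(region[0])
    kept = []
    for x0, y0, x1, y1, score in detections:
        # Clamp the center so boxes touching the border stay inside the map.
        cx = min(max(int((x0 + x1) / 2), 0), width - 1)
        cy = min(max(int((y0 + y1) / 2), 0), height - 1)
        score = region[cy][cx] * score
        if score > 0:
            kept.append((x0, y0, x1, y1, score))
    return kept


# k = 2 segmentation outputs for a 4 x 2 scene: the left half is active.
logit_maps = [
    [[4.0, 4.0, -4.0, -4.0], [4.0, 4.0, -4.0, -4.0]],
    [[2.0, 3.0, -3.0, -2.0], [2.0, 3.0, -3.0, -2.0]],
]
region = activity_region(logit_maps)
detections = [(0, 0, 2, 2, 0.9), (2, 0, 4, 2, 0.8)]
kept = filter_detections(detections, region)
```

Because the region depends only on the scene, it can be computed once for a fixed camera and reused for every subsequent frame, so the per-frame cost is limited to the center lookup.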
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and improving the generalization capability of the model by sharing bottom-layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and improving the generalization capability of the model by sharing bottom-layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
The invention also aims to provide an information data processing terminal, which is used for realizing the near-infrared pedestrian monitoring method based on the multitask EfficientDet.
Another object of the present invention is to provide a system that implements the near-infrared pedestrian monitoring method based on multitask EfficientDet, the system comprising:
the pedestrian activity area distribution obtaining module is used for obtaining pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
the shared bottom-layer feature segmentation module is used for adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing bottom-layer features between target detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module is used for carrying out post-processing on a pedestrian target detection result through the predicted pedestrian activity area based on the multitask EfficientDet-D0 model to obtain a final pedestrian detection result and a final pedestrian activity area result.
Combining all the above technical solutions, the advantages and positive effects of the invention are as follows. The method is mainly applied to pedestrian detection in near-infrared images. It addresses unreasonable semantic errors in the prediction results, maintaining a high AP while reducing the false positive rate, and it lets multiple tasks share one feature extraction network, which effectively improves the generalization capability of the model. The invention converts the target detection problem into a binary semantic segmentation problem plus an ordinary target detection problem, and provides a network for pixel-level semantic segmentation of pedestrian activity regions and pedestrian position detection: an atrous spatial pyramid pooling (ASPP) module augmented with a convolutional block attention module (CBAM) and combined with upsampling serves as the pedestrian activity area prediction branch; multitask learning improves the generalization capability of the model; and filtering false positive samples out of the detection results with the pedestrian activity area further improves pedestrian target detection performance.
The invention applies semantic segmentation and target detection jointly to pedestrian target detection. Pedestrian detection can produce false positive samples that do not agree with semantic information, for example pedestrian targets predicted on trees or in other areas that contradict common sense. Such false positives can be filtered out by predicting the pedestrian area, and multitask learning improves the generalization capability of the model, so more accurate pedestrian detection results are obtained.
The method integrates a semantic segmentation network and a target detection network that share one feature extraction network, so a single model completes both semantic segmentation and target detection. This greatly reduces training cost, greatly improves the generalization capability of the model, and reduces overfitting to the detection data. Through multitask learning and post-processing of the detection results with the pedestrian activity area, the method obtains higher detection performance and better generalization.
Compared with the prior art, the method has the following advantages:
(1) according to the invention, pedestrian activity region segmentation branches are introduced into the target detection model, and the generalization capability of the model is improved through multi-task learning;
(2) according to the invention, the prediction result of the pedestrian activity area is obtained through the additional segmentation branches, and the post-processing is carried out on the pedestrian target detection result through the pedestrian activity area, so that the detection performance is improved;
(3) in a fixed monitoring scene, the pedestrian activity area is pre-calculated and used for subsequent processing, so that the time overhead of activity area prediction each time is reduced, and the frame rate is always kept in a real-time monitoring state.
TABLE 1 comparison of the present invention with other target detection methods
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a near-infrared pedestrian monitoring method based on multitask EfficientDet according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a near-infrared pedestrian monitoring system based on multitask EfficientDet according to an embodiment of the present invention.
In fig. 2: 1. pedestrian activity area distribution obtaining module; 2. segmentation and detection module sharing bottom layer features; 3. pedestrian detection result and pedestrian activity area result output module.
Fig. 3 is a flowchart of an implementation of the near-infrared pedestrian monitoring method based on multitask EfficientDet according to the embodiment of the present invention.
Fig. 4 is a diagram of a multitasking EfficientDet-D0 network structure according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
To address the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the invention comprises the following steps:
S101: obtaining the distribution of pedestrian activity areas in each fixed scene from the training data set, taking near-infrared images as input, replicating the single channel of each input image, and dividing the data into a training data set and a test data set by random division;
S102: training the improved multitask EfficientDet-D0 model, using a lower learning rate for the segmentation branch to prevent overfitting;
S103: inputting k images to obtain the pedestrian activity area prediction result, inputting a single near-infrared image to obtain the pedestrian detection result, and obtaining the final detection result from the pedestrian detection result and the pedestrian activity area prediction result.
A person skilled in the art can also implement the near-infrared pedestrian monitoring method based on multitask EfficientDet with other steps; the method shown in fig. 1 is only one specific example.
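The first step of fig. 1 mentions channel replication and a random train/test division. A minimal sketch of both follows, in plain Python on nested lists. The function names, the three-channel target and the `test_fraction` parameter are illustrative assumptions and are not given in the text.

```python
import random


def replicate_channels(gray, n=3):
    """Turn an H x W single-channel image (nested lists) into H x W x n
    by copying each pixel value n times. n=3 is an assumption."""
    return [[[px] * n for px in row] for row in gray]


def random_split(items, test_fraction, seed=0):
    """Shuffle a copy of `items` and split it into (train, test).
    `test_fraction` and `seed` are hypothetical parameters."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Replicating the channel lets a single-channel near-infrared image feed a backbone that expects three-channel input without changing the first convolution.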
As shown in fig. 2, the near-infrared pedestrian monitoring system based on multitask EfficientDet provided by the invention comprises:
the pedestrian activity area distribution obtaining module 1, which is used for obtaining pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
the segmentation and detection module 2 sharing bottom layer features, which is used for adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing bottom layer features between target detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module 3, which is used for post-processing the pedestrian target detection result with the predicted pedestrian activity area, based on the multitask EfficientDet-D0 model, to obtain the final pedestrian detection result and the final pedestrian activity area result.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
The method obtains pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection data set. EfficientDet-D0 is adopted as the basic detection network, a semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, segmentation performance is enhanced through an atrous spatial pyramid pooling (ASPP) module and a feature attention module, and target detection and semantic segmentation share bottom layer features, which improves model generalization capability. Based on the multitask pedestrian detection model, the pedestrian target detection result is post-processed with the predicted pedestrian activity region to obtain the final pedestrian detection result and the pedestrian activity region result. By combining the segmentation branch and the target detection branch, the method reduces false positive (FP) predictions and achieves higher detection performance on pedestrian targets in near-infrared images, which is of significance for pedestrian activity monitoring in night monitoring scenes.
As shown in fig. 3, the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the embodiment of the present invention specifically includes the following steps:
First, the image training set used is a single-channel near-infrared image data set from monitoring scenes. All images of each scene in the training data set are traversed, the center-point coordinates of the pedestrian targets are obtained from the annotation data, and a mask image of the pedestrian area is obtained from the center-point coordinates of all pedestrian targets in the scene; this operation is performed on each scene to obtain the pedestrian distribution area of each scene.
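The text does not say how the center points are turned into a mask. One possible reading, sketched below in plain Python, marks a small square around every annotated box center of a scene; the square shape, the `radius` parameter and both function names are assumptions made only for illustration.

```python
def box_center(x_min, y_min, x_max, y_max):
    """Integer center point of an annotated pedestrian box."""
    return ((x_min + x_max) // 2, (y_min + y_max) // 2)


def activity_mask(centers, width, height, radius=2):
    """Binary height x width mask (nested lists) with 1 inside a square of
    half-width `radius` around each center. The dilation by `radius` is an
    assumption; the source only states that the mask comes from the centers."""
    mask = [[0] * width for _ in range(height)]
    for cx, cy in centers:
        x0, x1 = max(0, cx - radius), min(width - 1, cx + radius)
        y0, y1 = max(0, cy - radius), min(height - 1, cy + radius)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                mask[y][x] = 1
    return mask
```

Collecting the centers over all images of one scene and calling `activity_mask` once per scene would give one mask per scene, matching the per-scene operation described above.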
Second, EfficientDet-D0 is adopted as the basic detection network, a semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and an atrous spatial pyramid pooling (ASPP) module and an attention module are added to enhance segmentation performance;
2a) the segmentation branch is designed as follows; segmentation performance is improved mainly by an ASPP module and a convolutional block attention module (CBAM):
The semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network, whose feature channel numbers are (24, 40, 112, 320). P5 passes through the ASPP module, is upsampled by a factor of 2 and is channel-concatenated with P4; the result passes through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 128, and then through a CBAM to give the feature map O1. O1 is upsampled by a factor of 2 and channel-concatenated with P3; the result passes through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 64, and then through a CBAM to give the feature map O2. O2 is upsampled by a factor of 2 and channel-concatenated with P2; the result passes through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 32, and then through a CBAM to give the feature map O3. O3 is then reduced to 1 channel, again followed by a BN layer and an activation layer, giving the output O4. The model loss is calculated directly on O4; in the test stage, O4 is upsampled by a factor of 4 to obtain a segmentation result at the original image size.
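The channel and resolution bookkeeping of this branch can be checked with a short shape trace. The sketch below is plain Python and builds no network. It assumes the usual EfficientNet-B0 strides (4, 8, 16, 32) for P2 to P5, a 768 × 512 (width × height) input as in the training settings, and an ASPP output of 320 channels; the strides and the ASPP width are assumptions, since the text gives only the backbone channel counts.

```python
# (stride, channels) per backbone level; the strides are an assumption.
BACKBONE = {"P2": (4, 24), "P3": (8, 40), "P4": (16, 112), "P5": (32, 320)}


def trace_branch(width=768, height=512, aspp_channels=320):
    """Return (name, channels, width, height) for each stage of the branch."""
    def size(stride):
        return width // stride, height // stride

    stride = BACKBONE["P5"][0]
    channels = aspp_channels  # assumed ASPP output width
    stages = [("ASPP(P5)", channels, *size(stride))]
    for name, skip, out_channels in (("O1", "P4", 128),
                                     ("O2", "P3", 64),
                                     ("O3", "P2", 32)):
        stride //= 2  # 2x upsampling
        skip_stride, skip_channels = BACKBONE[skip]
        assert stride == skip_stride  # resolutions must match to concatenate
        stages.append((f"concat({skip})", channels + skip_channels, *size(stride)))
        channels = out_channels  # after the channel-reducing convolution
        stages.append((name, channels, *size(stride)))
    stages.append(("O4", 1, *size(stride)))
    stages.append(("O4 upsampled 4x", 1, *size(stride // 4)))
    return stages
```

Under these assumptions O4 sits at one quarter of the input resolution, which is consistent with the 4-times upsampling used at test time.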
Third, the improved multitask EfficientDet-D0 model is trained, and multitask learning improves the generalization capability of the model for pedestrian targets;
3a) the segmentation branch is embedded into the detection network and shares bottom layer features with the target detection network;
3b) the atrous spatial pyramid pooling (ASPP) module and the convolutional block attention module (CBAM) are used in the segmentation branch to improve the segmentation performance of the model, and different learning rates are used for the segmentation branch and the detection branch to reduce overfitting of the model.
3c) Dice Loss is used as the loss function of the segmentation branch; Dice Loss is defined as follows:
where X is the prediction mask, Y is the annotation mask of the pedestrian region, |X ∩ Y| is the size of the intersection of X and Y, and |X| + |Y| is the total number of elements in X and Y.
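With X the prediction mask, Y the annotation mask and |X ∩ Y| the size of their overlap, the commonly used form of Dice Loss can be written as below. This is the standard textbook definition and is assumed, not confirmed, to match the formula intended here.

```latex
L_{\mathrm{Dice}} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}
```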
3d) Focal Loss is used as the loss function for classification; it is defined as follows:
where y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weights of hard and easy samples and is set to 2.0, and α is a hyperparameter that adjusts the proportion of positive and negative samples and is set to 0.05.
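In the symbols used above (label y, prediction y', hyperparameters γ and α), the usual binary form of Focal Loss is sketched below. It is the standard definition and is assumed to be the one meant here.

```latex
L_{\mathrm{Focal}}(y, y') =
\begin{cases}
-\alpha\,(1 - y')^{\gamma}\,\log(y'), & y = 1 \\
-(1 - \alpha)\,(y')^{\gamma}\,\log(1 - y'), & y = 0
\end{cases}
```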
3e) SmoothL1 Loss is used as the loss function for bounding box regression:
where y is the label and y' is the model prediction.
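For label y and prediction y', the standard SmoothL1 Loss is shown below. The switch point of 1 between the quadratic and linear parts is the common default and is an assumption, since the text does not state it.

```latex
\mathrm{SmoothL1}(y, y') =
\begin{cases}
0.5\,(y - y')^{2}, & |y - y'| < 1 \\
|y - y'| - 0.5, & \text{otherwise}
\end{cases}
```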
3f) Setting training parameters:
learning rate: 3e-4;
learning rate decay: cosine decay;
batch size: 16;
input image size: 768 × 512;
optimizer: AdamW, for fast convergence of the network;
because the pedestrian region segmentation branch has less data than the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the ratio of classification loss, regression loss and segmentation loss is set to 10:10:1.
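The settings above can be summarized in a few lines of plain Python. The sketch assumes a cosine schedule that decays to zero without warm-up, that the 0.01 factor applies to the segmentation branch at every step, and that the 10:10:1 ratio is applied as loss weights; all function and constant names are hypothetical.

```python
import math

BASE_LR = 3e-4          # learning rate from the text
SEG_LR_SCALE = 0.01     # segmentation branch uses 0.01 x the network rate
LOSS_WEIGHTS = {"cls": 10.0, "reg": 10.0, "seg": 1.0}  # 10:10:1 read as weights


def cosine_lr(step, total_steps, base_lr=BASE_LR):
    """Cosine decay from base_lr at step 0 to 0 at total_steps (assumed form)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def branch_lrs(step, total_steps):
    """Learning rates for the detection and segmentation parameter groups."""
    lr = cosine_lr(step, total_steps)
    return {"detection": lr, "segmentation": lr * SEG_LR_SCALE}


def total_loss(cls_loss, reg_loss, seg_loss):
    """Weighted sum of classification, regression and segmentation losses."""
    return (LOSS_WEIGHTS["cls"] * cls_loss
            + LOSS_WEIGHTS["reg"] * reg_loss
            + LOSS_WEIGHTS["seg"] * seg_loss)
```

In a real training loop the two rates would typically be attached to two optimizer parameter groups, one for the segmentation branch and one for everything else.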
Fourth, k images are input to obtain the pedestrian activity area prediction of the scene, a near-infrared image is input to obtain the pedestrian target detection result, and false positive samples in the pedestrian target detection are filtered out using the pedestrian activity area prediction result. The specific process is as follows:
first, the pedestrian activity areas of k (k = 10) images are predicted: the k images are input, and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, where the sigmoid function is defined as follows:
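In its standard form, the sigmoid function S maps a real-valued input x to the interval (0, 1):

```latex
S(x) = \frac{1}{1 + e^{-x}}
```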
the output is processed as follows, where D' is the raw output of the segmentation branch and D is the resulting probability map:
D=S(D');
then a threshold T is used to binarize the prediction result, as follows:
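A binarization with threshold T consistent with this description is sketched below. Whether the comparison at T is strict, and the value of T itself, are not given in the text and are left open here.

```latex
D(x, y) \leftarrow
\begin{cases}
1, & D(x, y) > T \\
0, & \text{otherwise}
\end{cases}
```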
the activity area predicted by the segmentation branch on the k images is used as prior information, and targets whose box center point is not in the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples. The processing is as follows:
score(x,y)=D(x,y)*score(x,y);
where score(x, y) is the confidence score of the target at point (x, y).
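The whole post-processing chain (sigmoid, binarization, score gating at the box center) can be sketched in plain Python as follows. The text does not say how the k probability maps are combined, so averaging them is an assumption, as are the default threshold of 0.5, the function names, and the premise that the mask and the box coordinates share one resolution.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def activity_prior(logit_maps, threshold=0.5):
    """Average the sigmoid probabilities of k maps, then binarize.
    Averaging over the k maps and threshold=0.5 are assumptions."""
    k = len(logit_maps)
    height, width = len(logit_maps[0]), len(logit_maps[0][0])
    prior = []
    for y in range(height):
        row = []
        for x in range(width):
            mean_prob = sum(sigmoid(m[y][x]) for m in logit_maps) / k
            row.append(1 if mean_prob > threshold else 0)
        prior.append(row)
    return prior


def filter_detections(detections, prior):
    """detections: list of (x_min, y_min, x_max, y_max, score).
    Applies score = D(x, y) * score at the box center and drops zero scores."""
    height, width = len(prior), len(prior[0])
    kept = []
    for x_min, y_min, x_max, y_max, score in detections:
        cx = min(max(int((x_min + x_max) / 2), 0), width - 1)
        cy = min(max(int((y_min + y_max) / 2), 0), height - 1)
        score = prior[cy][cx] * score
        if score > 0:
            kept.append((x_min, y_min, x_max, y_max, score))
    return kept
```

In a fixed scene `activity_prior` would be computed once and reused for every frame, so only `filter_detections` runs per image.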
The technical effects of the present invention will be described in detail with reference to simulations.
1. Simulation conditions
The simulation experiment was carried out with PyCharm on a PC with an Intel(R) Core(TM) i7-7820X CPU at 3.60 GHz, 32.00 GB of RAM, 2 × 2080Ti GPUs and the Ubuntu 18.0 operating system.
2. Content of simulation experiment
The model was trained and tested on a self-collected data set of 10 scenes, with a raw image resolution of 2560 × 1440 scaled to 768 × 512. Scenes A and B are used as the test set and the other scenes as the training set; the test set contains 3380 images and the training set contains 13072 images.
3. Simulation experiment results and analysis
Table 1 compares the method of the present invention with the original Cascade RCNN, the original EfficientDet-D0, and EfficientDet-D0 after anchor clustering.
TABLE 1 comparison of the present invention with other target detection methods
As Table 1 shows, the invention keeps a high detection frame rate and high detection accuracy through multi-task learning, and improves detection performance by 1.5 points over the original EfficientDet-D0 + anchor clustering model. Compared with the other methods, the method of the invention reduces the number of false positive samples and restricts the regions in which positive samples can occur, thereby improving detection performance. In conclusion, the invention uses a multi-task detection model to improve detection performance, reduce the number of false positive samples and obtain higher detection accuracy, which is of significance for research on intelligent security, automatic driving and the like.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only a specific embodiment of the present invention and does not limit its scope of protection; all modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within the scope of protection defined by the appended claims.
Claims (10)
1. A near-infrared pedestrian monitoring method based on multitask EfficientDet is characterized by comprising the following steps:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set, and using the pedestrian activity area distribution to train segmentation branches;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing bottom layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
2. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 1, wherein the multitask EfficientDet-based near-infrared pedestrian monitoring method specifically comprises the following steps:
first, acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
second, adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and adding an atrous spatial pyramid pooling (ASPP) module and an attention module to enhance segmentation performance;
third, training the improved multitask EfficientDet-D0 model, and improving the generalization capability of the model for pedestrian targets through multitask learning;
fourth, inputting k images to obtain the pedestrian activity area prediction of the scene, inputting a near-infrared image to obtain a pedestrian target detection result, and filtering out false positive samples in the pedestrian target detection using the pedestrian activity area prediction result.
3. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the first step comprises:
a) the used image training set is a single-channel near-infrared image data set in a monitoring scene;
b) traversing all images in each scene in the training data set, obtaining the coordinates of the central points of the pedestrian targets through the labeled data, obtaining mask images of pedestrian areas through the coordinates of the central points of all the pedestrian targets in the scene, and executing the operation on each scene to obtain the pedestrian distribution area of each scene.
4. The multitask EfficientDet-based near-infrared pedestrian monitoring method as claimed in claim 2, wherein the semantic segmentation branch of the second step comprises: adopting EfficientDet-D0 as the basic detection network and building the semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network, whose feature channel numbers are (24, 40, 112, 320); passing P5 through an atrous spatial pyramid pooling (ASPP) module, upsampling it by a factor of 2 and channel-concatenating it with P4, passing the result through a convolutional block attention module (CBAM), then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 128, and then through a CBAM to obtain a feature map O1; upsampling O1 by a factor of 2 and channel-concatenating it with P3, passing the result through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 64, and then through a CBAM to obtain a feature map O2; upsampling O2 by a factor of 2 and channel-concatenating it with P2, passing the result through a CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 32, and then through a CBAM to obtain a feature map O3; reducing O3 to 1 channel, again followed by a BN layer and an activation layer, to obtain an output O4; calculating the model loss directly on O4; and, in the test stage, upsampling O4 by a factor of 4 to obtain a segmentation result at the original image size.
5. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the third step comprises:
a) embedding the segmentation branch into the detection network, sharing bottom layer features with the target detection network;
b) using an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM) in the segmentation branch to improve the segmentation performance of the model, and using different learning rates for the segmentation branch and the detection branch to reduce overfitting of the model;
c) using Dice Loss as the loss function of the segmentation branch, Dice Loss being defined as follows:
wherein X is the prediction mask, Y is the annotation mask of the pedestrian region, |X ∩ Y| is the size of the intersection of X and Y, and |X| + |Y| is the total number of elements in X and Y;
d) using Focal Loss as the loss function for classification, Focal Loss being defined as:
wherein y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weights of hard and easy samples and is set to 2.0, and α is a hyperparameter that adjusts the proportion of positive and negative samples and is set to 0.05;
e) using SmoothL1 Loss as the loss function for bounding box regression:
wherein y is the label and y' is the model prediction;
f) setting training parameters:
learning rate: 3e-4;
learning rate decay: cosine decay;
batch size: 16;
input image size: 768 × 512;
optimizer: AdamW, for fast convergence of the network;
because the pedestrian region segmentation branch has less data than the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the ratio of classification loss, regression loss and segmentation loss is set to 10:10:1.
6. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the fourth step comprises: first predicting the pedestrian activity areas of k images by inputting the k images and passing the output of the segmentation branch through a sigmoid function to obtain the final probability map, wherein the sigmoid function is defined as follows:
the output is processed as follows:
D=S(D');
then a threshold T is used to binarize the prediction result, as follows:
the activity area predicted by the segmentation branch on the k images is used as prior information, and targets whose box center point is not in the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples, the processing being as follows:
score(x,y)=D(x,y)*score(x,y);
where score(x, y) is the confidence score of the target at point (x, y).
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling (ASPP) module and an attention module, and improving model generalization capability by sharing bottom layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling (ASPP) module and an attention module, and improving model generalization capability by sharing bottom layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
9. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the near-infrared pedestrian monitoring method based on the multitask EfficientDet according to any one of claims 1-6.
10. A near-infrared pedestrian monitoring system based on the multitask EfficientDet for implementing the near-infrared pedestrian monitoring method based on the multitask EfficientDet according to any one of claims 1-6, wherein the near-infrared pedestrian monitoring system based on the multitask EfficientDet comprises:
the pedestrian activity area distribution obtaining module is used for obtaining pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
the segmentation and detection module is used for adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing bottom layer features between target detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module is used for carrying out post-processing on a pedestrian target detection result through the predicted pedestrian activity area based on the multitask EfficientDet-D0 model to obtain a final pedestrian detection result and a final pedestrian activity area result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011427301.4A CN112633086B (en) | 2020-12-09 | 2020-12-09 | Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633086A true CN112633086A (en) | 2021-04-09 |
CN112633086B CN112633086B (en) | 2024-01-26 |
Family
ID=75308801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011427301.4A Active CN112633086B (en) | 2020-12-09 | 2020-12-09 | Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112633086B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486898A (en) * | 2021-07-08 | 2021-10-08 | 西安电子科技大学 | Radar signal RD image interference identification method and system based on improved ShuffleNet |
CN115187783A (en) * | 2022-09-09 | 2022-10-14 | 之江实验室 | Multi-task hybrid supervision medical image segmentation method and system based on federal learning |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160328850A1 (en) * | 2015-05-08 | 2016-11-10 | Vida Diagnostics, Inc. | Systems and methods for quantifying regional fissure features |
CN109063559A (en) * | 2018-06-28 | 2018-12-21 | 东南大学 | A kind of pedestrian detection method returned based on improvement region |
CN109584248A (en) * | 2018-11-20 | 2019-04-05 | 西安电子科技大学 | Infrared surface object instance dividing method based on Fusion Features and dense connection network |
WO2019144575A1 (en) * | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device |
CN110633632A (en) * | 2019-08-06 | 2019-12-31 | 厦门大学 | Weak supervision combined target detection and semantic segmentation method based on loop guidance |
CN110941995A (en) * | 2019-11-01 | 2020-03-31 | 中山大学 | Real-time target detection and semantic segmentation multi-task learning method based on lightweight network |
CN110969124A (en) * | 2019-12-02 | 2020-04-07 | 重庆邮电大学 | Two-dimensional human body posture estimation method and system based on lightweight multi-branch network |
CN111027493A (en) * | 2019-12-13 | 2020-04-17 | 电子科技大学 | Pedestrian detection method based on deep learning multi-network soft fusion |
CN111507381A (en) * | 2020-03-31 | 2020-08-07 | 上海商汤智能科技有限公司 | Image recognition method and related device and equipment |
CN111652213A (en) * | 2020-05-24 | 2020-09-11 | 浙江理工大学 | Ship water gauge reading identification method based on deep learning |
CN111798425A (en) * | 2020-06-30 | 2020-10-20 | 天津大学 | Intelligent detection method for mitotic image in gastrointestinal stromal tumor based on deep learning |
CN111860316A (en) * | 2020-07-20 | 2020-10-30 | 上海汽车集团股份有限公司 | Driving behavior recognition method and device and storage medium |
-
2020
- 2020-12-09 CN CN202011427301.4A patent/CN112633086B/en active Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160328850A1 (en) * | 2015-05-08 | 2016-11-10 | Vida Diagnostics, Inc. | Systems and methods for quantifying regional fissure features |
WO2019144575A1 (en) * | 2018-01-24 | 2019-08-01 | 中山大学 | Fast pedestrian detection method and device |
CN109063559A (en) * | 2018-06-28 | 2018-12-21 | 东南大学 | A kind of pedestrian detection method returned based on improvement region |
CN109584248A (en) * | 2018-11-20 | 2019-04-05 | 西安电子科技大学 | Infrared surface object instance dividing method based on Fusion Features and dense connection network |
CN110633632A (en) * | 2019-08-06 | 2019-12-31 | 厦门大学 | Weak supervision combined target detection and semantic segmentation method based on loop guidance |
CN110941995A (en) * | 2019-11-01 | 2020-03-31 | 中山大学 | Real-time target detection and semantic segmentation multi-task learning method based on lightweight network |
CN110969124A (en) * | 2019-12-02 | 2020-04-07 | 重庆邮电大学 | Two-dimensional human body posture estimation method and system based on lightweight multi-branch network |
CN111027493A (en) * | 2019-12-13 | 2020-04-17 | 电子科技大学 | Pedestrian detection method based on deep learning multi-network soft fusion |
CN111507381A (en) * | 2020-03-31 | 2020-08-07 | 上海商汤智能科技有限公司 | Image recognition method and related device and equipment |
CN111652213A (en) * | 2020-05-24 | 2020-09-11 | 浙江理工大学 | Ship water gauge reading identification method based on deep learning |
CN111798425A (en) * | 2020-06-30 | 2020-10-20 | 天津大学 | Intelligent detection method for mitotic image in gastrointestinal stromal tumor based on deep learning |
CN111860316A (en) * | 2020-07-20 | 2020-10-30 | 上海汽车集团股份有限公司 | Driving behavior recognition method and device and storage medium |
Non-Patent Citations (2)
Title |
---|
YUNCHAO WEI et al.: "TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection", 《ECCV 2018》 * |
HU Min (胡敏) et al.: "Extracting building contours using a boundary correction network", 《遥感信息》 (Remote Sensing Information) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486898A (en) * | 2021-07-08 | 2021-10-08 | 西安电子科技大学 | Radar signal RD image interference identification method and system based on improved ShuffleNet |
CN113486898B (en) * | 2021-07-08 | 2024-05-31 | 西安电子科技大学 | Radar signal RD image interference identification method and system based on improved ShuffleNet |
CN115187783A (en) * | 2022-09-09 | 2022-10-14 | 之江实验室 | Multi-task hybrid supervision medical image segmentation method and system based on federal learning |
Also Published As
Publication number | Publication date |
---|---|
CN112633086B (en) | 2024-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109447034B (en) | Traffic sign detection method in automatic driving based on YOLOv3 network | |
Tian et al. | A dual neural network for object detection in UAV images | |
Zhang et al. | Pedestrian detection method based on Faster R-CNN | |
Guo et al. | Fast object detection based on selective visual attention | |
CN112633086A (en) | Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet | |
CN113033321A (en) | Training method of target pedestrian attribute identification model and pedestrian attribute identification method | |
CN107315990A (en) | Pedestrian detection algorithm based on XCS-LBP features and cascaded AKSVM | |
Li et al. | Research on a product quality monitoring method based on multi scale PP-YOLO | |
Jiang et al. | A fast and high-performance object proposal method for vision sensors: Application to object detection | |
CN110991374B (en) | Fingerprint singular point detection method based on RCNN | |
CN115984238A (en) | Power grid insulator defect detection method and system based on deep neural network | |
Zhang et al. | A small target detection method based on deep learning with considerate feature and effectively expanded sample size | |
Wang et al. | Small object detection based on modified FSSD and model compression | |
CN115761834A (en) | Multi-task mixed model for face recognition and face recognition method | |
Panigrahi et al. | MS-ML-SNYOLOv3: A robust lightweight modification of SqueezeNet based YOLOv3 for pedestrian detection | |
CN109284752A (en) | Rapid vehicle detection method | |
Wang et al. | Deep learning-based human activity analysis for aerial images | |
Le et al. | Smart Elevator Control System Based on Human Hand Gesture Recognition | |
Youssef et al. | Real-time egyptian license plate detection and recognition using yolo | |
CN113343903B (en) | License plate recognition method and system in natural scene | |
Wang et al. | Research on Road Object Detection Model Based on YOLOv4 of Autonomous Vehicle | |
CN115273131A (en) | Animal identification method based on dual-channel feature fusion | |
CN108830166B (en) | Real-time bus passenger flow volume statistical method | |
Min et al. | Vehicle detection method based on deep learning and multi-layer feature fusion | |
Xu et al. | Fast vehicle detection based on feature and real-time prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||