CN112633086A - Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet - Google Patents

Publication number
CN112633086A
Authority
CN
China
Prior art keywords
pedestrian
efficientdet
detection
multitask
infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011427301.4A
Other languages
Chinese (zh)
Other versions
CN112633086B (en)
Inventor
张建龙
何建辉
李桥
王斌
郭鑫宇
刘池帅
崔梦莹
时国强
余鑫城
方光祖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202011427301.4A priority Critical patent/CN112633086B/en
Publication of CN112633086A publication Critical patent/CN112633086A/en
Application granted granted Critical
Publication of CN112633086B publication Critical patent/CN112633086B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143Sensing or illuminating at different wavelengths
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention belongs to the technical field of near-infrared image pedestrian detection and discloses a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet. The pedestrian activity area distribution in different scenes is obtained from a near-infrared image pedestrian detection data set. EfficientDet-D0 is adopted as the basic detection network, and a semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, with segmentation performance enhanced by an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM). Based on the multitask pedestrian detection model, the pedestrian target detection result is post-processed with the predicted pedestrian activity area to obtain the final pedestrian detection result and the pedestrian activity area result. The method achieves higher detection performance, reduces false positive samples in the result, and is of practical significance for monitoring pedestrian activity in night-time surveillance scenes.

Description

Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet
Technical Field
The invention belongs to the technical field of near-infrared image pedestrian detection, and particularly relates to a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet.
Background
Research on pedestrian detection began in the 1990s and has moved from early methods based on hand-crafted features, such as HOG features, Haar wavelet features and edgelet features, to current deep learning feature extraction methods, such as those using ResNet, VGG and other convolutional neural network models. Pedestrian detection technology has developed rapidly alongside these advances. It is a widely applied branch of computer vision and plays an important role in intelligent security, automatic driving, robotics and similar fields.
Near-infrared pedestrian detection uses near-infrared images to locate pedestrians and is widely applied in intelligent security and automatic driving. In night scenes, visible-light imaging cannot deliver high-quality images; compared with conventional visible-light detection, infrared pedestrian detection has the following advantages: (1) imaging remains good under weak light, so pedestrian features are easy to separate from the background; (2) infrared imaging reduces interference from background color. Owing to these advantages, infrared pedestrian detection has stood out in recent years in fields such as intelligent video surveillance, vehicle-mounted driver assistance and military early warning. Compared with visible-light imaging, however, infrared imaging also has disadvantages: texture and contour features are less rich and the color information is monotonous, which makes recognition and detection harder. Infrared pedestrian detection is therefore both highly significant, including where concealment matters, and seriously challenging. Designing an efficient and robust infrared pedestrian detection algorithm for these problems is thus of great importance.
Pedestrian target detection in near-infrared images is essentially a classification and regression problem: bounding-box prediction is treated as a regression task, and separating pedestrians from the background is treated as a classification task. Over the past decades, researchers in China and abroad have contributed many pedestrian detection methods. By how features are extracted, these methods fall into two categories, and the field has gradually moved from methods based on hand-crafted features to methods based on deep learning features.
Early pedestrian detection methods extract features with manually designed operators. The most typical is the Histogram of Oriented Gradients (HOG) feature proposed by Dalal in 2005, a milestone in the history of pedestrian detection that laid a foundation for effective feature extraction and, combined with a Support Vector Machine (SVM) learner, achieved good detection results. Other methods based on hand-crafted features include those using Haar wavelet features and edgelet features.
Traditional infrared pedestrian detection methods based on hand-crafted feature extraction have the following main defects: (1) hand-crafted features are difficult to design and their effectiveness cannot be guaranteed; (2) hand-crafted features are shallow and cope poorly with pedestrians against complex backgrounds, so detection performance is low.
With the rapid development of deep learning theory and technology, researchers have attempted to solve pedestrian detection on infrared images with deep learning. Xingguo Zhang et al. analyzed the target detection performance of R-CNN/Faster R-CNN on visible-light images and constructed a network model for visible-spectrum and infrared images; using the same network model as Faster R-CNN, the miss rate on the VS data set and the NIR data set is far lower than that of RPN_BF and HOG+SVM. Wang Dongwei et al. proposed an improved YOLOv3 pedestrian detection algorithm for infrared video images to address the low accuracy and high miss rate of YOLOv3 on such images. Their results show that the improved YOLOv3 algorithm reaches an accuracy of 90.63% in infrared pedestrian detection, clearly better than the Faster R-CNN and YOLOv3 algorithms, and that the improved network detects more targets at the same time, reducing the miss rate.
Although deep learning achieves higher detection precision for pedestrian detection in near-infrared images, it has the following defects: (1) because the overall semantics of the image are not understood and interference such as noise is present, the detection results contain errors that contradict common sense; (2) because the image information is monotonous, effective pedestrian features may not be extracted.
In summary, the problems of the prior art are as follows: because end-to-end convolutional neural networks lack understanding of overall semantic information, many false detections that contradict common sense occur. Because near-infrared images are monochrome and have blurred contours, effectively extracting pedestrian target features from them is also one of the main open problems.
The difficulty of solving these technical problems lies in reducing semantic errors in the detection results and effectively extracting pedestrian features from near-infrared images without reducing detection performance.
The significance of solving these technical problems is as follows: pedestrian detection algorithms for near-infrared images are widely applied in intelligent security, automatic driving and robotics, and obtaining a high-precision detection result in as short a time as possible is important for these applications.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet.
The invention is realized as follows: a near-infrared pedestrian monitoring method based on multitask EfficientDet, comprising:
acquiring the pedestrian activity area distribution in different scenes from a near-infrared image pedestrian detection training data set, and using it to train the segmentation branch of the model;
adopting EfficientDet-D0 as the basic detection network, and building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, wherein segmentation performance is enhanced by an atrous spatial pyramid pooling (ASPP) module and an attention module, target detection and semantic segmentation share the bottom-layer features, the generalization capability of the model is improved through multi-task learning, and the detection results are further post-processed with the predicted pedestrian activity areas to improve performance;
and, based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result with the predicted pedestrian activity area to reduce false positives (FP) in the prediction, and obtaining the final pedestrian detection result and the pedestrian activity area result.
Further, the near-infrared pedestrian monitoring method based on multitask EfficientDet specifically comprises the following steps:
step one, acquiring the pedestrian activity area distribution in different scenes from a near-infrared image pedestrian detection training data set;
step two, adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and adding an atrous spatial pyramid pooling (ASPP) module and an attention module to enhance segmentation performance;
step three, training the improved multitask EfficientDet-D0 model, improving the generalization capability of the model for pedestrian targets through multi-task learning;
step four, inputting k images to obtain the pedestrian activity area prediction for the scene, inputting a near-infrared image to obtain a pedestrian target detection result, and filtering false positive samples out of the pedestrian target detection result with the pedestrian activity area prediction.
Further, the first step comprises:
a) the used image training set is a single-channel near-infrared image data set in a monitoring scene;
b) traversing all images in each scene in the training data set, obtaining the coordinates of the central points of the pedestrian targets through the labeled data, obtaining mask images of pedestrian areas through the coordinates of the central points of all the pedestrian targets in the scene, and executing the operation on each scene to obtain the pedestrian distribution area of each scene.
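The mask construction in step b) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the center points are expanded into a region, so the sketch marks a disc of a hypothetical `radius` around each center, and the function name `activity_mask` and the `(x_min, y_min, x_max, y_max)` box format are likewise assumed.

```python
import numpy as np

def activity_mask(boxes, height, width, radius=20):
    """Binary mask of pixels lying within `radius` of any pedestrian box center.

    boxes  : iterable of (x_min, y_min, x_max, y_max) annotations for one scene
    radius : hypothetical parameter; the source does not specify how center
             points are turned into a region
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    ys, xs = np.mgrid[0:height, 0:width]  # pixel coordinate grids
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # box center point
        mask[(xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2] = 1
    return mask

# All annotated boxes of one fixed scene, pooled over its training images.
scene_boxes = [(100, 200, 140, 300), (400, 220, 430, 310)]
mask = activity_mask(scene_boxes, height=512, width=768)
```

Running the function once per scene over all of that scene's annotations gives one activity mask per scene, which is what the segmentation branch is later trained to predict.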
Further, the semantic segmentation branch of the second step is built as follows. EfficientDet-D0 is adopted as the basic detection network, and the semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network, whose feature channel numbers are (24, 40, 112, 320) respectively. P5 is passed through the atrous spatial pyramid pooling (ASPP) module, upsampled by a factor of 2 and concatenated with P4 along the channel dimension; a convolution layer, a BN layer and an activation layer then reduce the number of channels to 128, and the convolutional block attention module (CBAM) produces the feature map O1. O1 is upsampled by a factor of 2 and concatenated with P3 along the channel dimension; a convolution layer, a BN layer and an activation layer reduce the number of channels to 64, and the CBAM produces the feature map O2. O2 is upsampled by a factor of 2 and concatenated with P2 along the channel dimension; a convolution layer, a BN layer and an activation layer reduce the number of channels to 32, and the CBAM produces the feature map O3. The number of channels of O3 is then reduced to 1 through a BN layer and an activation layer to obtain the output O4. The model loss is computed directly on O4; in the test stage, O4 is upsampled by a factor of 4 to obtain a segmentation result at the original image size.
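To make the feature-map bookkeeping of the segmentation branch easier to follow, the sketch below traces channel counts and spatial sizes through the decoder. It performs no convolution; it only does the arithmetic. Several values are assumptions rather than statements of the source: the backbone strides (4, 8, 16, 32) are inferred from the 2x upsampling steps and the final 4x upsampling, the ASPP output is assumed to keep 320 channels (`aspp_out`), and the 768 × 512 input is read as width × height.

```python
def trace_segmentation_branch(height=512, width=768, aspp_out=320):
    """Return (name, channels, (H, W)) for each stage of the decoder."""
    # (stride, channels) per backbone level; strides are an inferred assumption.
    backbone = {"P2": (4, 24), "P3": (8, 40), "P4": (16, 112), "P5": (32, 320)}

    def size(stride):
        return (height // stride, width // stride)

    stride, channels = 32, aspp_out  # P5 after ASPP (channel count assumed)
    trace = [("ASPP(P5)", channels, size(stride))]
    for name, skip, reduced in (("O1", "P4", 128), ("O2", "P3", 64), ("O3", "P2", 32)):
        stride //= 2  # 2x upsampling
        skip_stride, skip_channels = backbone[skip]
        assert stride == skip_stride  # sizes must match before concatenation
        trace.append((f"concat({skip})", channels + skip_channels, size(stride)))
        channels = reduced  # conv + BN + activation, then attention
        trace.append((name, channels, size(stride)))
    trace.append(("O4", 1, size(stride)))
    h, w = size(stride)
    trace.append(("O4 upsampled 4x", 1, (h * 4, w * 4)))  # test stage only
    return trace

for stage in trace_segmentation_branch():
    print(stage)
```

Under these assumptions O4 has one quarter of the input resolution, which is why the test stage needs a final 4x upsampling to reach the original image size.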
Further, the third step includes:
a) embedding the segmentation branch into the detection network, so that it shares bottom-layer features with the target detection network;
b) using an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM) in the segmentation branch to improve the segmentation performance of the model, and using different learning rates for the segmentation branch and the detection branch to reduce overfitting;
c) using Dice Loss as the loss function of the segmentation branch, where Dice Loss is defined as:
$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where X is the predicted mask, Y is the annotated mask of the pedestrian region, |X ∩ Y| is the intersection of X and Y, and |X| + |Y| is the sum of the elements of X and Y;
d) using Focal Loss as the loss function of classification, where Focal Loss is defined as:
$$L_{fl} = \begin{cases} -\alpha\,(1 - y')^{\gamma}\,\log y', & y = 1 \\ -(1 - \alpha)\,{y'}^{\gamma}\,\log(1 - y'), & y = 0 \end{cases}$$
where y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weight of hard and easy samples and is set to 2.0, and α is a hyperparameter that adjusts the proportion of positive and negative samples and is set to 0.05;
e) using SmoothL1 Loss as the loss function of bounding box regression:
$$\mathrm{SmoothL1}(y, y') = \begin{cases} 0.5\,(y - y')^{2}, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$
where y is the label and y' is the model prediction;
f) setting training parameters:
learning rate: 3e-4;
learning rate schedule: cosine decay;
batch size: 16;
input image size: 768 × 512;
optimizer: the AdamW optimizer, for fast convergence of the network;
because the pedestrian region segmentation has less data than the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the ratio of classification loss, regression loss and segmentation loss is set to 10:10:1.
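The three losses and their 10:10:1 weighting can be sketched in NumPy as below. This is a minimal illustration, not the patented implementation: the formula images are not reproduced in this text, so the sketch assumes the standard forms of Dice Loss, α-balanced Focal Loss and SmoothL1 Loss (threshold 1), averages over elements, and adds a small `EPS` for numerical stability. Function names are hypothetical.

```python
import numpy as np

EPS = 1e-7  # numerical-stability constant (an assumption, not from the source)

def dice_loss(pred, target):
    """1 - 2|X ∩ Y| / (|X| + |Y|) for a predicted and an annotated mask."""
    inter = np.sum(pred * target)
    return float(1.0 - 2.0 * inter / (np.sum(pred) + np.sum(target) + EPS))

def focal_loss(pred, label, alpha=0.05, gamma=2.0):
    """Alpha-balanced focal loss (standard form assumed); pred is a probability."""
    pred = np.clip(pred, EPS, 1.0 - EPS)
    pos = -alpha * (1.0 - pred) ** gamma * np.log(pred)
    neg = -(1.0 - alpha) * pred ** gamma * np.log(1.0 - pred)
    return float(np.mean(np.where(label == 1, pos, neg)))

def smooth_l1_loss(pred, target):
    """Quadratic below an absolute error of 1, linear above it."""
    diff = np.abs(pred - target)
    return float(np.mean(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)))

def total_loss(cls, reg, seg, weights=(10.0, 10.0, 1.0)):
    """Weighted sum with the 10:10:1 classification:regression:segmentation ratio."""
    return weights[0] * cls + weights[1] * reg + weights[2] * seg

# Toy values only, to show how the pieces combine.
seg = dice_loss(np.array([0.9, 0.8, 0.1]), np.array([1.0, 1.0, 0.0]))
cls = focal_loss(np.array([0.7, 0.2]), np.array([1, 0]))
reg = smooth_l1_loss(np.array([0.5, 3.0]), np.array([0.0, 0.0]))
loss = total_loss(cls, reg, seg)
```

The small weight on the segmentation term mirrors the reduced learning rate for that branch: both keep the data-poor segmentation task from dominating the shared features.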
Further, the fourth step includes: first, the pedestrian activity area is predicted from k input images; the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, where the sigmoid function is defined as:
$$S(x) = \frac{1}{1 + e^{-x}}$$
the output is processed as:
D = S(D'), where D' is the output of the segmentation branch;
then a threshold T is applied to binarize the prediction, as follows:
$$D(x, y) = \begin{cases} 1, & D(x, y) > T \\ 0, & \text{otherwise} \end{cases}$$
the activity area predicted by the segmentation branch from the k images is used as prior information, and targets whose bounding-box center point lies outside the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples; the processing is as follows:
score(x, y) = D(x, y) * score(x, y);
where score(x, y) is the confidence score of the target at point (x, y).
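The post-processing of the fourth step can be sketched as follows. The sketch is illustrative and rests on assumptions the source leaves open: the k probability maps are combined by averaging, the binarization uses a strict `>` comparison with a hypothetical default `threshold=0.5`, boxes are given as `(x_min, y_min, x_max, y_max)`, and detections whose score becomes zero are dropped.

```python
import numpy as np

def sigmoid(x):
    """S(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def activity_region(logits_k, threshold=0.5):
    """Binary activity map D from the raw segmentation outputs of k images.

    logits_k: array of shape (k, H, W). Averaging over k is an assumption;
    the source does not state how the k predictions are combined.
    """
    prob = sigmoid(np.asarray(logits_k, dtype=np.float64)).mean(axis=0)
    return (prob > threshold).astype(np.float64)

def filter_detections(boxes, scores, region):
    """Apply score(x, y) = D(x, y) * score(x, y) at each box center point."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    cx = np.clip(((boxes[:, 0] + boxes[:, 2]) / 2).astype(int), 0, region.shape[1] - 1)
    cy = np.clip(((boxes[:, 1] + boxes[:, 3]) / 2).astype(int), 0, region.shape[0] - 1)
    new_scores = region[cy, cx] * scores
    keep = new_scores > 0  # centers outside the activity area score zero
    return boxes[keep], new_scores[keep]

# Toy scene: the right half of a 4 x 6 map is the pedestrian activity area.
logits = np.full((3, 4, 6), -5.0)
logits[:, :, 3:] = 5.0
region = activity_region(logits)
kept_boxes, kept_scores = filter_detections(
    [[0, 0, 2, 2], [3, 1, 5, 3]], [0.9, 0.8], region)
```

In a fixed surveillance scene `activity_region` only needs to run once; the cached map can then be reused for every subsequent frame, so the per-frame cost is a single array lookup per detection.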
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and improving the generalization capability of the model by sharing bottom-layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and improving the generalization capability of the model by sharing bottom-layer features between target detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
The invention also aims to provide an information data processing terminal, which is used for realizing the near-infrared pedestrian monitoring method based on the multitask EfficientDet.
Another object of the present invention is to provide a near-infrared pedestrian monitoring system based on multitask EfficientDet that implements the near-infrared pedestrian monitoring method based on multitask EfficientDet, the system comprising:
the pedestrian activity area distribution obtaining module is used for obtaining pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
the shared bottom-layer feature segmentation module is used for adopting EfficientDet-D0 as the basic detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and an attention module, and sharing bottom-layer features between target detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module is used for carrying out post-processing on a pedestrian target detection result through the predicted pedestrian activity area based on the multitask EfficientDet-D0 model to obtain a final pedestrian detection result and a final pedestrian activity area result.
Combining all the above technical solutions, the invention has the following advantages and positive effects. The method is mainly applied to pedestrian detection in near-infrared images. It addresses semantically unreasonable errors in the prediction result, reducing the false positive rate while maintaining a high AP, and it lets multi-task learning share one feature extraction network, which effectively improves the generalization capability of the model. The invention decomposes the target detection problem into a two-class semantic segmentation problem and an ordinary target detection problem, and provides a network for pixel-level semantic segmentation of pedestrian activity areas together with pedestrian position detection: an atrous spatial pyramid pooling (ASPP) module augmented with a convolutional block attention module (CBAM) and combined with upsampling serves as the pedestrian activity area prediction branch, the generalization capability of the model is improved through multi-task learning, and the performance of pedestrian target detection is further improved by filtering false positive samples out of the detection result with the pedestrian activity area.
The invention applies semantic segmentation and target detection jointly to pedestrian target detection. Pedestrian detection can produce false positive samples that contradict the scene semantics, for example pedestrian targets predicted on trees or in other areas that contradict common sense. Predicting the pedestrian area allows these false positives to be filtered out, and multi-task learning improves the generalization capability of the model, so that more accurate pedestrian detection results are obtained.
The method integrates a semantic segmentation network and a target detection network that share a feature extraction network, so that a single model completes both semantic segmentation and target detection. This greatly reduces training cost, greatly improves the generalization capability of the model and reduces overfitting to the detection data. Through multi-task learning and post-processing of the detection results with pedestrian activity areas, the method obtains higher detection performance and better generalization.
Compared with the prior art, the method has the following advantages:
(1) according to the invention, pedestrian activity region segmentation branches are introduced into the target detection model, and the generalization capability of the model is improved through multi-task learning;
(2) according to the invention, the prediction result of the pedestrian activity area is obtained through the additional segmentation branches, and the post-processing is carried out on the pedestrian target detection result through the pedestrian activity area, so that the detection performance is improved;
(3) in a fixed monitoring scene, the pedestrian activity area is pre-calculated and used for subsequent processing, so that the time overhead of activity area prediction each time is reduced, and the frame rate is always kept in a real-time monitoring state.
TABLE 1 comparison of the present invention with other target detection methods
Figure BDA0002825480600000091
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
Fig. 1 is a flowchart of a near-infrared pedestrian monitoring method based on multitask EfficientDet according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the near-infrared pedestrian monitoring system based on multitask EfficientDet according to an embodiment of the present invention;
in fig. 2: 1. pedestrian activity area distribution obtaining module; 2. segmentation and detection module sharing bottom-layer features; 3. pedestrian detection result and pedestrian activity area result output module.
Fig. 3 is a flowchart of an implementation of the near-infrared pedestrian monitoring method based on multitask EfficientDet according to the embodiment of the present invention.
Fig. 4 is a diagram of a multitasking EfficientDet-D0 network structure according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet, and the invention is described in detail with reference to the attached drawings.
As shown in fig. 1, the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the invention comprises the following steps:
S101: acquiring the distribution of pedestrian activity areas in fixed scenes by using the training data set; inputting near-infrared images, performing channel replication on the input images, and dividing the data into a training data set and a test data set by random division;
S102: training the improved multitask EfficientDet-D0 model, with a lower learning rate assigned to the segmentation branch to prevent overfitting;
S103: inputting k images to obtain the pedestrian activity area prediction result, inputting a single near-infrared image for testing, and obtaining the final detection result from the pedestrian detection result and the pedestrian activity area prediction result.
A person skilled in the art may also implement the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the present invention using other steps; the method shown in fig. 1 is only one specific example.
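The channel replication and random division mentioned in step S101 can be sketched as follows. This is only an illustrative NumPy sketch: the function names `replicate_channels` and `random_split`, the choice of 3 output channels, the test fraction of 0.2 and the fixed seed are assumptions made here and are not specified in the description.

```python
import numpy as np

def replicate_channels(gray, channels=3):
    """Copy a single-channel H x W image into an H x W x C array."""
    gray = np.asarray(gray)
    if gray.ndim != 2:
        raise ValueError("expected a single-channel H x W image")
    return np.repeat(gray[:, :, None], channels, axis=2)

def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle items and split them into (train, test) lists.

    test_fraction and seed are assumed values for illustration only.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_test = int(round(len(items) * test_fraction))
    test = [items[i] for i in order[:n_test]]
    train = [items[i] for i in order[n_test:]]
    return train, test
```

Replicating the single near-infrared channel lets the image be fed to a backbone that expects three-channel input without changing the network's first layer.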
As shown in fig. 2, the near-infrared pedestrian monitoring system based on multitask EfficientDet provided by the invention comprises:
the pedestrian activity area distribution obtaining module 1 is used for obtaining pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
the segmentation and detection module 2 sharing bottom-layer features is used for adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling module and an attention module, and sharing bottom-layer features between object detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module 3 is used for carrying out post-processing on a pedestrian target detection result through the predicted pedestrian activity area based on the multitask EfficientDet-D0 model to obtain a final pedestrian detection result and a final pedestrian activity area result.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
According to the method, the pedestrian activity area distribution in different scenes is obtained from a near-infrared image pedestrian detection data set. EfficientDet-D0 is adopted as the base detection network; a semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas; segmentation performance is enhanced by an atrous spatial pyramid pooling module and a feature attention module; and object detection and semantic segmentation share bottom-layer features, which improves the generalization ability of the model. Based on the multitask pedestrian detection model, the pedestrian detection results are post-processed with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result. By combining the segmentation branch with the object detection branch, the method reduces false positive (FP) predictions and achieves higher detection performance for pedestrian targets in near-infrared images, which is of significance for pedestrian activity monitoring in night-time surveillance scenes.
As shown in fig. 3, the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the embodiment of the present invention specifically includes the following steps:
Firstly, the image training set used is a single-channel near-infrared image data set captured in monitoring scenes. All images of each scene in the training data set are traversed; the center-point coordinates of the pedestrian targets are obtained from the annotation data; a mask image of the pedestrian area is obtained from the center-point coordinates of all pedestrian targets in the scene; and this operation is performed for each scene to obtain the pedestrian distribution area of each scene.
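One possible way to turn annotated center points into a per-scene mask is sketched below. The description does not say how the mask is derived from the points, so marking a disc of fixed radius around every center is purely an assumption of this sketch, as are the names `box_centers` and `activity_mask`, the `(x1, y1, x2, y2)` box format and the default radius of 20 pixels.

```python
import numpy as np

def box_centers(boxes):
    """Center points (cx, cy) of boxes given as (x1, y1, x2, y2)."""
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]

def activity_mask(centers, height, width, radius=20):
    """Binary H x W mask that is 1 within `radius` pixels of any center.

    The disc-shaped neighbourhood is an assumption; the source only states
    that the mask is obtained from the center points.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    ys, xs = np.mgrid[0:height, 0:width]
    for cx, cy in centers:
        mask[(xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2] = 1
    return mask
```

Calling `activity_mask` once with the centers of all annotated boxes of a scene yields one mask per scene, which can then serve as the segmentation label for every image of that scene.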
Secondly, EfficientDet-D0 is adopted as the base detection network, a semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, and an atrous spatial pyramid pooling (ASPP) module and an attention module are added to enhance segmentation performance;
2a) the segmentation branch is designed as follows; segmentation performance is improved mainly by the ASPP module and the convolutional block attention module (CBAM):
The semantic segmentation branch is built from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone, whose feature channel counts are (24, 40, 112, 320). P5 is passed through the ASPP module, upsampled by a factor of 2, and concatenated with P4 along the channel dimension; the result passes through CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 128, and then through CBAM again to obtain feature map O1. O1 is upsampled by a factor of 2 and concatenated with P3 along the channel dimension; the result passes through CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 64, and then through CBAM to obtain feature map O2. O2 is upsampled by a factor of 2 and concatenated with P2 along the channel dimension; the result passes through CBAM, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 32, and then through CBAM to obtain feature map O3. O3 passes through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 1, giving the output O4. The model loss is calculated directly on O4; in the test stage, O4 is upsampled by a factor of 4 to obtain a segmentation result at the original image size.
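The channel counts and spatial sizes along the segmentation branch can be traced with a small helper. This sketch only does shape arithmetic and contains no network layers. It assumes strides of 4, 8, 16 and 32 for P2 to P5 (consistent with the final 4x upsampling), assumes that the pooling module applied to P5 keeps 320 channels, and assumes the 768 × 512 input is width × height; the function name and stage labels are hypothetical.

```python
def trace_segmentation_branch(height=512, width=768):
    """Return (name, channels, height, width) for each stage of the branch.

    height and width are expected to be multiples of 32.
    """
    strides = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}      # assumed strides
    channels = {"P2": 24, "P3": 40, "P4": 112, "P5": 320}  # from the description
    stages = []
    # The pooling module on P5 is assumed to preserve the channel count.
    c, h, w = channels["P5"], height // strides["P5"], width // strides["P5"]
    stages.append(("pooled(P5)", c, h, w))
    for name, skip, out_c in (("O1", "P4", 128), ("O2", "P3", 64), ("O3", "P2", 32)):
        h, w = h * 2, w * 2                        # 2x upsampling
        stages.append((f"concat({skip})", c + channels[skip], h, w))
        c = out_c                                  # conv + BN + activation
        stages.append((name, c, h, w))
    stages.append(("O4", 1, h, w))                 # single-channel output
    stages.append(("O4 x4", 1, h * 4, w * 4))      # test-time 4x upsampling
    return stages
```

For the default input, the three concatenations carry 432, 168 and 88 channels before being reduced to 128, 64 and 32, and the upsampled O4 returns to 512 × 768.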
Thirdly, the improved multitask EfficientDet-D0 model is trained, and the generalization ability of the model for pedestrian targets is improved through multitask learning;
3a) the segmentation branch is embedded into the detection network and shares bottom-layer features with the object detection network;
3b) the atrous spatial pyramid pooling module and the convolutional block attention module are used in the segmentation branch to improve the segmentation performance of the model, and different learning rates are used for the segmentation branch and the detection branch to reduce overfitting of the model.
3c) Dice Loss is used as the loss function of the segmentation branch; Dice Loss is defined as follows:
$$\text{Dice Loss} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$
wherein X is the prediction mask, Y is the annotation mask of the pedestrian region, |X ∩ Y| is the intersection between X and Y, and |X| + |Y| is the sum of the numbers of elements in X and Y.
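A minimal NumPy sketch of a Dice loss of the common form 1 - 2|X∩Y| / (|X| + |Y|) is given below. Using the element-wise product as a soft intersection and adding a small `eps` for numerical stability are assumptions of this sketch, not details taken from the description.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for maps with values in [0, 1].

    pred:   predicted mask or probability map X
    target: annotated pedestrian-area mask Y
    eps:    assumed smoothing term that avoids division by zero
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = (pred * target).sum()          # soft |X ∩ Y|
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

The loss is close to 0 when the two masks coincide and close to 1 when they do not overlap at all.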
3d) Focal Loss is used as the loss function for classification; Focal Loss is defined as follows:
Figure BDA0002825480600000131
wherein y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weights of hard and easy samples and is set to 2.0, and α is a hyperparameter that adjusts the ratio of positive and negative samples and is set to 0.05.
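The sketch below implements the widely used binary form of Focal Loss with the symbols defined above: for y = 1 the loss is -α(1 - y')^γ log(y'), and for y = 0 it is -(1 - α) y'^γ log(1 - y'). Whether this is exactly the form shown in the original equation image is an assumption, as are the clipping constant `eps` and the averaging over all samples.

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.05, eps=1e-7):
    """Binary focal loss averaged over all samples (assumed normalization).

    y_true: labels in {0, 1}
    y_pred: predicted probabilities y' in (0, 1)
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    pos = -alpha * (1.0 - p) ** gamma * np.log(p)          # y = 1 term
    neg = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)    # y = 0 term
    return np.where(y_true == 1, pos, neg).mean()
```

With γ = 2.0, a well-classified positive (y' = 0.9) contributes far less than a poorly classified one (y' = 0.1), which is the down-weighting of easy samples that the hyperparameter controls.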
3e) SmoothL1 Loss is used as the loss function for bounding box regression:
Figure BDA0002825480600000132
wherein y is the label and y' is the model prediction.
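A common form of SmoothL1 Loss is quadratic for small errors and linear for large ones: 0.5 d² when d < 1 and d - 0.5 otherwise, with d = |y' - y|. The sketch below follows that form; the transition point `beta` (default 1.0) and the averaging over elements are assumptions, since the original equation image is not reproduced here.

```python
import numpy as np

def smooth_l1_loss(y_true, y_pred, beta=1.0):
    """Smooth L1 loss averaged over all elements (assumed normalization)."""
    diff = np.abs(np.asarray(y_pred, dtype=np.float64)
                  - np.asarray(y_true, dtype=np.float64))
    loss = np.where(diff < beta,
                    0.5 * diff ** 2 / beta,   # quadratic region
                    diff - 0.5 * beta)        # linear region
    return loss.mean()
```

The linear region keeps the gradient bounded for badly localized boxes, while the quadratic region gives smooth gradients near the target.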
3f) Setting training parameters:
learning rate: 3e-4;
learning rate schedule: cosine decay;
batch size: 16;
input image size: 768 × 512;
optimizer: AdamW, used to achieve fast convergence of the network;
Because less data is available for pedestrian region segmentation than for the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the ratio of the classification loss, regression loss and segmentation loss is set to 10:10:1.
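The learning-rate and loss-weight settings above can be sketched in plain Python. The cosine schedule below decays from the base rate to zero without warm-up, and the 10:10:1 ratio is interpreted as multiplicative weights on the three losses; both are assumptions of this sketch, and all names (`cosine_lr`, `branch_lrs`, `total_loss`) are hypothetical.

```python
import math

BASE_LR = 3e-4            # learning rate from the description
SEG_LR_SCALE = 0.01       # segmentation branch uses 0.01x the base rate
LOSS_WEIGHTS = {"cls": 10.0, "reg": 10.0, "seg": 1.0}  # 10:10:1 read as weights

def cosine_lr(step, total_steps, base_lr=BASE_LR, min_lr=0.0):
    """Cosine decay from base_lr to min_lr (no warm-up; an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def branch_lrs(step, total_steps):
    """Learning rates for the detection and segmentation parameter groups."""
    lr = cosine_lr(step, total_steps)
    return {"detection": lr, "segmentation": lr * SEG_LR_SCALE}

def total_loss(cls_loss, reg_loss, seg_loss, weights=LOSS_WEIGHTS):
    """Weighted sum of classification, regression and segmentation losses."""
    return (weights["cls"] * cls_loss
            + weights["reg"] * reg_loss
            + weights["seg"] * seg_loss)
```

In a training framework, the two rates returned by `branch_lrs` would typically be assigned to separate optimizer parameter groups, one for the shared backbone and detection head and one for the segmentation branch.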
Fourthly, k images are input to obtain the pedestrian activity area prediction for the scene, near-infrared images are input for testing to obtain the pedestrian detection results, and false positive samples in the pedestrian detection results are filtered out using the pedestrian activity area prediction. The specific process is as follows:
firstly, the pedestrian activity area is predicted from k (k = 10) images: the k images are input, and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, wherein the sigmoid function is defined as follows:
$$S(x) = \frac{1}{1 + e^{-x}}$$
the output D' of the segmentation branch is processed as follows:
D=S(D');
then, a threshold value T is taken to carry out binarization processing on the prediction result, and the processing process is as follows:
$$D(x,y) = \begin{cases} 1, & D(x,y) > T \\ 0, & \text{otherwise} \end{cases}$$
the activity area predicted by the segmentation branch from the k images is used as prior information, and detections whose bounding-box center point does not lie in the predicted activity area are filtered out of the final detection result, thereby reducing false positive samples. The processing is as follows:
score(x,y)=D(x,y)*score(x,y);
where score(x, y) is the confidence score of the target whose center is at point (x, y).
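The post-processing described above can be sketched with NumPy as follows. The description does not state how the k segmentation outputs are combined or what value T takes, so averaging the k probability maps and using a threshold of 0.5 are assumptions of this sketch; the function names and the `(x1, y1, x2, y2)` box format are likewise hypothetical.

```python
import numpy as np

def sigmoid(x):
    """S(x) = 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))

def activity_prior(logit_maps, threshold=0.5):
    """Binary activity-area map D from the raw outputs D' of k images.

    Averaging the k probability maps and threshold=0.5 are assumptions.
    """
    prob = sigmoid(np.stack(logit_maps)).mean(axis=0)
    return (prob > threshold).astype(np.float64)

def filter_detections(boxes, scores, prior):
    """Multiply each score by D at the box center and drop zero scores."""
    kept = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        cx = min(max(int(round((x1 + x2) / 2)), 0), prior.shape[1] - 1)
        cy = min(max(int(round((y1 + y2) / 2)), 0), prior.shape[0] - 1)
        new_score = prior[cy, cx] * score   # score(x, y) = D(x, y) * score(x, y)
        if new_score > 0:
            kept.append(((x1, y1, x2, y2), float(new_score)))
    return kept
```

Because the prior is computed once per fixed scene, only the inexpensive lookup in `filter_detections` runs for each frame.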
The technical effects of the present invention will be described in detail with reference to simulations.
1. Simulation conditions
The simulation experiments were carried out in PyCharm on a PC with an Intel(R) Core(TM) i7-7820X CPU at 3.60 GHz, 32.00 GB of RAM, two 2080Ti GPUs, and the Ubuntu 18.0 operating system.
2. Content of simulation experiment
The experiment uses a self-collected data set covering 10 scenes for training and testing; the original image resolution is 2560 × 1440, and the images are scaled to 768 × 512. Scenes A and B are used as the test set and the other scenes as the training set; the test set contains 3380 images and the training set contains 13072 images.
3. Simulation experiment results and analysis
Table 1 compares the method of the present invention with the original Cascade RCNN, the original EfficientDet-D0, and EfficientDet-D0 with Anchor clustering.
TABLE 1 comparison of the present invention with other target detection methods
Figure BDA0002825480600000142
Figure BDA0002825480600000151
As can be seen from Table 1, through multitask learning the present invention maintains both a high detection frame rate and high detection accuracy, and improves detection performance by 1.5 points compared with the original EfficientDet-D0 + Anchor clustering model. Compared with the other methods, the method of the present invention reduces the number of false positive samples and constrains the regions in which positive samples may appear, thereby improving detection performance. In conclusion, the invention uses a multitask detection model to improve detection performance, reduce the number of false positive samples and obtain higher detection accuracy, which is of significance for research on intelligent security, autonomous driving and the like.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, on programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description covers only specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto; all modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within its scope of protection.

Claims (10)

1. A near-infrared pedestrian monitoring method based on multitask EfficientDet is characterized by comprising the following steps:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set, and using the pedestrian activity area distribution to train segmentation branches;
adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling module and an attention module, and sharing bottom-layer features between object detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
2. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 1, wherein the multitask EfficientDet-based near-infrared pedestrian monitoring method specifically comprises the following steps:
step one, acquiring pedestrian activity area distribution in different scenes by utilizing a near-infrared image pedestrian detection training data set;
step two, adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, and adding an atrous spatial pyramid pooling module and an attention module to enhance segmentation performance;
step three, training the improved multitask EfficientDet-D0 model, and improving the generalization ability of the model for pedestrian targets through multitask learning;
step four, inputting k images to obtain the pedestrian activity area prediction for the scene, inputting near-infrared images for testing to obtain the pedestrian detection results, and filtering out false positive samples in the pedestrian detection results using the pedestrian activity area prediction.
3. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the first step comprises:
a) the used image training set is a single-channel near-infrared image data set in a monitoring scene;
b) traversing all images in each scene in the training data set, obtaining the coordinates of the central points of the pedestrian targets through the labeled data, obtaining mask images of pedestrian areas through the coordinates of the central points of all the pedestrian targets in the scene, and executing the operation on each scene to obtain the pedestrian distribution area of each scene.
4. The multitask EfficientDet-based near-infrared pedestrian monitoring method as claimed in claim 2, wherein the semantic segmentation branch of the second step comprises: adopting EfficientDet-D0 as the base detection network and building the semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone, whose feature channel counts are (24, 40, 112, 320); passing P5 through the atrous spatial pyramid pooling module, upsampling it by a factor of 2 and concatenating it with P4 along the channel dimension, passing the result through the convolutional block attention mechanism, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 128, and then through the convolutional block attention mechanism to obtain feature map O1; upsampling O1 by a factor of 2 and concatenating it with P3 along the channel dimension, passing the result through the convolutional block attention mechanism, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 64, and then through the convolutional block attention mechanism to obtain feature map O2; upsampling O2 by a factor of 2 and concatenating it with P2 along the channel dimension, passing the result through the convolutional block attention mechanism, then through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 32, and then through the convolutional block attention mechanism to obtain feature map O3; passing O3 through a convolution layer, a BN layer and an activation layer that reduce the number of channels to 1 to obtain the output O4; calculating the model loss directly on O4; and, in the test stage, upsampling O4 by a factor of 4 to obtain a segmentation result at the original image size.
5. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the third step comprises:
a) embedding the segmentation branch into the detection network, and sharing bottom-layer features with the object detection network;
b) using an atrous spatial pyramid pooling module and a convolutional block attention module in the segmentation branch to improve the segmentation performance of the model, and using different learning rates for the segmentation branch and the detection branch to reduce overfitting of the model;
c) using Dice Loss as the loss function of the segmentation branch, Dice Loss being defined as follows:
$$\text{Dice Loss} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$
wherein X is the prediction mask, Y is the annotation mask of the pedestrian region, |X ∩ Y| is the intersection between X and Y, and |X| + |Y| is the sum of the numbers of elements in X and Y;
d) using Focal Loss as the loss function for classification, Focal Loss being defined as follows:
Figure FDA0002825480590000031
wherein y is a label, y' is a model prediction result, gamma is a hyperparameter used for adjusting the weight of the difficult and easy samples and is set to be 2.0, and alpha is a hyperparameter used for adjusting the proportion of the positive and negative samples and is set to be 0.05;
e) using SmoothL1 Loss as the loss function for bounding box regression:
Figure FDA0002825480590000032
wherein y is a label and y' is a model prediction result;
f) setting training parameters:
learning rate: 3e-4;
learning rate schedule: cosine decay;
batch size: 16;
input image size: 768 × 512;
optimizer: AdamW, used to achieve fast convergence of the network;
because less data is available for pedestrian region segmentation than for the pedestrian detection branch, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the ratio of the classification loss, regression loss and segmentation loss is set to 10:10:1.
6. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the fourth step includes: firstly, predicting the pedestrian activity area from k images: inputting the k images and passing the output of the segmentation branch through a sigmoid function to obtain the final probability map, wherein the sigmoid function is defined as follows:
$$S(x) = \frac{1}{1 + e^{-x}}$$
the output D' of the segmentation branch is processed as follows:
D=S(D');
then, a threshold value T is taken to carry out binarization processing on the prediction result, and the processing process is as follows:
$$D(x,y) = \begin{cases} 1, & D(x,y) > T \\ 0, & \text{otherwise} \end{cases}$$
taking the activity area predicted by the segmentation branch from the k images as prior information, and filtering out of the final detection result the detections whose bounding-box center point does not lie in the predicted activity area, thereby reducing false positive samples, wherein the processing is as follows:
score(x,y)=D(x,y)*score(x,y);
where score(x, y) is the confidence score of the target whose center is at point (x, y).
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling module and an attention module, and improving the generalization ability of the model by sharing bottom-layer features between object detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
the near-infrared pedestrian monitoring method based on the multitask EfficientDet is characterized by comprising the following steps of:
acquiring pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling module and an attention module, and improving the generalization ability of the model by sharing bottom-layer features between object detection and semantic segmentation;
and based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result through the predicted pedestrian activity area to obtain a final pedestrian detection result and a pedestrian activity area result.
9. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the near-infrared pedestrian monitoring method based on the multitask EfficientDet according to any one of claims 1-6.
10. A near-infrared pedestrian monitoring system based on the multitask EfficientDet for implementing the near-infrared pedestrian monitoring method based on the multitask EfficientDet according to any one of claims 1-6, wherein the near-infrared pedestrian monitoring system based on the multitask EfficientDet comprises:
the pedestrian activity area distribution obtaining module is used for obtaining pedestrian activity area distribution under different scenes by utilizing a near-infrared image pedestrian detection training data set;
the segmentation and detection module is used for adopting EfficientDet-D0 as the base detection network, building a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhancing segmentation performance through an atrous spatial pyramid pooling module and an attention module, and sharing bottom-layer features between object detection and semantic segmentation;
and the pedestrian detection result and pedestrian activity area result output module is used for carrying out post-processing on a pedestrian target detection result through the predicted pedestrian activity area based on the multitask EfficientDet-D0 model to obtain a final pedestrian detection result and a final pedestrian activity area result.
CN202011427301.4A 2020-12-09 2020-12-09 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet Active CN112633086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011427301.4A CN112633086B (en) 2020-12-09 2020-12-09 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011427301.4A CN112633086B (en) 2020-12-09 2020-12-09 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet

Publications (2)

Publication Number Publication Date
CN112633086A true CN112633086A (en) 2021-04-09
CN112633086B CN112633086B (en) 2024-01-26

Family

ID=75308801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011427301.4A Active CN112633086B (en) 2020-12-09 2020-12-09 Near-infrared pedestrian monitoring method, system, medium and equipment based on multitasking EfficientDet

Country Status (1)

Country Link
CN (1) CN112633086B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486898A (en) * 2021-07-08 2021-10-08 西安电子科技大学 Radar signal RD image interference identification method and system based on improved ShuffleNet
CN115187783A (en) * 2022-09-09 2022-10-14 之江实验室 Multi-task hybrid supervision medical image segmentation method and system based on federal learning

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328850A1 (en) * 2015-05-08 2016-11-10 Vida Diagnostics, Inc. Systems and methods for quantifying regional fissure features
CN109063559A (en) * 2018-06-28 2018-12-21 东南大学 A kind of pedestrian detection method returned based on improvement region
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110633632A (en) * 2019-08-06 2019-12-31 厦门大学 Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN110941995A (en) * 2019-11-01 2020-03-31 中山大学 Real-time target detection and semantic segmentation multi-task learning method based on lightweight network
CN110969124A (en) * 2019-12-02 2020-04-07 重庆邮电大学 Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111027493A (en) * 2019-12-13 2020-04-17 电子科技大学 Pedestrian detection method based on deep learning multi-network soft fusion
CN111507381A (en) * 2020-03-31 2020-08-07 上海商汤智能科技有限公司 Image recognition method and related device and equipment
CN111652213A (en) * 2020-05-24 2020-09-11 浙江理工大学 Ship water gauge reading identification method based on deep learning
CN111798425A (en) * 2020-06-30 2020-10-20 天津大学 Intelligent detection method for mitotic image in gastrointestinal stromal tumor based on deep learning
CN111860316A (en) * 2020-07-20 2020-10-30 上海汽车集团股份有限公司 Driving behavior recognition method and device and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328850A1 (en) * 2015-05-08 2016-11-10 Vida Diagnostics, Inc. Systems and methods for quantifying regional fissure features
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109063559A (en) * 2018-06-28 2018-12-21 东南大学 A kind of pedestrian detection method returned based on improvement region
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN110633632A (en) * 2019-08-06 2019-12-31 厦门大学 Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN110941995A (en) * 2019-11-01 2020-03-31 中山大学 Real-time target detection and semantic segmentation multi-task learning method based on lightweight network
CN110969124A (en) * 2019-12-02 2020-04-07 重庆邮电大学 Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN111027493A (en) * 2019-12-13 2020-04-17 电子科技大学 Pedestrian detection method based on deep learning multi-network soft fusion
CN111507381A (en) * 2020-03-31 2020-08-07 上海商汤智能科技有限公司 Image recognition method and related device and equipment
CN111652213A (en) * 2020-05-24 2020-09-11 浙江理工大学 Ship water gauge reading identification method based on deep learning
CN111798425A (en) * 2020-06-30 2020-10-20 天津大学 Intelligent detection method for mitotic image in gastrointestinal stromal tumor based on deep learning
CN111860316A (en) * 2020-07-20 2020-10-30 上海汽车集团股份有限公司 Driving behavior recognition method and device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUNCHAO WEI 等,: "TS2C:Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection", 《ECCV 2018》 *
胡敏 等,: "利用边界校正网络提取建筑物轮廓", 《遥感信息》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486898A (en) * 2021-07-08 2021-10-08 西安电子科技大学 Radar signal RD image interference identification method and system based on improved ShuffleNet
CN113486898B (en) * 2021-07-08 2024-05-31 西安电子科技大学 Radar signal RD image interference identification method and system based on improvement ShuffleNet
CN115187783A (en) * 2022-09-09 2022-10-14 之江实验室 Multi-task hybrid supervision medical image segmentation method and system based on federal learning

Also Published As

Publication number Publication date
CN112633086B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
Tian et al. A dual neural network for object detection in UAV images
Zhang et al. Pedestrian detection method based on Faster R-CNN
Guo et al. Fast object detection based on selective visual attention
CN112633086A (en) Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet
CN113033321A (en) Training method of target pedestrian attribute identification model and pedestrian attribute identification method
CN107315990A (en) Pedestrian detection algorithm based on XCS-LBP features and cascaded AKSVM
Li et al. Research on a product quality monitoring method based on multi scale PP-YOLO
Jiang et al. A fast and high-performance object proposal method for vision sensors: Application to object detection
CN110991374B (en) Fingerprint singular point detection method based on RCNN
CN115984238A (en) Power grid insulator defect detection method and system based on deep neural network
Zhang et al. A small target detection method based on deep learning with considerate feature and effectively expanded sample size
Wang et al. Small object detection based on modified FSSD and model compression
CN115761834A (en) Multi-task mixed model for face recognition and face recognition method
Panigrahi et al. MS-ML-SNYOLOv3: A robust lightweight modification of SqueezeNet based YOLOv3 for pedestrian detection
CN109284752A (en) Rapid vehicle detection method
Wang et al. Deep learning-based human activity analysis for aerial images
Le et al. Smart Elevator Control System Based on Human Hand Gesture Recognition
Youssef et al. Real-time egyptian license plate detection and recognition using yolo
CN113343903B (en) License plate recognition method and system in natural scene
Wang et al. Research on Road Object Detection Model Based on YOLOv4 of Autonomous Vehicle
CN115273131A (en) Animal identification method based on dual-channel feature fusion
CN108830166B (en) Real-time bus passenger flow volume statistical method
Min et al. Vehicle detection method based on deep learning and multi-layer feature fusion
Xu et al. Fast vehicle detection based on feature and real-time prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant