CN112633086A

CN112633086A - Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet

Info

Publication number: CN112633086A
Application number: CN202011427301.4A
Authority: CN
Inventors: 张建龙; 何建辉; 李桥; 王斌; 郭鑫宇; 刘池帅; 崔梦莹; 时国强; 余鑫城; 方光祖
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2021-04-09
Anticipated expiration: 2040-12-09
Also published as: CN112633086B

Abstract

The invention belongs to the technical field of pedestrian detection in near-infrared images. It discloses a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet. Pedestrian activity area distributions under different scenes are obtained from a near-infrared image pedestrian detection data set. EfficientDet-D0 is adopted as the basic detection network, and a semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas. An atrous spatial pyramid pooling module and a convolutional block attention module enhance segmentation performance. Based on the resulting multitask pedestrian detection model, the pedestrian target detection results are post-processed with the predicted pedestrian activity region to obtain the final pedestrian detection result and pedestrian activity region result. The method achieves higher detection performance, reduces false positive samples in the results, and is of significance for monitoring pedestrian activity in night-time surveillance scenes.

Description

Near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet

Technical Field

The invention belongs to the technical field of near-infrared image pedestrian detection, and particularly relates to a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet.

Background

Research on pedestrian detection began in the 1990s. It progressed from early traditional methods based on hand-crafted features, such as HOG-based, Haar wavelet-based and edgelet-based methods, to current deep learning feature extraction methods built on convolutional neural networks such as ResNet and VGG. With the development of science and technology, pedestrian detection has advanced rapidly. It is one of the most widely applied branches of computer vision and plays an important role in intelligent security, autonomous driving, robotics and related fields.

Pedestrian detection in near-infrared images locates pedestrians from near-infrared imagery and is widely applied in intelligent security and autonomous driving. In night scenes, visible-light imaging cannot deliver high-quality images. Compared with traditional visible-light detection, infrared pedestrian detection has the following advantages: (1) imaging remains good under weak light, so pedestrian features can be readily separated from the background; (2) infrared imaging reduces interference from background colors. Owing to these advantages, infrared pedestrian detection has become prominent in many fields in recent years and performs well in intelligent video surveillance, vehicle-mounted driver assistance, military early warning and similar applications. However, infrared imaging also has drawbacks compared with visible-light imaging: its texture and contour features are less rich and it provides only monochrome information, which makes recognition and detection harder. Infrared pedestrian detection is therefore both important and seriously challenging, and designing an efficient and robust infrared pedestrian detection algorithm that addresses these problems is of profound significance.

Pedestrian target detection in near-infrared images is essentially a combined classification and regression problem: bounding-box prediction is treated as a regression task, and separating pedestrians from the background is treated as a classification task. Over the past decades, researchers in China and abroad have contributed substantially to pedestrian detection. By feature extraction approach these methods fall into two categories, and the field has gradually shifted from hand-crafted features to deep learning features.

Early pedestrian detection methods extract features with manually designed feature operators. The most typical is the Histogram of Oriented Gradients (HOG) feature proposed by Dalal in 2005. It is a landmark in the history of pedestrian detection, laid the foundation for effective feature extraction, and achieved good detection results when combined with a Support Vector Machine (SVM) classifier. Other hand-crafted-feature methods include those based on Haar wavelet features and on edgelet features.

Traditional infrared pedestrian detection based on hand-crafted features has two main shortcomings: (1) manual feature design is difficult and the effectiveness of the designed features cannot be guaranteed; (2) hand-crafted features are shallow, so they cope poorly with pedestrians in complex backgrounds, which limits detection performance.

With the rapid development of deep learning theory and technology, researchers have attempted to solve pedestrian detection in infrared images with deep learning. Xingguo Zhang et al. analyzed the target detection performance of R-CNN/Faster R-CNN on visible-light images and constructed a network model for visible-spectrum and infrared images; using the same network model as Fast R-CNN, its miss rate on the VS and NIR data sets was far lower than that of RPN+BF and HOG+SVM. Wang Dongwei et al. proposed an improved YOLOv3 algorithm for pedestrian detection in infrared video images to address the low accuracy and high miss rate of YOLOv3 on this task. Their results show that the improved algorithm reaches an accuracy of 90.63% in infrared pedestrian detection, clearly outperforming Fast R-CNN and YOLOv3, and that the improved network detects more targets simultaneously while lowering the miss rate.

Although deep learning methods achieve higher detection precision for pedestrians in near-infrared images, they have the following shortcomings: (1) because they lack an understanding of the overall semantics of the image and are affected by interference such as noise, their results contain false detections that contradict common sense; (2) because the image information is limited, effective pedestrian features may not be extracted.

In summary, the prior art has the following problems: end-to-end convolutional neural networks lack an understanding of the overall semantic information of the image, so many false detections that contradict common sense occur; and because near-infrared images are monochrome with blurred contours, effectively extracting pedestrian target features from them remains one of the main open problems.

The difficulty in solving these problems lies in reducing semantic errors in the detection results and in effectively extracting pedestrian features from near-infrared images without degrading detection performance.

The significance of solving these problems is as follows: pedestrian detection in near-infrared images is widely used in intelligent security, autonomous driving and robotics, so obtaining high-precision detection results in as little time as possible is of great importance to these fields.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet.

The invention is realized as follows: a near-infrared pedestrian monitoring method based on multitask EfficientDet, comprising:

acquiring the pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection training data set, and using this distribution to train the segmentation branch of the model;

adopting EfficientDet-D0 as the basic detection network and constructing a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas; enhancing segmentation performance with an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM); letting target detection and semantic segmentation share low-level features so that multi-task learning improves the generalization capability of the model; and further post-processing the detection results with the predicted pedestrian activity areas to improve performance;

and, based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection results with the predicted pedestrian activity area to reduce false positive (FP) samples in the predictions, thereby obtaining the final pedestrian detection result and pedestrian activity area result.

Further, the near-infrared pedestrian monitoring method based on multitask EfficientDet specifically comprises the following steps:

Step one: acquiring the pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection training data set;

Step two: adopting EfficientDet-D0 as the basic detection network, constructing a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, and adding an atrous spatial pyramid pooling module and an attention module to enhance segmentation performance;

Step three: training the improved multitask EfficientDet-D0 model, using multi-task learning to improve the generalization capability of the model on pedestrian targets;

Step four: inputting k images to predict the pedestrian activity area of the scene, inputting near-infrared images for testing to obtain pedestrian target detection results, and filtering false positive samples out of these results with the predicted pedestrian activity area.

Further, the first step comprises:

a) the image training set used is a single-channel near-infrared image data set captured in surveillance scenes;

b) all images of each scene in the training data set are traversed, the center-point coordinates of the pedestrian targets are obtained from the annotations, and a mask image of the pedestrian area is generated from the center points of all pedestrian targets in that scene; repeating this for every scene yields the pedestrian distribution area of each scene.
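The mask construction in b) can be sketched as follows. The text only says the mask is derived from the center points of all annotated pedestrians in a scene; drawing a disc of fixed radius around each center and taking the union is an assumption made here for illustration, and `scene_activity_mask` and its `radius` parameter are hypothetical names, not the authors' code.

```python
import numpy as np


def scene_activity_mask(boxes_per_image, height, width, radius=15):
    """Union of discs around every annotated pedestrian center in one scene.

    boxes_per_image: one entry per image of the scene, each an (N, 4) array of
    boxes (x1, y1, x2, y2) in pixels.
    radius: hypothetical size of the region drawn around each center point.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    yy, xx = np.mgrid[0:height, 0:width]          # pixel row / column grids
    for boxes in boxes_per_image:
        for x1, y1, x2, y2 in np.asarray(boxes, dtype=float).reshape(-1, 4):
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # box center point
            mask[(xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2] = 1
    return mask
```

Accumulating over every annotated frame of a fixed camera then marks everywhere pedestrians were observed in that scene.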

Further, the semantic segmentation branch of the second step is built as follows. EfficientDet-D0 is adopted as the basic detection network, and the semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of its backbone network, whose feature channel numbers are 24, 40, 112 and 320, respectively. P5 is passed through the atrous spatial pyramid pooling (ASPP) module, upsampled 2×, and concatenated with P4 along the channel dimension; the result passes through a convolutional block attention module (CBAM), then a convolution layer, BN layer and activation layer that reduce the channel number to 128, and then a second CBAM, giving feature map O1. O1 is upsampled 2× and concatenated with P3; the result passes through CBAM, a convolution, BN and activation layer that reduce the channel number to 64, and a second CBAM, giving feature map O2. O2 is upsampled 2× and concatenated with P2; the result passes through CBAM, a convolution, BN and activation layer that reduce the channel number to 32, and a second CBAM, giving feature map O3. Finally, a convolution, BN and activation layer reduce O3 to a single channel, giving the output O4. The model loss is computed directly on O4; in the test stage, O4 is upsampled 4× to obtain a segmentation result at the original image size.
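To make the resolutions and channel counts of this decoder concrete, the following sketch traces feature-map shapes for the 768 × 512 training input (taking 768 as the width). It assumes P2 to P5 have the usual EfficientNet-B0 strides of 4, 8, 16 and 32, which is consistent with the final 4× upsampling back to the input size, and it uses a hypothetical ASPP output width of 128 channels, which the text does not specify. It only tracks shapes and does not implement the layers.

```python
# Strides and channel counts of the backbone levels P2..P5. Channel counts come
# from the text; the strides are the usual EfficientNet-B0 values (assumption).
BACKBONE = {"P2": (4, 24), "P3": (8, 40), "P4": (16, 112), "P5": (32, 320)}


def trace_segmentation_branch(height=512, width=768, aspp_channels=128):
    """Return (stage, channels, height, width) for each decoder stage.

    aspp_channels is hypothetical: the text does not give the ASPP output width.
    """
    feats = {name: (c, height // s, width // s) for name, (s, c) in BACKBONE.items()}
    _, h, w = feats["P5"]
    c = aspp_channels                      # ASPP keeps P5's spatial size
    trace = [("ASPP(P5)", c, h, w)]
    for skip, out_c, name in (("P4", 128, "O1"), ("P3", 64, "O2"), ("P2", 32, "O3")):
        h, w = 2 * h, 2 * w                # 2x upsampling
        skip_c, sh, sw = feats[skip]
        assert (sh, sw) == (h, w), "upsampled map must match the skip feature"
        c = c + skip_c                     # channel concatenation ("splicing")
        trace.append((f"concat({skip})", c, h, w))
        c = out_c                          # CBAM -> conv + BN + act -> CBAM
        trace.append((name, c, h, w))
    trace.append(("O4", 1, h, w))          # single-channel output used for the loss
    trace.append(("O4 upsampled x4", 1, 4 * h, 4 * w))  # test-time full resolution
    return trace


for stage in trace_segmentation_branch():
    print(stage)
```

The trace shows why the 4× upsampling suffices: O4 lives at the P2 resolution (stride 4), so 4× brings it back to 512 × 768.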

Further, the third step includes:

a) embedding the segmentation branch into the detection network so that it shares low-level features with the target detection network;

b) adopting an atrous spatial pyramid pooling module and a convolutional block attention module in the segmentation branch to improve the segmentation performance of the model, and using different learning rates for the segmentation branch and the detection branch to reduce overfitting;

c) using Dice Loss as the loss function of the segmentation branch, defined as:

$$L_{Dice} = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$

where X is the predicted mask, Y is the labeled mask of the pedestrian region, |X ∩ Y| is the intersection of X and Y, and |X| + |Y| is the sum of the sizes of X and Y;

d) using Focal Loss as the classification loss function, defined as:

$$L_{FL} = -\alpha\, y\, (1 - y')^{\gamma} \log y' - (1 - \alpha)(1 - y)\, y'^{\gamma} \log(1 - y')$$

where y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weighting of hard and easy samples and is set to 2.0, and α is a hyperparameter that adjusts the balance between positive and negative samples and is set to 0.05;

e) using Smooth L1 Loss as the bounding-box regression loss:

$$L_{SmoothL1} = \begin{cases} 0.5\,(y - y')^2, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$

where y is the label and y' is the model prediction;

f) setting the training parameters:

learning rate: 3e-4;

learning-rate schedule: cosine decay;

batch size: 16;

input image size: 768 × 512;

optimizer: AdamW, for fast convergence of the network.

Because there is less pedestrian-region segmentation data than pedestrian detection data, the learning rate of the semantic segmentation branch is set to 0.01 times the normal learning rate of the network, and the classification loss, regression loss and segmentation loss are weighted in the ratio 10:10:1.
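The three losses, the 10:10:1 weighting and the branch-specific learning rates can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the Dice loss uses soft masks with a small smoothing term `eps`, the focal loss is averaged over all elements rather than normalized by the number of positive anchors, and the cosine schedule decays to zero over a hypothetical `total_steps`. In a framework such as PyTorch, the two learning rates would typically be passed as separate parameter groups to `torch.optim.AdamW`.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-6):
    """1 - 2|X ∩ Y| / (|X| + |Y|) on soft (probability) masks; eps is assumed."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)


def focal_loss(y_pred, y, alpha=0.05, gamma=2.0, eps=1e-7):
    """Binary focal loss with alpha and gamma from the text, averaged (assumption)."""
    p = np.clip(y_pred, eps, 1.0 - eps)            # avoid log(0)
    pos = -alpha * (1.0 - p) ** gamma * y * np.log(p)
    neg = -(1.0 - alpha) * p ** gamma * (1.0 - y) * np.log(1.0 - p)
    return np.mean(pos + neg)


def smooth_l1(y_pred, y):
    """Quadratic below |d| = 1, linear above it, averaged over elements."""
    d = np.abs(y - y_pred)
    return np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5))


# Classification : regression : segmentation loss weights = 10 : 10 : 1.
W_CLS, W_REG, W_SEG = 10.0, 10.0, 1.0


def total_loss(l_cls, l_reg, l_seg):
    return W_CLS * l_cls + W_REG * l_reg + W_SEG * l_seg


def branch_learning_rates(step, total_steps, base_lr=3e-4, seg_scale=0.01):
    """Cosine-decayed rate for the network; the segmentation branch gets 0.01x."""
    lr = 0.5 * base_lr * (1.0 + np.cos(np.pi * step / total_steps))
    return {"detection": lr, "segmentation": seg_scale * lr}
```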

Further, the fourth step includes: first, the pedestrian activity area is predicted from k input images. The output of the segmentation branch is passed through a sigmoid function to obtain the final probability map, where the sigmoid function is defined as:

$$S(x) = \frac{1}{1 + e^{-x}}$$

and the output D' of the segmentation branch is processed as:

$$D = S(D')$$

Then, the prediction is binarized with a threshold T:

$$D(x,y) = \begin{cases} 1, & D(x,y) \ge T \\ 0, & D(x,y) < T \end{cases}$$

Taking the activity area predicted by the segmentation branch on the k images as prior information, every target in the final detection result whose box center point does not lie in the predicted activity area is filtered out, which reduces false positive samples:

$$score(x,y) = D(x,y) \cdot score(x,y)$$

where score(x, y) is the confidence score of the target whose box center is at point (x, y).
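Step four can be sketched as follows. The text does not state how the k segmentation outputs are merged into one map D, so averaging the k probability maps is an assumption, as is the threshold value 0.5 used for T; the function names are hypothetical. Detections whose box center falls outside the region get a score of 0 and are dropped.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def activity_region(logit_maps, threshold=0.5):
    """Binary activity map D from the segmentation outputs D' of k frames.

    Averaging the k probability maps is an assumption; threshold plays the role of T.
    """
    prob = np.mean([sigmoid(m) for m in logit_maps], axis=0)   # D = S(D')
    return (prob >= threshold).astype(np.float32)              # binarize with T


def filter_detections(boxes, scores, region):
    """score(x, y) <- D(x, y) * score(x, y) at each box center (x, y)."""
    h, w = region.shape
    cx = np.clip(((boxes[:, 0] + boxes[:, 2]) / 2).astype(int), 0, w - 1)
    cy = np.clip(((boxes[:, 1] + boxes[:, 3]) / 2).astype(int), 0, h - 1)
    new_scores = region[cy, cx] * scores
    keep = new_scores > 0                  # centers outside the region are removed
    return boxes[keep], new_scores[keep]
```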

Another object of the invention is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps of the near-infrared pedestrian monitoring method based on multitask EfficientDet:

adopting EfficientDet-D0 as the basic detection network, constructing a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling module and an attention module, and improving model generalization by letting target detection and semantic segmentation share low-level features;

and, based on the multitask EfficientDet-D0 model, post-processing the pedestrian target detection result with the predicted pedestrian activity area to obtain the final pedestrian detection result and pedestrian activity area result.

Another object of the invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the near-infrared pedestrian monitoring method based on multitask EfficientDet described above.

A further object of the invention is to provide an information data processing terminal for implementing the near-infrared pedestrian monitoring method based on multitask EfficientDet.

Another object of the invention is to provide a near-infrared pedestrian monitoring system based on multitask EfficientDet that implements the above method, the system comprising:

a pedestrian activity area distribution acquisition module, used to obtain the pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection training data set;

a shared-low-level-feature segmentation module, used to adopt EfficientDet-D0 as the basic detection network, build a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhance segmentation performance with an atrous spatial pyramid pooling module and an attention module, and let target detection and semantic segmentation share low-level features;

and a pedestrian detection result and pedestrian activity area result output module, used to post-process the pedestrian target detection result with the predicted pedestrian activity area, based on the multitask EfficientDet-D0 model, to obtain the final pedestrian detection result and pedestrian activity area result.

Combining all the above technical solutions, the advantages and positive effects of the invention are as follows. The invention is mainly applied to pedestrian detection in near-infrared images. It addresses unreasonable semantic errors in the prediction results, reducing their false positive rate while maintaining a high AP, and it lets multiple tasks share one feature extraction network, which effectively improves the generalization capability of the model. The invention decomposes the target detection problem into a binary semantic segmentation problem and a conventional target detection problem, and provides a network that predicts pixel-level pedestrian activity regions and detects pedestrian positions: an atrous spatial pyramid pooling module combined with a convolutional block attention module and upsampling serves as the pedestrian activity area prediction branch, multi-task learning improves the generalization capability of the model, and filtering false positive samples out of the detection results with the pedestrian activity area further improves pedestrian detection performance.

The invention applies semantic segmentation and target detection jointly to pedestrian detection. Pedestrian detection can produce false positive samples that do not agree with the scene semantics, for example pedestrians predicted in trees or other places that contradict common sense; these can be filtered out using the predicted pedestrian areas, while multi-task learning improves the generalization capability of the model, yielding more accurate pedestrian detection results.

The invention integrates a semantic segmentation network and a target detection network that share one feature extraction network, so that a single model performs both semantic segmentation and target detection. This greatly reduces training cost, improves the generalization capability of the model, and reduces overfitting to the detection data. Through multi-task learning and post-processing of the detection results with pedestrian activity areas, the proposed method achieves higher detection performance and better generalization.

Compared with the prior art, the method has the following advantages:

(1) The invention introduces a pedestrian activity region segmentation branch into the target detection model and improves the generalization capability of the model through multi-task learning;

(2) the invention obtains pedestrian activity area predictions from the additional segmentation branch and post-processes the pedestrian target detection results with them, which improves detection performance;

(3) in a fixed surveillance scene, the pedestrian activity area is precomputed once and reused in subsequent processing, which removes the cost of predicting the activity area for every frame and keeps the frame rate at real-time levels.
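Advantage (3) amounts to caching the activity region per camera. A minimal sketch, with a hypothetical `SceneRegionCache` class and scene identifier, might look like this; `predict_region` stands for whatever routine runs the segmentation branch on k frames and is likewise an assumption.

```python
class SceneRegionCache:
    """Compute a fixed camera's activity region once, then reuse it per frame."""

    def __init__(self, predict_region):
        self._predict = predict_region    # hypothetical: segmentation on k frames
        self._regions = {}
        self.calls = 0                    # how many times prediction actually ran

    def get(self, scene_id, frames):
        if scene_id not in self._regions:
            self._regions[scene_id] = self._predict(frames)
            self.calls += 1
        return self._regions[scene_id]
```

Each incoming frame then only pays for detection plus a lookup, which is what keeps the per-frame cost at the detector's speed.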

Table 1: Comparison of the present invention with other target detection methods

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a near-infrared pedestrian monitoring method based on multitask EfficientDet according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of the near-infrared pedestrian monitoring system based on multitask EfficientDet according to an embodiment of the present invention;

In Fig. 2: 1. pedestrian activity area distribution acquisition module; 2. segmentation and detection module sharing low-level features; 3. pedestrian detection result and pedestrian activity area result output module.

Fig. 3 is a flowchart of an implementation of the near-infrared pedestrian monitoring method based on multitask EfficientDet according to the embodiment of the present invention.

Fig. 4 is a diagram of a multitasking EfficientDet-D0 network structure according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a near-infrared pedestrian monitoring method, system, medium and equipment based on multitask EfficientDet, and the invention is described in detail with reference to the attached drawings.

As shown in fig. 1, the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the invention comprises the following steps:

S101: acquire the distribution of pedestrian activity areas in fixed scenes from the training data set; input near-infrared images, replicate the single channel of each input image, and divide the data into a training set and a test set by random splitting;

S102: train the improved multitask EfficientDet-D0 model, assigning a lower learning rate to the segmentation branch to prevent overfitting;

S103: input k images to obtain the pedestrian activity area prediction; then input single near-infrared images for testing, and obtain the final detection result from the pedestrian detection result and the pedestrian activity area prediction.
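The channel replication and random split of S101 could look like the following NumPy sketch. Replicating to three channels (presumably so that the single-channel images fit a backbone expecting three-channel input) and the 20% test ratio are assumptions; the text gives neither the target channel count nor the split ratio, and both function names are hypothetical.

```python
import numpy as np


def to_three_channels(img):
    """Replicate a single-channel NIR image (H, W) or (H, W, 1) into (H, W, 3)."""
    img = np.asarray(img)
    if img.ndim == 3 and img.shape[-1] == 1:
        img = img[..., 0]
    return np.repeat(img[..., None], 3, axis=-1)


def random_split(items, test_ratio=0.2, seed=0):
    """Randomly split items into (train, test); test_ratio is hypothetical."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    n_test = int(round(len(items) * test_ratio))
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test
```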

Those skilled in the art can also implement the near-infrared pedestrian monitoring method based on multitask EfficientDet with other steps; the method shown in Fig. 1 is only one specific example.

As shown in Fig. 2, the near-infrared pedestrian monitoring system based on multitask EfficientDet provided by the invention comprises:

a pedestrian activity area distribution acquisition module 1, used to obtain the pedestrian activity area distribution under different scenes from a near-infrared image pedestrian detection training data set;

a shared-low-level-feature segmentation module 2, used to adopt EfficientDet-D0 as the basic detection network, build a semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network to monitor pedestrian activity areas, enhance segmentation performance with an atrous spatial pyramid pooling module and an attention module, and let target detection and semantic segmentation share low-level features;

and a pedestrian detection result and pedestrian activity area result output module 3, used to post-process the pedestrian target detection result with the predicted pedestrian activity area, based on the multitask EfficientDet-D0 model, to obtain the final pedestrian detection result and pedestrian activity area result.

The technical solution of the present invention is further described below with reference to the accompanying drawings.

According to the method, the pedestrian activity area distribution under different scenes is obtained from a near-infrared image pedestrian detection data set. EfficientDet-D0 is adopted as the basic detection network, and a semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of its backbone network to monitor pedestrian activity areas; an atrous spatial pyramid pooling module and a convolutional block attention module enhance segmentation performance, and target detection and semantic segmentation share low-level features, which improves model generalization. Based on the multitask pedestrian detection model, the pedestrian target detection result is post-processed with the predicted pedestrian activity region to obtain the final pedestrian detection result and pedestrian activity region result. Combining the segmentation branch with the target detection branch effectively reduces false positive predictions and gives higher detection performance on pedestrian targets in near-infrared images, which is of significance for monitoring pedestrian activity in night-time surveillance scenes.

As shown in Fig. 3, the near-infrared pedestrian monitoring method based on multitask EfficientDet provided by the embodiment of the present invention specifically includes the following steps:

First, the image training set used is a single-channel near-infrared image data set captured in surveillance scenes. All images of each scene in the training data set are traversed, the center-point coordinates of the pedestrian targets are obtained from the annotations, and a mask image of the pedestrian area is generated from the center points of all pedestrian targets in that scene; repeating this for every scene yields the pedestrian distribution area of each scene.

2a) The segmentation branch is designed as follows; its segmentation performance is improved mainly by an atrous spatial pyramid pooling (ASPP) module and a convolutional block attention module (CBAM):

The semantic segmentation branch is constructed from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone network, whose feature channel numbers are 24, 40, 112 and 320, respectively. P5 is passed through the ASPP module, upsampled 2×, and concatenated with P4 along the channel dimension; the result passes through a CBAM, then a convolution layer, BN layer and activation layer that reduce the channel number to 128, and then a second CBAM, giving feature map O1. O1 is upsampled 2× and concatenated with P3; the result passes through CBAM, a convolution, BN and activation layer that reduce the channel number to 64, and a second CBAM, giving feature map O2. O2 is upsampled 2× and concatenated with P2; the result passes through CBAM, a convolution, BN and activation layer that reduce the channel number to 32, and a second CBAM, giving feature map O3. Finally, a convolution, BN and activation layer reduce O3 to a single channel, giving the output O4. The model loss is computed directly on O4; in the test stage, O4 is upsampled 4× to obtain a segmentation result at the original image size.
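The convolutional block attention module applied around each decoder stage is not detailed in the text. The sketch below follows the standard CBAM design (channel attention from average- and max-pooled descriptors through a shared two-layer MLP, then spatial attention from a convolution over the channel-wise mean and max maps), written in plain NumPy with random weights standing in for learned ones. The reduction ratio `r` and the 7 × 7 kernel are conventional defaults assumed here, and the function names are hypothetical.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w0, w1):
    """f: (C, H, W); w0: (C // r, C); w1: (C, C // r). Returns (C, 1, 1) weights."""
    def shared_mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)   # two layers with ReLU in between
    avg = f.mean(axis=(1, 2))                 # average-pooled channel descriptor
    mx = f.max(axis=(1, 2))                   # max-pooled channel descriptor
    return _sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]


def spatial_attention(f, kernel):
    """kernel: (2, k, k) weights of a k x k conv over [channel mean; channel max]."""
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])            # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))     # zero "same" padding
    h, w = f.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return _sigmoid(out)[None, :, :]                            # (1, H, W)


def cbam(f, w0, w1, kernel):
    f = channel_attention(f, w0, w1) * f      # refine channels first
    return spatial_attention(f, kernel) * f   # then refine spatial positions
```

Because both attention maps lie in (0, 1), CBAM only rescales features; it never changes the tensor shape, which is why it can be dropped in before and after each convolution stage of the decoder.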

3a) The segmentation branch is embedded into the detection network and shares low-level features with the target detection network;

3b) an atrous spatial pyramid pooling module and a convolutional block attention module are adopted in the segmentation branch to improve the segmentation performance of the model, and different learning rates are used for the segmentation branch and the detection branch to reduce overfitting.

3c) Dice Loss is used as the loss function of the segmentation branch, defined as:

$$L_{Dice} = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$

where X is the predicted mask, Y is the labeled mask of the pedestrian region, |X ∩ Y| is the intersection of X and Y, and |X| + |Y| is the sum of the sizes of X and Y.

3d) Focal Loss is used as the loss function for classification. It is defined as follows:

$$L_{Focal} = -\alpha\, y\, (1 - y')^{\gamma} \log y' - (1 - \alpha)(1 - y)\, {y'}^{\gamma} \log(1 - y')$$

where y is the label, y' is the model prediction, γ is a hyperparameter that adjusts the weighting of hard and easy samples and is set to 2.0, and α is a hyperparameter that adjusts the balance between positive and negative samples and is set to 0.05.
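A per-prediction sketch of the binary focal loss in its standard form, using γ = 2.0 and α = 0.05 from the text. The clamp `eps` is an assumption that keeps the logarithms finite.

```python
import math


def focal_loss(y, y_pred, alpha=0.05, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    y: label (0 or 1); y_pred: predicted probability of the positive class.
    alpha and gamma follow the text; eps is an assumed clamp.
    """
    p = min(max(y_pred, eps), 1.0 - eps)
    # positive term: down-weighted by (1 - p)^gamma when p is already high
    pos = -alpha * y * (1.0 - p) ** gamma * math.log(p)
    # negative term: down-weighted by p^gamma when p is already low
    neg = -(1.0 - alpha) * (1.0 - y) * p ** gamma * math.log(1.0 - p)
    return pos + neg


print(focal_loss(0, 0.1), focal_loss(0, 0.9))  # easy vs. hard negative
```

With γ = 0 and α = 0.5 the expression reduces to half the binary cross-entropy, which is a convenient sanity check; larger γ shrinks the contribution of easy, well-classified samples.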

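3e) Smooth L1 Loss is used as the loss function for bounding-box regression:

$$L_{SmoothL1}(y, y') = \begin{cases} 0.5\,(y - y')^2, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$

where y is the label and y' is the model prediction.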
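A short sketch of Smooth L1 applied to box coordinates. With the default `beta=1.0` it matches the piecewise definition above; the `beta` parameter and the mean reduction over coordinates are assumptions added for illustration.

```python
def smooth_l1(y, y_pred, beta=1.0):
    """Smooth L1: quadratic for |y - y'| < beta, linear beyond (beta=1 matches the text)."""
    diff = abs(y - y_pred)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def box_regression_loss(targets, preds):
    """Mean Smooth L1 over all box coordinates (the reduction is an assumption)."""
    if len(targets) != len(preds) or not targets:
        raise ValueError("need equal-length, non-empty coordinate lists")
    return sum(smooth_l1(t, p) for t, p in zip(targets, preds)) / len(targets)


print(box_regression_loss([1, 2, 3, 4], [1, 2, 3, 6]))
```

The quadratic region gives small gradients near the target, while the linear region keeps large localization errors from dominating the loss.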

3f) Training parameters:

learning rate: 3e-4;

learning-rate schedule: cosine decay;

batch size: 16;

input image size: 768 × 512.
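A minimal cosine-decay schedule for the base learning rate of 3e-4, with no warm-up and a final rate of 0 (both assumptions). The number of iterations per epoch follows from the 13072 training images reported in the simulation section and the batch size of 16; the epoch count is not given in the text, so `EPOCHS` is a hypothetical value.

```python
import math


def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over total_steps (no warm-up assumed)."""
    step = min(max(step, 0), total_steps)
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos


TRAIN_IMAGES = 13072  # training-set size from the simulation section
BATCH_SIZE = 16
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
EPOCHS = 50  # hypothetical; not specified in the text
total_steps = EPOCHS * steps_per_epoch

for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```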

k images are input and tested to obtain the pedestrian activity-area prediction for the scene; near-infrared test images are then input to obtain pedestrian detection results, and false-positive samples in these detection results are filtered out using the predicted activity area. The specific process is as follows:

First, pedestrian activity regions are predicted from k images (k = 10): the k images are input, and the output of the segmentation branch is passed through a sigmoid function to obtain the final probability map. The sigmoid function is defined as follows:

$$S(x) = \frac{1}{1 + e^{-x}}$$

The output is processed as follows:

$$D = S(D')$$

where D' is the output of the segmentation branch and D is the resulting probability map.
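A sketch of turning the segmentation outputs D' of the k images into one probability map D. The text says that the k predictions jointly serve as prior information but does not say how they are combined; averaging the k sigmoid maps is an assumption made here for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable S(x) = 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def activity_map(logit_maps):
    """Combine k segmentation outputs D' (2-D lists of logits) into one map D.

    Averaging the k probability maps is an assumption, not the patent's rule.
    """
    k = len(logit_maps)
    h, w = len(logit_maps[0]), len(logit_maps[0][0])
    return [[sum(sigmoid(m[i][j]) for m in logit_maps) / k for j in range(w)]
            for i in range(h)]


print(activity_map([[[0.0, 100.0]], [[0.0, -100.0]]]))
```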

The activity area predicted by the segmentation branch for the k images is used as prior information: in the final detection result, targets whose bounding-box center point does not lie in the predicted activity area are filtered out, which reduces false-positive samples. The procedure is as follows:

$$score(x, y) \leftarrow D(x, y) \cdot score(x, y)$$

where score(x, y) is the confidence score of the target whose bounding-box center is at point (x, y).
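The re-scoring step can be sketched as follows: each detection's confidence is multiplied by the activity probability D at its box center, so boxes centered outside the predicted activity area drop to (near) zero. The sketch assumes box coordinates are in the pixel grid of D (that is, after the 4x upsampling to the original image size); the optional `threshold` for discarding re-scored boxes is a hypothetical addition.

```python
def rescore_detections(detections, activity, threshold=None):
    """Apply score(x, y) <- D(x, y) * score(x, y) at each box center.

    detections: list of (x1, y1, x2, y2, score) in the pixel grid of D.
    activity: 2-D list D[y][x] of probabilities.
    threshold: optional, hypothetical cut-off for dropping re-scored boxes.
    """
    h, w = len(activity), len(activity[0])
    out = []
    for x1, y1, x2, y2, score in detections:
        # box center, clamped to the map boundaries
        cx = min(max(int((x1 + x2) / 2), 0), w - 1)
        cy = min(max(int((y1 + y2) / 2), 0), h - 1)
        new_score = activity[cy][cx] * score
        if threshold is None or new_score >= threshold:
            out.append((x1, y1, x2, y2, new_score))
    return out


D = [[0.0, 1.0],
     [0.0, 1.0]]  # right column is "activity area"
dets = [(1, 0, 1, 1, 0.8), (0, 0, 0, 1, 0.9)]
print(rescore_detections(dets, D, threshold=0.5))
```

Multiplying rather than hard-masking keeps the step differentiable-free but soft: boxes in uncertain regions are only down-weighted, and a threshold decides whether they survive.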

The technical effects of the present invention will be described in detail with reference to simulations.

1. Simulation conditions

The simulation experiments of the invention were carried out with PyCharm on a PC with an Intel(R) Core(TM) i7-7820X CPU at 3.60 GHz, 32.00 GB of RAM, two NVIDIA GeForce RTX 2080 Ti GPUs and the Ubuntu 18.0 operating system.

2. Content of simulation experiment

The model was trained and tested on a self-collected data set covering 10 scenes; the original images, with a resolution of 2560 × 1440, were scaled to 768 × 512. Scenes A and B are used as the test set and the remaining scenes as the training set; the test set contains 3380 images and the training set contains 13072 images.

3. Simulation experiment results and analysis

Table 1 compares the method of the present invention with the original Cascade R-CNN, the original EfficientDet-D0, and EfficientDet-D0 with anchor clustering.

Table 1: Comparison of the present invention with other object detection methods

As Table 1 shows, through multi-task learning the invention maintains a high detection frame rate and high detection accuracy, and improves detection performance by 1.5 points over the original EfficientDet-D0 with anchor clustering. Compared with the other methods, the invention reduces the number of false-positive samples by restricting the regions in which positive samples may appear, which improves detection performance. In conclusion, the invention uses a multi-task detection model to improve detection performance, reduces the number of false-positive samples, and achieves higher detection precision, which is significant for research on intelligent security, autonomous driving and related fields.

It should be noted that the embodiments of the present invention can be implemented in hardware, software, or a combination of software and hardware. The hardware part may be implemented with dedicated logic; the software part may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or processor control code, provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus of the present invention and its modules may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of hardware circuits and software, e.g., firmware.

The above description covers only specific embodiments of the present invention and is not intended to limit its scope; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the invention as defined by the appended claims.

Claims

1. A near-infrared pedestrian monitoring method based on multitask EfficientDet is characterized by comprising the following steps:

acquiring the distribution of pedestrian activity areas in different scenes from a near-infrared pedestrian detection training data set, and using this distribution to train the segmentation branch;

adopting EfficientDet-D0 as the base detection network, constructing semantic segmentation branches from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhancing segmentation performance with an atrous spatial pyramid pooling module and an attention module, and sharing low-level features between object detection and semantic segmentation;

2. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 1, wherein the multitask EfficientDet-based near-infrared pedestrian monitoring method specifically comprises the following steps:

3. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the first step comprises:

4. The multitask EfficientDet-based near-infrared pedestrian monitoring method as claimed in claim 2, wherein the semantic segmentation branch of the second step comprises: adopting EfficientDet-D0 as the base detection network and constructing the semantic segmentation branch from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone, whose channel counts are 24, 40, 112 and 320, respectively; passing P5 through an atrous spatial pyramid pooling module, upsampling it by a factor of 2, concatenating it channel-wise with P4, and passing the result through a convolutional block attention module, then a convolution layer, a BN layer and an activation layer that reduce the number of channels to 128, and then another convolutional block attention module to obtain feature map O1; upsampling O1 by a factor of 2, concatenating it with P3, and applying a convolutional block attention module, a convolution layer, a BN layer and an activation layer that reduce the number of channels to 64, and another convolutional block attention module to obtain feature map O2; upsampling O2 by a factor of 2, concatenating it with P2, and applying the same sequence with the number of channels reduced to 32 to obtain feature map O3; reducing O3 to a single channel through a convolution layer, a BN layer and an activation layer to obtain the output O4; computing the model loss directly on O4; and, in the test stage, upsampling O4 by a factor of 4 to obtain a segmentation result at the original image size.

5. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the third step comprises:

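c) Smooth L1 Loss is used as the loss function for bounding-box regression:

$$L_{SmoothL1}(y, y') = \begin{cases} 0.5\,(y - y')^2, & |y - y'| < 1 \\ |y - y'| - 0.5, & \text{otherwise} \end{cases}$$

where y is the label and y' is the model prediction;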

d) setting the training parameters:

learning rate: 3e-4;

learning-rate schedule: cosine decay;

batch size: 16;

input image size: 768 × 512;

6. The multitask EfficientDet-based near-infrared pedestrian monitoring method according to claim 2, wherein the fourth step includes: first, predicting the pedestrian activity areas of k images by inputting the k images and passing the output of the segmentation branch through a sigmoid function to obtain the final probability map, where the sigmoid function is defined as follows:

$$S(x) = \frac{1}{1 + e^{-x}}$$

the output is processed as follows:

$$D = S(D')$$

$$score(x, y) \leftarrow D(x, y) \cdot score(x, y)$$

where score(x, y) is the confidence score of the target at point (x, y).

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

9. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the near-infrared pedestrian monitoring method based on the multitask EfficientDet according to any one of claims 1-6.

10. A near-infrared pedestrian monitoring system based on the multitask EfficientDet for implementing the near-infrared pedestrian monitoring method based on the multitask EfficientDet according to any one of claims 1-6, wherein the near-infrared pedestrian monitoring system based on the multitask EfficientDet comprises:

the segmentation and detection module, configured to adopt EfficientDet-D0 as the base detection network, construct semantic segmentation branches from the P2, P3, P4 and P5 layers of the EfficientDet-D0 backbone to monitor pedestrian activity areas, enhance segmentation performance with an atrous spatial pyramid pooling module and an attention module, and share low-level features between object detection and semantic segmentation;