CN112069983B - Low-light pedestrian detection method and system for multi-task feature fusion sharing learning - Google Patents

Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Info

Publication number
CN112069983B
CN112069983B (application number CN202010917093.XA)
Authority
CN
China
Prior art keywords
network
illumination
pedestrian detection
low
pedestrian
Prior art date
Legal status
Active
Application number
CN202010917093.XA
Other languages
Chinese (zh)
Other versions
CN112069983A (en)
Inventor
卢涛 (Lu Tao)
王元植 (Wang Yuanzhi)
张彦铎 (Zhang Yanduo)
吴云韬 (Wu Yuntao)
Current Assignee
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202010917093.XA priority Critical patent/CN112069983B/en
Publication of CN112069983A publication Critical patent/CN112069983A/en
Application granted granted Critical
Publication of CN112069983B publication Critical patent/CN112069983B/en


Classifications

    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Fusion techniques of extracted features
    • G06F18/29: Graphical models, e.g. Bayesian networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects


Abstract

The invention discloses a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning. The method acquires normal-illumination and low-illumination pedestrian data sets; pre-trains an image illumination enhancement network using the two data sets; pre-trains a pedestrian detection network using the normal-illumination pedestrian data set; designs a multi-task feature fusion module capable of fusing features between the upstream and downstream tasks, carries out feature fusion and sharing across the two networks, and constructs a low-illumination pedestrian detection network with multi-task feature fusion shared learning; imports the two pre-trained models into the low-illumination pedestrian detection network and trains it with the normal- and low-illumination data sets to obtain a low-illumination pedestrian detection model with multi-task feature fusion shared learning; and detects the image under test with this model to obtain the positions of pedestrians in the image. The invention can accurately and efficiently detect the positions of pedestrians in low-illumination images.

Description

Low-light pedestrian detection method and system for multi-task feature fusion sharing learning
Technical Field
The invention belongs to the technical field of computer vision target detection, and particularly relates to a low-light pedestrian detection method and system based on multi-task feature fusion sharing learning.
Background
With the rapid development of the world economy, people travel frequently between regions, cities and countries, and the accompanying hidden dangers to public safety demand considerable effort from security departments. Video surveillance devices, an important component of urban security systems, are now widely installed in public areas such as roads, streets, schools and shopping malls. These devices mainly record and store events in the monitored places, support requirements such as remote monitoring and emergency command, and help ensure the public safety of society. Pedestrians are the main subjects in video surveillance, and using intelligent technology to study and analyze their behavior is an important part of intelligent monitoring technology. Pedestrian detection is one of the key technologies in this field and an important problem in computer vision, where it has achieved great success under controlled conditions in recent years.
In many application scenarios in the field of computer vision, pedestrians are essential analysis targets, and correctly recognizing pedestrian targets in the surrounding environment is an important precondition for a machine to complete subsequent tasks or to interact with humans. Pedestrian detection can be applied directly to scenarios such as indoor and outdoor mobile robots, autonomous driving and security monitoring, and has therefore attracted wide attention in recent years; it has a significant impact on applications such as driver assistance systems, robot navigation and intelligent video surveillance systems. The continued development of deep learning has contributed substantially to the performance of pedestrian detection.
Generally, pedestrian detection is a particular sub-field of object detection. From this point of view, deep-learning pedestrian detection algorithms can be classified into two categories: anchor-box-based methods and keypoint-based methods. Convolutional neural networks (Convolutional Neural Networks, CNN) were first introduced to target detection in RCNN, which made it possible to perform detection without manually designed features. Keypoint-based target detection algorithms generate target bounding boxes by detecting and grouping keypoints, which greatly simplifies the output of the network and eliminates the need to design anchor boxes.
Although the pedestrian detection algorithms described above achieve satisfactory performance under normal lighting conditions, they mostly do not consider pedestrian detection in low-light environments. In practical applications, normal lighting conditions cannot always be guaranteed; on the contrary, low-light environments are very common. The main reason for poor pedestrian detection performance in low-light conditions is that the color and texture information of the input signal obtained in low light is severely distorted, yet color and texture information plays a critical role in pedestrian detection. Infrared pedestrian detection has been studied as one answer to the low-light pedestrian detection problem, but it must be carried out on infrared images and cannot operate directly on the original low-light image, which greatly increases the detection cost.
Disclosure of Invention
In view of the defects or improvement needs of the prior art, the invention provides a low-light pedestrian detection method and system based on multi-task feature fusion shared learning, which solves the problem of poor pedestrian detection in low-light environments.
In order to achieve the above object, according to one aspect of the present invention, there is provided a low-light pedestrian detection method based on multi-task feature fusion sharing learning, including:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration split attention module SCSAB, wherein the self-calibration split attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
s4: constructing a self-calibration separation attention hourglass network, wherein a basic module of the self-calibration separation attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the self-calibration separation attention hourglass network is used as a backbone network in the pedestrian detection network, and the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on the multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network with the multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion sharing learning, training the low-illumination pedestrian detection network for multi-task feature fusion sharing learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion sharing learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion sharing learning to obtain the position of a pedestrian in the image to be detected.
In some alternative embodiments, step S6 includes:
and adding the features of the first convolution network of the image illumination enhancement network and the features of the second last SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through a sigmoid function, feeding back to the two networks in the next iteration, adding the features of the fourth last convolution network of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through the sigmoid function, and feeding back to the two networks in the next iteration, thereby constructing the multi-task feature fusion shared learning low-illumination pedestrian detection network.
In some alternative embodiments, the image illumination enhancement network is constructed based on the RetinexNet convolutional neural network, wherein the loss function L_enh of the image illumination enhancement network is: L_enh = L_recon + λ_ir·L_ir + λ_is·L_is, where λ_ir and λ_is are weight coefficients, and L_recon, L_ir and L_is represent the reconstruction, reflectance and illumination-smoothness loss functions, respectively.
In some alternative embodiments, the loss function L_cor of the pedestrian detection network is: L_cor = L_det + δ·L_pull + η·L_push + γ·L_off, where δ, η and γ are respectively the weights of the L_pull, L_push and L_off loss terms, L_det is the corner loss, L_pull groups corners, L_push separates corners, and L_off is the offset loss.
In some of the alternative embodiments,

L_det is the corner loss:

L_det = -(1/N) · Σ_{a=1}^{C} Σ_{i=1}^{H} Σ_{j=1}^{W} { (1 - p_aij)^α · log(p_aij), if y_aij = 1; (1 - y_aij)^β · (p_aij)^α · log(1 - p_aij), otherwise }

where N is the number of objects in the image, α and β are hyper-parameters controlling the contribution of each corner, C, H and W represent the channel number, height and width of the input, p_aij is the predicted score at position (i, j) for class a in the image, and y_aij is the ground-truth heatmap value augmented with unnormalized Gaussians;

L_off is the offset loss:

L_off = (1/N) · Σ_{k=1}^{N} SmoothL1(ô_k, o_k), with o_k = (x_k/n - ⌊x_k/n⌋, y_k/n - ⌊y_k/n⌋)

where o_k is the label offset, x_k and y_k are the x and y coordinates of corner k, n is the downsampling factor, and SmoothL1(ô_k, o_k) represents the smooth-L1 difference between the predicted offset ô_k and the label offset o_k;

L_pull groups corners and L_push separates corners:

L_pull = (1/M) · Σ_{m=1}^{M} [ (e_tm - e_m)² + (e_bm - e_m)² ],  L_push = (1/(M(M-1))) · Σ_{m=1}^{M} Σ_{j=1, j≠m}^{M} max(0, Δ - |e_m - e_j|)

where M denotes the number of objects, e_tm is the embedding of the top-left corner of object m, e_bm is the embedding of the bottom-right corner of object m, e_m is the average of e_tm and e_bm, e_m and e_j represent the embeddings of objects m and j respectively, and Δ is a margin.
In some alternative embodiments, the method further comprises: and designing a characteristic fusion and sharing multi-task learning mechanism, wherein the mechanism can fuse the characteristics between upstream and downstream tasks and feed back the characteristics to other networks in the next iteration.
In some alternative embodiments, the feature fusion and sharing multi-task learning mechanism is: assume there are two tasks, task A and task B; the output feature of convolutional layer C_A1 in the task-A network is F^out_A1,i, and the output feature of convolutional layer C_B1 in the task-B network is F^out_B1,i; C_A2 and C_B2 are respectively the convolutional layers following C_A1 and C_B1, F^in_A2,i is the input feature of layer C_A2, F^in_B2,i is the input feature of layer C_B2, and F_i is the shared feature obtained in the i-th end-to-end iteration, expressed as:

F_i = sigmoid( (F^out_A1,i + F^out_B1,i) / 2 )

When i = 1, F^in_A2,1 = F^out_A1,1 and F^in_B2,1 = F^out_B1,1; when i > 1, F^in_A2,i = F^out_A1,i ⊙ F_{i-1} and F^in_B2,i = F^out_B1,i ⊙ F_{i-1}, where ⊙ denotes element-wise multiplication and F_{i-1} represents the shared feature obtained in the (i-1)-th end-to-end iteration.
In some alternative embodiments, the total training loss function of the multi-task feature fusion shared learning low-illumination pedestrian detection network is: L = L_cor + ζ·L_enh = L_det + δ·L_pull + η·L_push + γ·L_off + ζ·L_enh, where L is the total loss and ζ is the weight of the illumination enhancement loss L_enh.
According to another aspect of the present invention, there is provided a low-light pedestrian detection system based on multi-task feature fusion sharing learning, including:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, the pedestrian detection network takes a self-calibration separation attention hourglass network as a main network, the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention hourglass network consists of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines the self-calibration convolution network and the separation attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
the multi-task feature fusion module is used for designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks based on multi-task feature fusion sharing learning, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network for multi-task feature fusion sharing learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the low-illumination pedestrian detection network for the multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for the multi-task feature fusion shared learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain the low-illumination pedestrian detection model for the multi-task feature fusion shared learning;
and the image detection module is used for detecting the image to be detected by utilizing the multi-task feature fusion shared learning low-illumination pedestrian detection model to obtain the position of the pedestrian in the image to be detected.
In some alternative embodiments, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network.
In some alternative embodiments, a pedestrian detection network based on self-calibrating split attention is constructed.
According to another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the low-light pedestrian detection method based on the multi-tasking feature fusion sharing learning of any one of the above.
In general, the above technical solutions conceived by the present invention, compared with the prior art, enable the following beneficial effects to be obtained:
the invention provides a low-light pedestrian detection method and system based on multi-task feature fusion sharing learning, which can accurately and efficiently detect the position of a pedestrian in a low-light image; and creatively provides a self-calibration separation attention module which combines self-calibration convolution and a separation attention network to effectively collect information of each spatial position in an input image so as to expand the field of view of each convolution layer, thereby improving the detection performance.
Drawings
Fig. 1 is a schematic flow diagram of a low-light pedestrian detection method based on multi-task feature fusion sharing learning provided by an embodiment of the invention;
FIG. 2 is a block diagram of a low-light pedestrian detection network based on multi-task feature fusion sharing learning;
FIG. 3 is a diagram of a self-calibrating split attention block architecture in accordance with the present invention;
FIG. 4 is a schematic diagram of a low-light pedestrian detection system based on multi-task feature fusion sharing learning of the present invention;
FIG. 5 is a workflow diagram of a feature fusion and sharing multi-task learning mechanism in accordance with the present invention;
FIG. 6 is a graph showing a comparison of test results according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention is mainly divided into four parts: the image illumination enhancement pre-training model, the pedestrian detection pre-training model, the low-illumination pedestrian detection model with multi-task feature fusion shared learning, and the use of that model to infer pedestrian positions from low-illumination images.
The low-light pedestrian detection method based on the multi-task feature fusion sharing learning, as shown in fig. 1, comprises the following steps:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
in the embodiment of the invention, the images can be obtained directly by photographing. Alternatively, the CityPersons pedestrian detection training and validation sets (a published large-scale pedestrian detection dataset) can be darkened; the embodiment of the invention uses the RGB-based spatial brightness adjustment algorithm in OpenCV (a cross-platform computer vision library released under the BSD open-source license), which adjusts according to the current RGB values, i.e., the larger the R, G, B values, the larger the adjustment. For example, with an adjustment coefficient of 1.1, the pixel (100, 200, 50) becomes (110, 220, 55) after adjustment. The embodiment of the invention uses an adjustment coefficient of 0.8. After the brightness is reduced, the normal and low-illumination CityPersons pedestrian detection training and validation sets serve as the training set and test set.
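The brightness-scaling step above can be sketched in NumPy. This is a simplified stand-in for the OpenCV adjustment the embodiment uses, not the patent's exact code; the function name and the toy pixel are illustrative:

```python
import numpy as np

def adjust_brightness(image: np.ndarray, coeff: float) -> np.ndarray:
    """Scale every R, G, B value by `coeff`, clipping to [0, 255].

    Larger channel values receive a larger absolute change, matching the
    behaviour described above: (100, 200, 50) * 1.1 -> (110, 220, 55).
    """
    scaled = image.astype(np.float64) * coeff
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

pixel = np.array([[[100, 200, 50]]], dtype=np.uint8)
brighter = adjust_brightness(pixel, 1.1)  # -> [[[110, 220, 55]]]
darker = adjust_brightness(pixel, 0.8)    # -> [[[80, 160, 40]]], as for the low-light set
```

Applying `coeff=0.8` to every image of the normal-illumination set would produce the synthetic low-illumination counterpart described above.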
S2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
and the image illumination enhancement network is trained alone for 100 epochs using the normal and low-illumination pedestrian detection training sets to obtain the image illumination enhancement pre-training model. The network structure of the image illumination enhancement network is shown in fig. 2; it is constructed on the basis of the RetinexNet convolutional neural network, which introduces Retinex theory into the network. Classical Retinex theory builds a model of human color perception: it assumes that the observed image can be decomposed into two parts, a reflectance channel and an illumination channel. Let S represent the source image; then S = R ⊙ I, where R represents the reflectance component, I represents the illumination component, and ⊙ represents element-wise multiplication. For the loss function, in order to ensure that the illumination-restored image retains the edge information of objects while also preserving smooth transitions of illumination, the illumination enhancement network uses the following loss:

L_enh = L_recon + λ_ir·L_ir + λ_is·L_is

where λ_ir and λ_is represent the coefficients balancing reflectance and illumination, and the loss functions L_recon, L_ir and L_is represent the reconstruction, reflectance and illumination-smoothness terms, respectively.
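The Retinex decomposition S = R ⊙ I can be illustrated with a toy NumPy example; the array shapes and illumination values below are made up for illustration and are not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.uniform(0.2, 1.0, size=(4, 4, 3))  # reflectance: scene-intrinsic, illumination-invariant
I_low = np.full((4, 4, 1), 0.1)            # dim illumination map
I_normal = np.full((4, 4, 1), 0.9)         # normal illumination map

S_low = R * I_low          # observed low-light image
S_normal = R * I_normal    # observed normal-light image

# Enhancement amounts to re-lighting the shared reflectance component:
S_enhanced = (S_low / I_low) * I_normal
```

Because the low-light and normal-light images share the reflectance R, re-lighting the reflectance recovers the normal-light observation exactly in this idealized setting.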
S3: constructing a pedestrian detection network which takes two self-calibration split-attention hourglass networks as the backbone network, and training the pedestrian detection network with the normal-illumination pedestrian data set to obtain a pedestrian detection pre-training model; the basic module of the self-calibration split-attention hourglass network consists of the self-calibration split-attention module SCSAB, which combines a self-calibrated convolution network and a split-attention network and is used for collecting information of each spatial position in the input image so as to expand the field of view of each convolutional layer;
in the embodiment of the invention, the pedestrian detection network is trained alone for 100 epochs using the normal-illumination pedestrian detection training set to obtain the pedestrian detection pre-training model. The network structure of the pedestrian detection network is shown in fig. 2. The pedestrian detection network draws on the idea of keypoint-based target detection algorithms, which generate object bounding boxes by detecting and grouping keypoints; this greatly simplifies the output of the network and eliminates the need to design anchor boxes. An attention mechanism is also introduced into the network to further improve detection performance. The Self-Calibrated Split Attention Block (SCSAB) proposed by the invention, shown in fig. 3, combines self-calibrated convolution with a split-attention network, effectively collecting information of each spatial position in the input image so as to expand the field of view of each convolutional layer and thereby improve detection performance.
For the loss function of the pedestrian detection network, a focal loss with α = 2 and β = 4 can be used. Let p_aij be the predicted score at position (i, j) for class a in the image, and let y_aij be the ground-truth heatmap value augmented with unnormalized Gaussians:

L_det = -(1/N) · Σ_{a=1}^{C} Σ_{i=1}^{H} Σ_{j=1}^{W} { (1 - p_aij)^α · log(p_aij), if y_aij = 1; (1 - y_aij)^β · (p_aij)^α · log(1 - p_aij), otherwise }

where N is the number of objects in the image, α and β are hyper-parameters that control the contribution of each point, and C, H and W represent the channel number, height and width of the input.
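A minimal NumPy sketch of this penalty-reduced focal loss, following the CornerNet-style formulation the text describes; the helper name and the toy heatmaps are illustrative, not from the patent:

```python
import numpy as np

def corner_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """L_det over a (C, H, W) corner heatmap.

    pred: predicted scores p_aij in (0, 1).
    gt:   ground truth y_aij; exactly 1 at corner locations, values in
          [0, 1) elsewhere (unnormalized Gaussian bumps near corners).
    """
    pos = gt == 1.0
    n = max(int(pos.sum()), 1)  # N: number of corners (objects) in the image
    pos_term = ((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    neg_term = ((1 - gt[~pos]) ** beta * pred[~pos] ** alpha
                * np.log(1 - pred[~pos] + eps)).sum()
    return -(pos_term + neg_term) / n

gt = np.zeros((1, 4, 4)); gt[0, 1, 1] = 1.0        # one corner
good = np.full((1, 4, 4), 0.01); good[0, 1, 1] = 0.99
bad = np.full((1, 4, 4), 0.01); bad[0, 1, 1] = 0.10
l_good = corner_focal_loss(good, gt)
l_bad = corner_focal_loss(bad, gt)
```

A confident correct prediction (`good`) incurs a near-zero loss, while a low score at the true corner (`bad`) is penalized heavily by the `(1 - p)^α log(p)` term.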
As the input passes through convolutional layers, the size of the output is typically smaller than the input image. A position (x, y) in the image therefore maps to the position (⌊x/n⌋, ⌊y/n⌋) in the heatmap, where n is the downsampling factor. Some precision may be lost when remapping locations from the heatmap back to the input image. To solve this problem, a position offset is predicted that slightly adjusts the corner positions before remapping them to the input resolution:

o_k = (x_k/n - ⌊x_k/n⌋, y_k/n - ⌊y_k/n⌋)

where o_k is the offset, and x_k and y_k are the x and y coordinates of corner k. One set of offsets is predicted and shared by the top-left corners of all categories, and another set by the bottom-right corners. For training, the offset loss is denoted L_off, and the smooth L1 loss is applied as the offset loss:

L_off = (1/N) · Σ_{k=1}^{N} SmoothL1(ô_k, o_k)

where SmoothL1(ô_k, o_k) represents the smooth-L1 difference between the predicted offset ô_k and the label offset o_k.
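The offset target and its smooth-L1 penalty can be sketched as follows; the downsampling factor and corner coordinates below are made-up values for illustration:

```python
import numpy as np

def corner_offset(x, y, n):
    """o_k: the fractional position lost when mapping (x, y) to the n-times-downsampled heatmap."""
    return np.array([x / n - np.floor(x / n), y / n - np.floor(y / n)])

def smooth_l1(pred, target):
    """Elementwise smooth-L1 (quadratic below 1, linear above), summed."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

o_k = corner_offset(103, 57, n=4)              # -> [0.75, 0.25]
loss = smooth_l1(np.array([0.70, 0.30]), o_k)  # small residual penalty
```

Training the network to predict o_k lets the corner be remapped to (25.75, 14.25)·n instead of the coarse (25, 14)·n grid position.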
There may be multiple objects in the image, so multiple top-left and bottom-right corners may be detected, and it is necessary to determine whether a pair of top-left and bottom-right corners comes from the same bounding box. Let e_tm be the embedding of the top-left corner of object m and e_bm the embedding of its bottom-right corner. The network is trained to group corners using a "pull" loss and to separate corners using a "push" loss:

L_pull = (1/M) · Σ_{m=1}^{M} [ (e_tm - e_m)² + (e_bm - e_m)² ]

L_push = (1/(M(M-1))) · Σ_{m=1}^{M} Σ_{j=1, j≠m}^{M} max(0, Δ - |e_m - e_j|)

where M is the number of objects, e_m is the average of e_tm and e_bm, e_m and e_j represent the embeddings of objects m and j respectively, and Δ is a margin. The total training loss for pedestrian detection is as follows:

L_cor = L_det + δ·L_pull + η·L_push + γ·L_off

where δ, η and γ are respectively the weights of the L_pull, L_push and L_off loss terms, and L_cor is the total loss of the pedestrian detection network.
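A toy NumPy version of the pull/push grouping losses, using one-dimensional embeddings; the margin Δ = 1 and the sample values are illustrative assumptions, not from the patent:

```python
import numpy as np

def pull_push_loss(tl, br, margin=1.0):
    """tl[m], br[m]: scalar embeddings of object m's top-left / bottom-right corner."""
    e = (tl + br) / 2.0                       # e_m: per-object mean embedding
    m = len(e)
    pull = ((tl - e) ** 2 + (br - e) ** 2).sum() / m
    push = sum(max(0.0, margin - abs(e[a] - e[b]))
               for a in range(m) for b in range(m) if a != b)
    push /= m * (m - 1) if m > 1 else 1
    return pull, push

# Two well-separated objects with internally consistent corner embeddings:
pull, push = pull_push_loss(np.array([0.0, 5.0]), np.array([0.0, 5.0]))
# Two objects whose embeddings sit inside the margin get pushed apart:
_, push_close = pull_push_loss(np.array([0.0, 0.2]), np.array([0.0, 0.2]))
```

The first case incurs zero pull (each object's corners already agree) and zero push (the means differ by more than the margin); the second case keeps zero pull but a positive push.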
S4: based on multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features among different tasks and carrying out feature sharing between the image illumination enhancement network and the pedestrian detection network: the features of the first 3×3 convolution of the enhancement network and the features of the second-to-last SCSAB of the self-calibration split-attention hourglass network are added, averaged, passed through a sigmoid function and fed back to the two networks in the next iteration; the features of the fourth-to-last 3×3 convolution of the enhancement network and the features of the first SCSAB of the hourglass network are likewise added, averaged, passed through a sigmoid function and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network with multi-task feature fusion shared learning;
the multi-task learning mechanism with feature fusion and sharing is designed, and the mechanism can fuse features between upstream tasks and downstream tasks and feed the features back to other networks in the next iteration. As shown in fig. 5, the feature fusion and sharing multi-task learning mechanism is: assume that there are two tasks, dividedThe tasks are task A and task B, and a convolution layer C in the task A network A1 Is characterized byConvolutional layer C in a mission B network B1 The output characteristic of (2) is->C A2 And C B2 Respectively C A1 And C B1 Is the next convolutional layer of (a)>Is C A2 Input features of convolutional layer,/->Is C B2 Input features of convolutional layer, F i The shared features obtained for the ith end-to-end round iteration are represented as follows: />When i=1, _a->Is represented as follows:when i > 1, ">Is represented as follows: />F i-1 Representing the sharing characteristics obtained by the i-1 th end-to-end round iteration.
The low-illumination pedestrian detection network with multi-task feature fusion shared learning: to further improve the performance of the two networks, a multi-task feature fusion module is introduced on this basis; its detailed structure is shown in fig. 2. In the i-th iteration, the multi-task feature fusion module fuses features of the image illumination enhancement sub-network with features of the pedestrian detection sub-network into two fused features, denoted M_1^i and M_2^i. The enhancement-network features E_1^i and E_4^i come from its first and fourth-last 3×3 convolutional layers, respectively; the detection-network feature D_p^i comes from the penultimate SCSAB of the self-calibrating split attention hourglass network, and D_f^i comes from its first SCSAB. All of these features have the same size. The fused features are:

M_1^i = (E_1^i + D_p^i) / 2, M_2^i = (E_4^i + D_f^i) / 2

M_1^i and M_2^i are then passed through the sigmoid function, giving S_1^i = σ(M_1^i) and S_2^i = σ(M_2^i). In the (i+1)-th iteration, S_1^i is multiplied element-wise with the feature entering the second convolutional layer of the enhancement network and with the feature entering the last SCSAB of the self-calibrating split attention hourglass network, and the products serve as the inputs of those layers. The same fusion-and-sharing method acts on the input of the last convolutional layer in the enhancement network and the input of the second SCSAB in the self-calibrating split attention hourglass network: in the (i+1)-th iteration, each of these inputs is multiplied element-wise by S_2^i.
Finally, the training loss function L of the low-illumination pedestrian detection network with multi-task feature fusion shared learning is:

L = L_det + δL_pull + ηL_push + γL_off + ζL_enh

where δ, η and γ are the weights of the L_pull, L_push and L_off loss functions, respectively, and ζ is the weight of the illumination enhancement loss L_enh. δ and η are set to 0.1, γ to 1, and ζ to 0.05.
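With the weights fixed as in the text (δ = η = 0.1, γ = 1, ζ = 0.05), the total training loss is a plain weighted sum; a minimal sketch (the loss values passed in here are placeholders, not real network outputs):

```python
# weights taken from the text: delta, eta, gamma, zeta
DELTA, ETA, GAMMA, ZETA = 0.1, 0.1, 1.0, 0.05

def total_loss(l_det, l_pull, l_push, l_off, l_enh):
    """L = L_det + delta*L_pull + eta*L_push + gamma*L_off + zeta*L_enh."""
    return l_det + DELTA * l_pull + ETA * l_push + GAMMA * l_off + ZETA * l_enh
```

With unit losses everywhere the total is 1 + 0.1 + 0.1 + 1 + 0.05 = 2.25, which makes the relative weighting of the detection and enhancement terms easy to read off.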
S5: the method comprises the steps of importing an image illumination enhancement pre-training model and a pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion and sharing learning, and training the low-illumination pedestrian detection network for the multi-task feature fusion and sharing learning by utilizing a normal-illumination pedestrian data set and a low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for the multi-task feature fusion and sharing learning;
training the low-light pedestrian detection network with the multi-task feature fusion shared learning by using the normal and low-light pedestrian detection training sets, and simultaneously importing the pre-training model with the pre-trained image illumination enhancement and the pre-training model with the pedestrian detection, and training for 100 periods to obtain the low-light pedestrian detection model with the multi-task feature fusion shared learning.
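Importing the two pre-trained models amounts to copying their parameters into the matching entries of the joint network's parameter dictionary, while parameters that exist only in the joint model (such as the fusion module) keep their fresh initialization. A framework-agnostic sketch (all parameter names are hypothetical; in PyTorch the same effect comes from `load_state_dict` with partial state dicts):

```python
def import_pretrained(joint_params, enh_params, det_params):
    """Copy pre-trained parameters into the joint model's parameter dict.

    Keys present only in the joint model (e.g. the multi-task feature
    fusion module) are left untouched.
    """
    merged = dict(joint_params)
    for name, value in {**enh_params, **det_params}.items():
        if name in merged:
            merged[name] = value
    return merged

# hypothetical parameter dicts: enhancement, detection, and joint model
joint = {"enh.conv1.w": 0.0, "det.scsab1.w": 0.0, "fusion.w": 7.0}
merged = import_pretrained(joint, {"enh.conv1.w": 1.0}, {"det.scsab1.w": 2.0})
```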
S6: detecting the image to be detected with the multi-task feature fusion shared learning low-illumination pedestrian detection model to obtain the positions of pedestrians in the image. The low-illumination test-set pictures are input into the trained model for inference, and the pedestrian positions are framed in the image.
The invention also provides a low-illumination pedestrian detection system based on multi-task feature fusion shared learning for implementing the above low-illumination pedestrian detection method, as shown in fig. 4, comprising:
a data set module 101, configured to acquire a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module 102 is configured to construct an image illumination enhancement network, where the image illumination enhancement network includes a decomposition network and an enhancement network, and training the image illumination enhancement network by using a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module 103 is configured to construct a pedestrian detection network, the pedestrian detection network uses two self-calibration separation attention-based hourglass networks as a backbone network, and trains the pedestrian detection network by using a normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention-based hourglass network is composed of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines a self-calibration convolution network and a separation attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
the multi-task feature fusion module 104 is configured to fuse features between upstream and downstream tasks and perform feature sharing between the image illumination enhancement network and the pedestrian detection network: the features of the first 3×3 convolutional layer of the enhancement network are added to the features of the penultimate SCSAB of the self-calibration separation attention hourglass network and averaged, and the average is fed back to the two networks through a sigmoid function in the next iteration; likewise, the features of the fourth-last 3×3 convolutional layer of the enhancement network are added to the features of the first SCSAB of the self-calibration separation attention hourglass network, averaged, and fed back to the two networks through a sigmoid function in the next iteration, constructing the low-illumination pedestrian detection network with multi-task feature fusion shared learning;
the model training module 105 is configured to import the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion and sharing learning, and train the low-illumination pedestrian detection network for multi-task feature fusion and sharing learning by using the normal illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion and sharing learning;
the image detection module 106 is configured to detect an image to be detected by using the low-light pedestrian detection model of the multi-task feature fusion sharing learning, so as to obtain the position of the pedestrian in the image.
Further, the image illumination enhancement network is constructed based on the RetinexNet convolutional neural network, and the pedestrian detection network is constructed with the self-calibration separation attention block provided by the invention.
The invention also provides a computer storage medium storing a computer program executable by a computer processor, the computer program, when executed, performing the above low-illumination pedestrian detection method based on multi-task feature fusion shared learning.
The invention finally provides a test embodiment employing the normal- and low-illumination CityPersons datasets: the normal- and low-illumination training sets comprise 2975 pictures each, and the normal- and low-illumination test sets comprise 500 pictures each. Experiments were carried out in PyTorch; training used 3 RTX 2080Ti graphics cards with the Adam optimization algorithm, and the learning rate was set to 0.0001. The evaluation index follows the Caltech evaluation standard: the log-average miss rate per image (MR^-2); the lower the MR^-2 value, the better the algorithm's performance. The MR^-2 index is used to compare against other strong pedestrian detection algorithms to demonstrate the superiority of the invention. Table 1 shows the comparison results under MR^-2, and fig. 6 shows pedestrian position detection results, in which (a) is the input image, (b) the CSP detection result, (c) the ALFNet detection result, (d) the CenterNet detection result, (e) the CornerNet detection result, (f) the CornerNet-Saccade detection result, (g) the detection result of the present invention, and (h) the benchmark detection result.
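For reference, MR^-2 averages the miss rate at nine FPPI (false positives per image) points spaced evenly in log space over [10^-2, 10^0], following the Caltech protocol. A sketch of the computation (a simplified reading of the standard protocol, not the authors' evaluation code):

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """MR^-2: log-average miss rate over 9 FPPI reference points.

    fppi and miss_rate are parallel sequences describing a detector's
    miss-rate-vs-FPPI curve, with fppi sorted ascending.
    """
    refs = np.logspace(-2.0, 0.0, 9)  # 9 points in [1e-2, 1e0]
    sampled = []
    for r in refs:
        idx = np.where(np.asarray(fppi) <= r)[0]
        # miss rate at the largest FPPI not exceeding the reference,
        # or 1.0 if the curve never reaches that low an FPPI
        sampled.append(miss_rate[idx[-1]] if len(idx) else 1.0)
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))
```

Because the average is taken in log space, halving the miss rate everywhere halves the reported MR^-2, which is why small absolute differences in the tables correspond to meaningful gains.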
The present invention is compared with several strong pedestrian detection or object detection methods: CSP, ALFNet, CenterNet, CornerNet and CornerNet-Saccade. ALFNet is the most representative anchor-based pedestrian detection algorithm; CSP and CenterNet are the best center-point-based pedestrian detection and object detection algorithms, respectively; CornerNet and CornerNet-Saccade are representative corner-based object detection methods. All algorithms in table 1 were trained with both the normal- and low-illumination training sets, so all of these pedestrian detection networks can process low-illumination images, guaranteeing the fairness of the experiment. In addition, to further illustrate the role of the multi-task feature fusion module in the present algorithm, the algorithms in table 2 cascade the RetinexNet illumination enhancement algorithm with each detection algorithm. As the results in table 2 show, the cascaded methods are still inferior to the algorithm of the invention, demonstrating that the multi-task feature fusion module plays an important role.
Table 1 comparison results table of the present invention with five excellent algorithms
Table 2 comparison results table of the present invention with five excellent algorithms cascaded by RetinexNet
The experimental results in the tables show that the proposed algorithm has clear advantages over the other five methods.
The non-illustrated portions of the specification are prior art or common general knowledge. The present embodiment is only for illustrating the present invention and not for limiting the scope of the present invention, and those skilled in the art will recognize that equivalent substitutions and modifications to the present invention are within the scope of the appended claims.
It should be noted that each step/component described in the present application may be split into more steps/components, or two or more steps/components or part of the operations of the steps/components may be combined into new steps/components, as needed for implementation, to achieve the object of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. The low-light pedestrian detection method based on the multi-task feature fusion sharing learning is characterized by comprising the following steps of:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration split attention module SCSAB, wherein the self-calibration split attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
s4: constructing a self-calibration separation attention hourglass network, wherein a basic module of the self-calibration separation attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the self-calibration separation attention hourglass network is used as a backbone network in the pedestrian detection network, and the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on the multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network with the multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion sharing learning, training the low-illumination pedestrian detection network for multi-task feature fusion sharing learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion sharing learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion sharing learning to obtain the position of a pedestrian in the image to be detected.
2. The method according to claim 1, wherein step S6 comprises:
and adding the features of the first convolution network of the image illumination enhancement network and the features of the second last SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through a sigmoid function, feeding back to the two networks in the next iteration, adding the features of the fourth last convolution network of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through the sigmoid function, and feeding back to the two networks in the next iteration, thereby constructing the multi-task feature fusion shared learning low-illumination pedestrian detection network.
3. The method according to claim 1, wherein the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, and the loss function L_enh of the image illumination enhancement network is: L_enh = L_recon + λ_ir L_ir + λ_is L_is, where λ_ir and λ_is are weight coefficients, and L_recon, L_ir and L_is denote the reconstruction, reflectance and illumination smoothness loss functions, respectively.
4. A method according to claim 3, characterized in that the loss function L_cor of the pedestrian detection network is: L_cor = L_det + δL_pull + ηL_push + γL_off, where δ, η and γ are the weights of the L_pull, L_push and L_off loss functions, respectively; L_det is the corner loss, L_pull groups the corners, L_push separates the corners, and L_off is the offset loss.
5. The method of claim 4, wherein:

L_det is the corner loss, where N is the number of objects in the image, α and β are hyper-parameters controlling the contribution of each corner, C, H and W denote the number of channels, the height and the width of the input, respectively, p_aij is the predicted score at position (i, j) for class a in the image, and y_aij is the corresponding unnormalized ground-truth value;

L_off is the offset loss, where o_k is the offset of corner k, x_k and y_k are the x and y coordinates of corner k, n is the downsampling factor, the loss is the difference between the predicted offset and the label offset computed with the smooth L1 loss, and ô_k denotes the predicted offset;

L_pull groups the corners and L_push separates them, where m indexes the objects, e_tm is the embedding of the top-left corner of object m, e_bm is the embedding of the bottom-right corner of object m, e_m is the average of e_tm and e_bm, and e_m and e_j denote the embeddings of objects m and j, respectively.
6. The method according to claim 1, further comprising: designing a feature fusion and sharing multi-task learning mechanism that fuses features between upstream and downstream tasks and feeds them back to the other network in the next iteration.
7. The method of claim 6, wherein the feature fusion and sharing multi-task learning mechanism is: assume there are two tasks, task A and task B; a convolutional layer C_A1 in the task-A network outputs the feature F_A1^i, and a convolutional layer C_B1 in the task-B network outputs the feature F_B1^i; C_A2 and C_B2 are the convolutional layers following C_A1 and C_B1, with input features F̃_A2^i and F̃_B2^i, respectively; the shared feature F^i obtained in the i-th end-to-end iteration is F^i = (F_A1^i + F_B1^i) / 2; when i = 1, F̃_A2^i = F_A1^i and F̃_B2^i = F_B1^i; when i > 1, F̃_A2^i = F_A1^i ⊙ σ(F^{i−1}) and F̃_B2^i = F_B1^i ⊙ σ(F^{i−1}), where σ is the sigmoid function, ⊙ denotes element-wise multiplication, and F^{i−1} is the shared feature obtained in the (i−1)-th end-to-end iteration.
8. The method of claim 5, wherein the total training loss function of the multi-task feature fusion shared low-illumination pedestrian detection network is: L = L_cor + ζL_enh = L_det + δL_pull + ηL_push + γL_off + ζL_enh, where L is the total loss and ζ is the weight of the illumination enhancement loss L_enh.
9. A low-light pedestrian detection system based on multi-task feature fusion and sharing learning, comprising:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, the pedestrian detection network takes a self-calibration separation attention hourglass network as a main network, the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention hourglass network consists of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines the self-calibration convolution network and the separation attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
the multi-task feature fusion module is used for designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks based on multi-task feature fusion sharing learning, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network for multi-task feature fusion sharing learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the low-illumination pedestrian detection network for the multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for the multi-task feature fusion shared learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain the low-illumination pedestrian detection model for the multi-task feature fusion shared learning;
and the image detection module is used for detecting the image to be detected by utilizing the multi-task feature fusion shared learning low-illumination pedestrian detection model to obtain the position of the pedestrian in the image to be detected.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the method for low-light pedestrian detection based on multi-tasking feature fusion sharing learning of any of claims 1 to 8.
CN202010917093.XA 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning Active CN112069983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Publications (2)

Publication Number Publication Date
CN112069983A CN112069983A (en) 2020-12-11
CN112069983B true CN112069983B (en) 2024-03-26

Family

ID=73666641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010917093.XA Active CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Country Status (1)

Country Link
CN (1) CN112069983B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862713B (en) * 2021-02-02 2022-08-09 山东师范大学 Attention mechanism-based low-light image enhancement method and system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2014222863A (en) * 2013-05-14 2014-11-27 キヤノン株式会社 Imaging apparatus

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Non-Patent Citations (1)

Title
Pedestrian detection method based on binocular stereo vision and the SVM algorithm; Chen Shuangyu et al.; Journal of Huazhong University of Science and Technology (Natural Science Edition); 2015-10-16; pp. 141–143 *

Also Published As

Publication number Publication date
CN112069983A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
Kumar et al. A new vehicle tracking system with R-CNN and random forest classifier for disaster management platform to improve performance
CN112233097B (en) Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion
CN111814595B (en) Low-illumination pedestrian detection method and system based on multi-task learning
US8213679B2 (en) Method for moving targets tracking and number counting
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN110781838A (en) Multi-modal trajectory prediction method for pedestrian in complex scene
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
US20210319242A1 (en) Dense and Discriminative Neural Network Architectures for Improved Object Detection and Instance Segmentation
CN112686207A (en) Urban street scene target detection method based on regional information enhancement
CN111539351A (en) Multi-task cascaded face frame selection comparison method
CN113962246A (en) Target detection method, system, equipment and storage medium fusing bimodal features
CN116385761A (en) 3D target detection method integrating RGB and infrared information
CN113052108A (en) Multi-scale cascade aerial photography target detection method and system based on deep neural network
CN114202803A (en) Multi-stage human body abnormal action detection method based on residual error network
CN115760921A (en) Pedestrian trajectory prediction method and system based on multi-target tracking
CN112069983B (en) Low-light pedestrian detection method and system for multi-task feature fusion sharing learning
Wu et al. Vehicle detection based on adaptive multi-modal feature fusion and cross-modal vehicle index using RGB-T images
CN117576149A (en) Single-target tracking method based on attention mechanism
He et al. Real-time pedestrian warning system on highway using deep learning methods
Zhu et al. Advanced driver assistance system based on machine vision
Gerhardt et al. Neural network-based traffic sign recognition in 360° images for semi-automatic road maintenance inventory
CN116485894A (en) Video scene mapping and positioning method and device, electronic equipment and storage medium
Jourdheuil et al. Heterogeneous adaboost with real-time constraints-application to the detection of pedestrians by stereovision
Guo et al. ANMS: attention-based non-maximum suppression
Nataprawira et al. Pedestrian Detection in Different Lighting Conditions Using Deep Neural Networks.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant