CN112069983A - Low-illumination pedestrian detection method and system for multi-task feature fusion shared learning - Google Patents

Low-illumination pedestrian detection method and system for multi-task feature fusion shared learning Download PDF

Info

Publication number
CN112069983A
CN112069983A (application CN202010917093.XA; granted publication CN112069983B)
Authority
CN
China
Prior art keywords
illumination
network
low
pedestrian
pedestrian detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010917093.XA
Other languages
Chinese (zh)
Other versions
CN112069983B (en)
Inventor
卢涛 (Lu Tao)
王元植 (Wang Yuanzhi)
张彦铎 (Zhang Yanduo)
吴云韬 (Wu Yuntao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202010917093.XA priority Critical patent/CN112069983B/en
Publication of CN112069983A publication Critical patent/CN112069983A/en
Application granted granted Critical
Publication of CN112069983B publication Critical patent/CN112069983B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning. The method acquires normal-illumination and low-illumination pedestrian data sets; pre-trains an image illumination enhancement network using the normal and low-illumination pedestrian data sets; pre-trains a pedestrian detection network using the normal-illumination pedestrian data set; designs a multi-task feature fusion module capable of fusing features between an upstream task and a downstream task, performs feature fusion and sharing on the two networks, and constructs a low-illumination pedestrian detection network for multi-task feature fusion shared learning; imports the two pre-trained models into the low-illumination pedestrian detection network and trains it with the normal and low-illumination data sets to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning; and detects the image to be detected with this model to obtain the position of the pedestrian in the image. The invention can accurately and efficiently detect the position of a pedestrian in a low-illumination image.

Description

Low-illumination pedestrian detection method and system for multi-task feature fusion shared learning
Technical Field
The invention belongs to the technical field of computer vision target detection, and particularly relates to a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning.
Background
With the rapid development of the world economy, people travel frequently between different regions, cities and countries, and the resulting hidden dangers to public safety consume considerable effort from the relevant security departments. Currently, video surveillance devices, as important components of urban security systems, are widely installed in public areas such as roads, streets, schools and shopping malls. These devices mainly record and store what happens at the monitored locations, facilitating remote monitoring, emergency command and similar needs and safeguarding public safety. Pedestrians are the main subjects in video surveillance, and using intelligent technology to study and analyze their behavior is an important component of intelligent monitoring technology. Pedestrian detection is one of the key technologies in this field; it is an important problem in computer vision and has enjoyed great success in recent years under controlled conditions.
In every application scenario in the field of computer vision, pedestrians are very important analysis targets, and correctly identifying the pedestrian targets in a machine's operating environment is an important precondition for the machine to complete subsequent tasks or interact with humans. Pedestrian detection can be directly applied to scenarios such as indoor and outdoor mobile robots, autonomous driving and security monitoring, and has therefore attracted much attention in recent years. It has a significant impact on many applications, such as driver assistance systems, robot navigation and intelligent video surveillance systems. The continuous development of deep learning has contributed greatly to the performance of pedestrian detection.
Generally, pedestrian detection is a particular area of object detection. From this point of view, deep-learning pedestrian detection algorithms can be divided into two categories: anchor-box-based methods and keypoint-based methods. Convolutional neural networks (CNN) were first introduced into object detection in RCNN, which enables detection without manually designed features. Keypoint-based object detection algorithms generate object bounding boxes by detecting and grouping keypoints, which greatly simplifies the output of the network and eliminates the need to design anchor boxes.
Although the pedestrian detection algorithms described above achieve satisfactory performance under normal lighting conditions, most of them do not consider pedestrian detection in low-light environments. In practical applications, normal lighting cannot always be guaranteed; on the contrary, low-light environments are very common. The main reason for poor pedestrian detection performance in low-light environments is that the color and texture information in images captured under low-light conditions is severely distorted, yet color and texture information plays a crucial role in pedestrian detection. To address low-illumination pedestrian detection, infrared pedestrian detection has been studied, but it must operate on infrared images, cannot work directly on the original low-illumination images, and greatly increases detection cost.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning, and solves the problem of poor detection effect of pedestrians in a low-illumination environment.
To achieve the above object, according to one aspect of the present invention, there is provided a low-light pedestrian detection method based on multitask feature fusion shared learning, including:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration separate attention module SCSAB, which combines a self-calibration convolution network and a separate attention network, for collecting information of each spatial position in an input image to extend a field of view of each convolution layer;
s4: constructing a self-calibration separated attention hourglass network, wherein a basic module of the self-calibration separated attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network, and trains the pedestrian detection network by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between an upstream task and a downstream task, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network for multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by utilizing the normal illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion shared learning to obtain the position of a pedestrian in the image to be detected.
In some alternative embodiments, step S6 includes:
the features of the first convolution of the image illumination enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; the features of the fourth-to-last convolution of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are likewise added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network with multi-task feature fusion and shared learning.
In some optional embodiments, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, wherein the loss function $L_{enh}$ of the image illumination enhancement network is: $L_{enh} = L_{recon} + \lambda_{ir}L_{ir} + \lambda_{is}L_{is}$, where $\lambda_{ir}$ and $\lambda_{is}$ are weight coefficients, and $L_{recon}$, $L_{ir}$ and $L_{is}$ represent the reconstruction, reflectance and illumination smoothness loss functions, respectively.
In some alternative embodiments, the loss function $L_{cor}$ of the pedestrian detection network is: $L_{cor} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off}$, where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, $L_{det}$ is the corner loss, $L_{pull}$ groups the corners, $L_{push}$ separates the corners, and $L_{off}$ is the offset loss.
In some alternative embodiments,

$$L_{det} = \frac{-1}{N}\sum_{a=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{aij}\right)^{\alpha}\log\left(p_{aij}\right) & \text{if } y_{aij}=1\\\left(1-y_{aij}\right)^{\beta}\left(p_{aij}\right)^{\alpha}\log\left(1-p_{aij}\right) & \text{otherwise}\end{cases}$$

where $L_{det}$ is the corner loss, N is the number of objects in the image, α and β are the hyperparameters controlling the contribution of each corner, C, H and W represent the number of channels, height and width of the input respectively, $p_{aij}$ is the score at the (i, j) position of class a in the predicted image, and $y_{aij}$ is the unnormalized ground truth;

$$L_{off} = \frac{1}{N}\sum_{k=1}^{N}\text{SmoothL1}\!\left(o_k,\hat{o}_k\right),\qquad o_k = \left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right)$$

where $L_{off}$ is the offset loss, $o_k$ is the offset of the label, $x_k$ and $y_k$ are the x and y coordinates of corner k, n is the downsampling factor, the smooth L1 loss computes the difference between the predicted offset and the offset of the label, and $\hat{o}_k$ represents the predicted offset;

$$L_{pull} = \frac{1}{N}\sum_{m=1}^{N}\left[\left(e_{t_m}-e_m\right)^2+\left(e_{b_m}-e_m\right)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{m=1}^{N}\sum_{\substack{j=1\\j\neq m}}^{N}\max\left(0,\;\Delta-\left|e_m-e_j\right|\right)$$

where $L_{pull}$ groups the corners, $L_{push}$ separates the corners, m indexes the objects, $e_{t_m}$ is the embedding of the upper-left corner of object m, $e_{b_m}$ is the embedding of the lower-right corner of object m, $e_m$ is the average of $e_{t_m}$ and $e_{b_m}$, and $e_m$ and $e_j$ represent the embeddings of objects m and j, respectively.
In some optional embodiments, the method further comprises: designing a multi-task learning mechanism with feature fusion and sharing, wherein the mechanism can fuse features between an upstream task and a downstream task and feed the features back to the other network at the next iteration.
In some optional embodiments, the feature fusion and sharing multitask learning mechanism is: assume that there are two tasks, task A and task B; the output feature of convolutional layer $C_{A_1}$ in the task A network is $F_{A_1}$, the output feature of convolutional layer $C_{B_1}$ in the task B network is $F_{B_1}$, $C_{A_2}$ and $C_{B_2}$ are the convolutional layers following $C_{A_1}$ and $C_{B_1}$ respectively, $\tilde{F}_{A_2}$ is the input feature of convolutional layer $C_{A_2}$, $\tilde{F}_{B_2}$ is the input feature of convolutional layer $C_{B_2}$, and $F_i$ is the shared feature obtained at the i-th end-to-end iteration, expressed as follows:

$$F_i = \sigma\!\left(\frac{F_{A_1}^{i}+F_{B_1}^{i}}{2}\right)$$

where σ denotes the sigmoid function and the superscript i marks the i-th iteration. When i = 1,

$$\tilde{F}_{A_2}^{1}=F_{A_1}^{1},\qquad \tilde{F}_{B_2}^{1}=F_{B_1}^{1};$$

when i > 1,

$$\tilde{F}_{A_2}^{i}=F_{A_1}^{i}\odot F_{i-1},\qquad \tilde{F}_{B_2}^{i}=F_{B_1}^{i}\odot F_{i-1},$$

where $F_{i-1}$ represents the shared feature obtained by the (i-1)-th end-to-end iteration and ⊙ denotes element-wise multiplication.
In some optional embodiments, the overall training loss function of the multitask-feature-shared low-illumination pedestrian detection network is: $L = L_{cor} + \zeta L_{enh} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off} + \zeta L_{enh}$, where L is the total loss and ζ is the weight of the illumination enhancement loss $L_{enh}$.
According to another aspect of the present invention, there is provided a low-light pedestrian detection system based on multitask feature fusion shared learning, including:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the normal illumination pedestrian data set and the low illumination pedestrian data set are used for training the image illumination enhancement network to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network and is trained by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, a basic module of the self-calibration separated attention hourglass network consists of a self-calibration separated attention module SCSAB, and the self-calibration separated attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image to expand the field of view of each convolution layer;
the multitask feature fusion module is used for designing a multitask feature fusion module capable of fusing features between an upstream task and a downstream task based on multitask feature fusion shared learning, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network of the multitask feature fusion shared learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the multitask feature fusion shared learning low-illumination pedestrian detection network, and training the multitask feature fusion shared learning low-illumination pedestrian detection network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain a multitask feature fusion shared learning low-illumination pedestrian detection model;
and the image detection module is used for detecting the image to be detected by using the low-illumination pedestrian detection model of the multitask feature fusion shared learning to obtain the position of the pedestrian in the image to be detected.
In some optional embodiments, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network.
In some alternative embodiments, the pedestrian detection network is constructed based on the self-calibration separated attention module.
According to another aspect of the present invention, there is provided a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any of the low-light pedestrian detection methods based on multitask feature fusion shared learning.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
the invention provides a low-illumination pedestrian detection method and system based on multitask feature fusion shared learning, which can accurately and efficiently detect the position of a pedestrian in a low-illumination image; and the self-calibration separated attention module is creatively provided, and the module combines the self-calibration convolution and the separated attention network to effectively collect the information of each spatial position in the input image so as to expand the visual field of each convolution layer, thereby improving the detection performance.
Drawings
Fig. 1 is a schematic flowchart of a low-light pedestrian detection method based on multitask feature fusion shared learning according to an embodiment of the present invention;
FIG. 2 is a diagram of a low light pedestrian detection network architecture based on multi-tasking feature fusion shared learning in accordance with the present invention;
FIG. 3 is a self-calibrating discrete attention block diagram proposed by the present invention;
FIG. 4 is a schematic diagram of a low-light pedestrian detection system based on multi-tasking feature fusion shared learning of the present invention;
FIG. 5 is a workflow diagram of the feature fusion and sharing multitasking learning mechanism proposed by the present invention;
FIG. 6 is a graph comparing test results of the examples of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention is mainly divided into four parts: training an image illumination enhancement pre-training model, training a pedestrian detection pre-training model, training the low-illumination pedestrian detection model of multi-task feature fusion shared learning, and using the low-illumination pedestrian detection model of multi-task feature fusion shared learning to deduce the pedestrian position in the image from the low-illumination image.
The low-light pedestrian detection method based on the multitask feature fusion shared learning disclosed by the embodiment of the invention comprises the following steps as shown in figure 1:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
in the embodiment of the invention, the data sets can be obtained directly by photographing; alternatively, the illumination of the CityPersons (a published large-scale pedestrian detection dataset) pedestrian detection training and validation sets can be reduced. In the embodiment of the present invention, an RGB-based spatial brightness adjustment algorithm in OpenCV (an open-source, BSD-licensed, cross-platform computer vision library) is used. The algorithm scales each pixel according to its current RGB values, i.e. the larger the R, G, B values, the larger the adjustment. For example, if the current pixel is (100, 200, 50) and the adjustment coefficient is 1.1, it is adjusted to (110, 220, 55). In the present embodiment, the adjustment coefficient used is 0.8. After reducing the brightness, the normal and low-illumination CityPersons pedestrian detection training and validation sets are used as the training and test sets.
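A minimal sketch of this darkening step (the function name and file paths are illustrative assumptions; the embodiment specifies only that each R, G, B value is scaled by a coefficient, here 0.8):

```python
import cv2
import numpy as np

def adjust_brightness(image: np.ndarray, coeff: float) -> np.ndarray:
    """Scale every channel value by `coeff`, clipping to the 8-bit range.

    With coeff > 1 the image brightens proportionally to the current pixel
    values (e.g. (100, 200, 50) * 1.1 -> (110, 220, 55)); with coeff < 1
    it darkens, which is how the low-illumination set is synthesized here.
    """
    scaled = image.astype(np.float32) * coeff
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Synthesize a low-illumination copy of a CityPersons frame (paths illustrative).
img = cv2.imread("citypersons_sample.png")       # OpenCV loads in BGR order
low_light = adjust_brightness(img, coeff=0.8)    # coefficient from this embodiment
cv2.imwrite("citypersons_sample_low.png", low_light)
```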
S2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the normal and low-illumination pedestrian detection training sets are used for independently training the image illumination enhancement network for 100 periods to obtain an image illumination enhancement pre-training model. The network structure of the image illumination enhancement network is shown in fig. 2, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, and the network introduces a Retinex theory. The classical Retinex theory establishes a model for human color perception, and this theory assumes that the observed image can be decomposed into two parts, a reflection channel and an illumination channel. Let S represent the source image, it can be represented by S ═ R × I, where R represents the reflectance component and I represents the illumination component, and × represents the element-by-element multiplication. For the loss function, in order to ensure that the image after restoring the illumination can retain the object edge information and also retain the smooth transition of the illumination information, the following loss functions are used in the illumination enhancement network:
$$L_{enh} = L_{recon} + \lambda_{ir}L_{ir} + \lambda_{is}L_{is}$$

where $\lambda_{ir}$ and $\lambda_{is}$ are coefficients balancing the reflectance and illumination terms, and $L_{recon}$, $L_{ir}$ and $L_{is}$ represent the reconstruction, reflectance and illumination smoothness loss functions, respectively.
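A simplified PyTorch sketch of this composite loss under the decomposition S = R ∘ I. The λ values and the concrete forms of $L_{recon}$, $L_{ir}$ and $L_{is}$ below are illustrative stand-ins rather than the patent's exact formulation (RetinexNet's full reconstruction loss sums over all reflectance/illumination pairings and weights the smoothness term by reflectance gradients):

```python
import torch
import torch.nn.functional as F

def enhancement_loss(R_low, I_low, S_low, R_norm, I_norm, S_norm,
                     lambda_ir=0.01, lambda_is=0.1):
    """Sketch of L_enh = L_recon + lambda_ir * L_ir + lambda_is * L_is."""
    # L_recon: each reflectance/illumination pair should rebuild its source
    # image under the element-wise decomposition S = R * I.
    l_recon = F.l1_loss(R_low * I_low, S_low) + F.l1_loss(R_norm * I_norm, S_norm)
    # L_ir: low-light and normal-light views of a scene share one reflectance.
    l_ir = F.l1_loss(R_low, R_norm)
    # L_is: total-variation penalty keeping the illumination maps smooth.
    def tv(x):
        return ((x[..., :, 1:] - x[..., :, :-1]).abs().mean()
                + (x[..., 1:, :] - x[..., :-1, :]).abs().mean())
    l_is = tv(I_low) + tv(I_norm)
    return l_recon + lambda_ir * l_ir + lambda_is * l_is
```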
S3: constructing a pedestrian detection network, wherein the pedestrian detection network takes two self-calibration separated attention-based hourglass networks as a backbone network, and training the pedestrian detection network by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separated attention hourglass network consists of a self-calibrated split attention block (SCSAB), which combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image to expand the field of view of each convolution layer;
in the embodiment of the invention, the pedestrian detection network is trained independently for 100 epochs with the normal-illumination pedestrian detection training set to obtain the pedestrian detection pre-training model. The network structure of the pedestrian detection network is shown in fig. 2. The pedestrian detection network follows the idea of keypoint-based object detection algorithms, which generate object bounding boxes by detecting and grouping keypoints; this greatly simplifies the output of the network and eliminates the need to design anchor boxes. Meanwhile, an attention mechanism is introduced into the network to further improve detection performance. An autonomously designed hourglass network serves as the backbone of the pedestrian detection network; this hourglass network consists of self-calibrated split attention blocks (SCSAB), shown in fig. 3, which combine self-calibration convolution with a split attention network to effectively collect information from each spatial position in the input image and expand the field of view of each convolution layer, thereby improving detection performance.
For the loss function of the pedestrian detection network, a focal loss with α = 2 and β = 4 may be used. Let $p_{aij}$ be the score at the (i, j) position of class a in the predicted image, and let $y_{aij}$ be the unnormalized ground truth:

$$L_{det} = \frac{-1}{N}\sum_{a=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{aij}\right)^{\alpha}\log\left(p_{aij}\right) & \text{if } y_{aij}=1\\\left(1-y_{aij}\right)^{\beta}\left(p_{aij}\right)^{\alpha}\log\left(1-p_{aij}\right) & \text{otherwise}\end{cases}$$

where N is the number of objects in the image, α and β are the hyperparameters controlling the contribution of each point, and C, H and W represent the number of channels, height and width of the input, respectively.
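A PyTorch sketch of this corner focal loss, assuming single-image tensors of shape (C, H, W) and a small clamping epsilon for numerical stability (both assumptions, not specified in the patent):

```python
import torch

def corner_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Focal loss over corner heatmaps with alpha = 2 and beta = 4.

    pred: predicted scores p_aij in (0, 1), shape (C, H, W).
    gt:   ground-truth heatmap y_aij, equal to 1 at corner locations and
          decaying around them via an unnormalized Gaussian.
    """
    pos = gt.eq(1).float()                   # locations where y_aij == 1
    neg = 1.0 - pos
    num_objects = pos.sum().clamp(min=1.0)   # N, the number of objects

    pos_loss = (1 - pred) ** alpha * torch.log(pred.clamp(min=1e-6)) * pos
    neg_loss = ((1 - gt) ** beta * pred ** alpha
                * torch.log((1 - pred).clamp(min=1e-6)) * neg)
    return -(pos_loss.sum() + neg_loss.sum()) / num_objects
```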
When an input image passes through convolutional layers, the output is typically smaller than the input image. Thus, a location (x, y) in the image maps to the location $\left(\left\lfloor\frac{x}{n}\right\rfloor, \left\lfloor\frac{y}{n}\right\rfloor\right)$ in the heat map, where n is the downsampling factor. Some accuracy may be lost when remapping locations from the heat map to the input image. To solve this problem, a position offset is predicted, which slightly adjusts the corner positions before remapping them to the input resolution.
$$o_k = \left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right)$$

where $o_k$ is the offset and $x_k$ and $y_k$ are the x and y coordinates of corner k. One set of offsets is predicted, shared by the top-left corners of all classes, and another set is shared by the bottom-right corners. For training, the offset loss is denoted $L_{off}$, and the smooth L1 loss is applied as the offset loss:

$$L_{off} = \frac{1}{N}\sum_{k=1}^{N}\text{SmoothL1}\!\left(o_k,\hat{o}_k\right)$$

where the smooth L1 loss computes the difference between the predicted offset and the offset of the label, and $\hat{o}_k$ indicates the predicted offset.
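A PyTorch sketch of the offset targets and the smooth L1 offset loss described above (function names and the example coordinates are illustrative):

```python
import torch
import torch.nn.functional as F

def offset_targets(xs, ys, n):
    """o_k = (x_k/n - floor(x_k/n), y_k/n - floor(y_k/n)) for corner coords."""
    ox = xs / n - torch.floor(xs / n)
    oy = ys / n - torch.floor(ys / n)
    return torch.stack([ox, oy], dim=-1)

def offset_loss(pred_offsets, gt_offsets):
    """L_off: smooth L1 loss between predicted and ground-truth offsets."""
    return F.smooth_l1_loss(pred_offsets, gt_offsets, reduction="mean")

# Example: two corners at pixel coordinates with a downsampling factor of 4.
xs = torch.tensor([13.0, 50.0])
ys = torch.tensor([7.0, 22.0])
gt = offset_targets(xs, ys, n=4)   # e.g. 13/4 - floor(13/4) = 0.25
```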
There may be multiple objects in an image, and thus multiple top-left and bottom-right corners may be detected, so it is necessary to determine whether a pair of top-left and bottom-right corners comes from the same bounding box. Let $e_{t_m}$ be the embedding of the top-left corner of object m and $e_{b_m}$ the embedding of its bottom-right corner. The network is trained to group corners using a "pull" loss and to separate corners using a "push" loss:

$$L_{pull} = \frac{1}{N}\sum_{m=1}^{N}\left[\left(e_{t_m}-e_m\right)^2+\left(e_{b_m}-e_m\right)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{m=1}^{N}\sum_{\substack{j=1\\j\neq m}}^{N}\max\left(0,\;\Delta-\left|e_m-e_j\right|\right)$$

where $e_m$ is the average of $e_{t_m}$ and $e_{b_m}$, and $e_j$ is likewise the average of $e_{t_j}$ and $e_{b_j}$, the embeddings of the top-left and bottom-right corners of object j; $e_m$ and $e_j$ thus represent the embeddings of objects m and j, respectively. The total training loss for pedestrian detection is as follows:

$$L_{cor} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off}$$

where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, and $L_{cor}$ is the total loss of the pedestrian detection network.
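A PyTorch sketch of these pull/push losses over per-object corner embeddings; the margin Δ (here delta=1.0) is an assumed value, since the patent does not state it:

```python
import torch

def pull_push_losses(e_tl, e_br, delta=1.0):
    """Associative-embedding losses for corner grouping.

    e_tl, e_br: 1-D embeddings of the top-left / bottom-right corners of the
    N objects in one image, shape (N,).
    """
    e_mean = (e_tl + e_br) / 2                                  # e_m
    l_pull = ((e_tl - e_mean) ** 2 + (e_br - e_mean) ** 2).mean()

    n = e_mean.size(0)
    dist = (e_mean.unsqueeze(0) - e_mean.unsqueeze(1)).abs()    # |e_m - e_j|
    margin = torch.clamp(delta - dist, min=0)
    margin = margin - torch.diag(torch.diag(margin))            # drop j == m terms
    l_push = margin.sum() / max(n * (n - 1), 1)
    return l_pull, l_push
```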
S4: based on multi-task feature fusion and shared learning, designing a multi-task feature fusion module capable of fusing features among different tasks and sharing the features of the image illumination enhancement network and the pedestrian detection network. The features of the first 3 × 3 convolution of the enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; likewise, the features of the fourth-to-last 3 × 3 convolution of the enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network of multi-task feature fusion and shared learning;
a multi-task learning mechanism with feature fusion and sharing is designed; it can fuse features between an upstream task and a downstream task and feed them back to the other network at the next iteration. As shown in fig. 5, the multi-task learning mechanism for feature fusion and sharing is as follows. Assume that there are two tasks, task A and task B; the output feature of convolutional layer $C_{A_1}$ in the task A network is $F_{A_1}$, the output feature of convolutional layer $C_{B_1}$ in the task B network is $F_{B_1}$, $C_{A_2}$ and $C_{B_2}$ are the convolutional layers following $C_{A_1}$ and $C_{B_1}$ respectively, $\tilde{F}_{A_2}$ is the input feature of convolutional layer $C_{A_2}$, $\tilde{F}_{B_2}$ is the input feature of convolutional layer $C_{B_2}$, and $F_i$ is the shared feature obtained at the i-th end-to-end iteration, expressed as follows:

$$F_i = \sigma\!\left(\frac{F_{A_1}^{i}+F_{B_1}^{i}}{2}\right)$$

where σ denotes the sigmoid function and the superscript i marks the i-th iteration. When i = 1,

$$\tilde{F}_{A_2}^{1}=F_{A_1}^{1},\qquad \tilde{F}_{B_2}^{1}=F_{B_1}^{1};$$

when i > 1,

$$\tilde{F}_{A_2}^{i}=F_{A_1}^{i}\odot F_{i-1},\qquad \tilde{F}_{B_2}^{i}=F_{B_1}^{i}\odot F_{i-1},$$

where $F_{i-1}$ represents the shared feature obtained by the (i-1)-th end-to-end iteration and ⊙ denotes element-wise multiplication.
The low-illumination pedestrian detection network for multi-task feature fusion and shared learning: to further improve the performance of the two networks, a multi-task feature fusion module is introduced on this basis; its detailed structure is shown in fig. 2. In the i-th iteration, the multi-task feature fusion module fuses the features $F_{e_1}^{i}$ and $F_{e_2}^{i}$ of the image illumination enhancement sub-network with the features $F_{d_1}^{i}$ and $F_{d_2}^{i}$ of the pedestrian detection sub-network, and the two fused features are denoted $F_{1}^{i}$ and $F_{2}^{i}$, respectively. Features $F_{e_1}^{i}$ and $F_{e_2}^{i}$ come from the first 3 × 3 convolutional layer and the fourth-to-last 3 × 3 convolutional layer of the enhancement network, respectively. Feature $F_{d_1}^{i}$ comes from the second-to-last SCSAB of the self-calibration separated attention hourglass network, and $F_{d_2}^{i}$ comes from its first SCSAB. All of these features have the same size. $F_{1}^{i}$ and $F_{2}^{i}$ are expressed as follows:

$$F_{1}^{i} = \frac{F_{e_1}^{i}+F_{d_1}^{i}}{2},\qquad F_{2}^{i} = \frac{F_{e_2}^{i}+F_{d_2}^{i}}{2}$$

After entering the sigmoid function, they are denoted $\sigma(F_{1}^{i})$ and $\sigma(F_{2}^{i})$. In the (i+1)-th iteration, $\sigma(F_{1}^{i})$ is multiplied element-wise with $F_{e_1}^{i+1}$ and $F_{d_1}^{i+1}$ to form the inputs of the second convolutional layer of the enhancement network and the last SCSAB of the self-calibration separated attention hourglass network. These input features, denoted $\tilde{F}_{e_1}^{i+1}$ and $\tilde{F}_{d_1}^{i+1}$, are given by:

$$\tilde{F}_{e_1}^{i+1} = F_{e_1}^{i+1}\odot\sigma\!\left(F_{1}^{i}\right),\qquad \tilde{F}_{d_1}^{i+1} = F_{d_1}^{i+1}\odot\sigma\!\left(F_{1}^{i}\right)$$

The same fusion and sharing method acts on the input of the last convolution in the enhancement network and the input of the second SCSAB in the self-calibration separated attention hourglass network. These input features, denoted $\tilde{F}_{e_2}^{i+1}$ and $\tilde{F}_{d_2}^{i+1}$, are given by:

$$\tilde{F}_{e_2}^{i+1} = F_{e_2}^{i+1}\odot\sigma\!\left(F_{2}^{i}\right),\qquad \tilde{F}_{d_2}^{i+1} = F_{d_2}^{i+1}\odot\sigma\!\left(F_{2}^{i}\right)$$
finally, the training loss function L of the low-illumination pedestrian detection network for multi-task feature fusion shared learning is:

$$L = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off} + \zeta L_{enh}$$

where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, and ζ is the weight of the illumination enhancement loss $L_{enh}$. η is set to 0.1, γ is set to 1, and ζ is set to 0.05.
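Putting the terms together, a one-line sketch of the joint objective with the weights stated above (the loss values are assumed to be scalar tensors produced by functions such as those sketched earlier):

```python
def total_training_loss(l_det, l_pull, l_push, l_off, l_enh,
                        eta=0.1, gamma=1.0, zeta=0.05):
    """L = L_det + L_pull + eta*L_push + gamma*L_off + zeta*L_enh,
    with the weights reported in this embodiment (0.1, 1, 0.05)."""
    return l_det + l_pull + eta * l_push + gamma * l_off + zeta * l_enh
```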
S5: importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by using a normal illumination pedestrian data set and a low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning;
the low-illumination pedestrian detection network of multi-task feature fusion shared learning is trained with the normal and low-illumination pedestrian detection training sets; the previously trained image illumination enhancement pre-training model and pedestrian detection pre-training model are imported at the same time, and training runs for 100 epochs to obtain the low-illumination pedestrian detection model of multi-task feature fusion shared learning.
S6: detecting the image to be detected by using the low-illumination pedestrian detection model of multi-task feature fusion shared learning to obtain the position of the pedestrian in the image. The low-illumination test set pictures are input into the trained low-illumination pedestrian detection model of multi-task feature fusion shared learning for inference, and the positions of pedestrians are framed in the images.
The invention also provides a low-light pedestrian detection system based on multitask feature fusion shared learning, which is used for realizing the low-light pedestrian detection method based on multitask feature fusion shared learning, and as shown in fig. 4, the low-light pedestrian detection system based on multitask feature fusion shared learning comprises:
a data set module 101, configured to obtain a normal-illumination pedestrian data set and a low-illumination pedestrian data set;
the image illumination enhancement module 102 is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module 103 is used for constructing a pedestrian detection network, the pedestrian detection network takes two self-calibration separation attention-based hourglass networks as a backbone network, and the pedestrian detection network is trained by using a normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention hourglass network consists of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines a self-calibration convolution network and a separation attention network and is used for collecting information of each spatial position in an input image to expand the visual field of each convolution layer;
the multitask feature fusion module 104 is used for fusing features between upstream and downstream tasks and sharing the features of the image illumination enhancement network and the pedestrian detection network: the features of the first 3 × 3 convolution of the enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; likewise, the features of the fourth-to-last 3 × 3 convolution of the enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network of multitask feature fusion shared learning;
the model training module 105 is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by using a normal illumination pedestrian data set and a low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning;
and the image detection module 106 is configured to detect the image to be detected by using the low-illumination pedestrian detection model of the multitask feature fusion shared learning, so as to obtain the position of the pedestrian in the image.
Further, an image illumination enhancement network is constructed based on the RetinexNet convolutional neural network; a pedestrian detection network is constructed based on the self-calibration separation attention block provided by the invention.
The present invention also provides a computer storage medium having stored therein a computer program executable by a computer processor, the computer program executing the low-illumination pedestrian detection method based on multitask feature fusion shared learning described above.
A test example of the invention uses the normal and low-illumination CityPersons data sets, comprising 2975 pictures each in the normal and low-illumination training sets and 500 pictures each in the normal and low-illumination test sets. The experiment was implemented in PyTorch and trained on 3 RTX 2080Ti graphics cards with the Adam optimization algorithm. The learning rate was set to 0.0001. The evaluation index follows the Caltech evaluation standard: the log-average miss rate per image ($MR^{-2}$); the lower the $MR^{-2}$ value, the better the algorithm performance. The $MR^{-2}$ index is used to compare the invention with other excellent pedestrian detection algorithms and demonstrate its superiority. Table 1 shows the comparison results under the $MR^{-2}$ index, and fig. 6 compares the pedestrian position detection results, in which (a) shows the input image, (b) the CSP detection result, (c) the ALFNet detection result, (d) the CenterNet detection result, (e) the CornerNet detection result, (f) the CornerNet-Saccade detection result, (g) the detection result of the present invention, and (h) the benchmark result.
The invention is compared with several excellent pedestrian detection or object detection methods: CSP, ALFNet, CenterNet, CornerNet, and CornerNet-Saccade. ALFNet is the most representative anchor-box-based pedestrian detection algorithm, while CSP and CenterNet are the best center-point-based pedestrian detection and object detection algorithms, respectively. CornerNet and CornerNet-Saccade are representative corner-based object detection methods. All algorithms in table 1 were trained with the normal and low-illumination training sets, so all of these pedestrian detection networks are able to process low-illumination images, ensuring the fairness of the experiment. In addition, to further illustrate the role of the multitask feature fusion module in the present algorithm, the algorithms in table 2 cascade the RetinexNet illumination enhancement algorithm with each detection algorithm. As the results in table 2 show, the indexes of the cascaded methods are still inferior to the algorithm of the invention, which proves that the multitask feature fusion module plays an important role.
Table 1 comparison of the present invention with five excellent algorithms
Table 2 comparison of the results of the present invention with five excellent algorithms cascaded by RetinexNet
From the experimental results in the tables, the proposed algorithm has obvious advantages over the other five methods.
The parts not described in the specification are prior art or common general knowledge. The present embodiments are illustrative only and not intended to limit the scope of the present invention, and modifications and equivalents thereof by those skilled in the art are considered to fall within the scope of the present invention as set forth in the claims.
It should be noted that, according to the implementation requirement, each step/component described in the present application can be divided into more steps/components, and two or more steps/components or partial operations of the steps/components can be combined into new steps/components to achieve the purpose of the present invention.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A low-light pedestrian detection method based on multitask feature fusion shared learning is characterized by comprising the following steps:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration separate attention module SCSAB, which combines a self-calibration convolution network and a separate attention network, for collecting information of each spatial position in an input image to extend a field of view of each convolution layer;
s4: constructing a self-calibration separated attention hourglass network, wherein a basic module of the self-calibration separated attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network, and trains the pedestrian detection network by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between an upstream task and a downstream task, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network for multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by utilizing the normal illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion shared learning to obtain the position of a pedestrian in the image to be detected.
2. The method according to claim 1, wherein step S6 includes:
the features of the first convolution of the image illumination enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; the features of the fourth-to-last convolution of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are likewise added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network with multi-task feature fusion and shared learning.
3. The method of claim 1, wherein the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, and the loss function $L_{enh}$ of the image illumination enhancement network is: $L_{enh} = L_{recon} + \lambda_{ir}L_{ir} + \lambda_{is}L_{is}$, where $\lambda_{ir}$ and $\lambda_{is}$ are weight coefficients, and $L_{recon}$, $L_{ir}$ and $L_{is}$ represent the reconstruction, reflectance and illumination smoothness loss functions, respectively.
4. The method of claim 3, wherein the loss function $L_{cor}$ of the pedestrian detection network is: $L_{cor} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off}$, where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, $L_{det}$ is the corner loss, $L_{pull}$ groups the corners, $L_{push}$ separates the corners, and $L_{off}$ is the offset loss.
5. The method of claim 4, wherein

$$L_{det} = \frac{-1}{N}\sum_{a=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{aij}\right)^{\alpha}\log\left(p_{aij}\right) & \text{if } y_{aij}=1\\\left(1-y_{aij}\right)^{\beta}\left(p_{aij}\right)^{\alpha}\log\left(1-p_{aij}\right) & \text{otherwise}\end{cases}$$

$L_{det}$ is the corner loss, N is the number of objects in the image, α and β are the hyperparameters controlling the contribution of each corner, C, H and W represent the number of channels, height and width of the input respectively, $p_{aij}$ is the score at the (i, j) position of class a in the predicted image, and $y_{aij}$ is the unnormalized ground truth;

$$L_{off} = \frac{1}{N}\sum_{k=1}^{N}\text{SmoothL1}\!\left(o_k,\hat{o}_k\right),\qquad o_k = \left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right)$$

$L_{off}$ is the offset loss, $o_k$ is the offset, $x_k$ and $y_k$ are the x and y coordinates of the corner point k, n is the downsampling factor, the smooth L1 loss computes the difference between the predicted offset and the offset of the label, and $\hat{o}_k$ represents the predicted offset;

$$L_{pull} = \frac{1}{N}\sum_{m=1}^{N}\left[\left(e_{t_m}-e_m\right)^2+\left(e_{b_m}-e_m\right)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{m=1}^{N}\sum_{\substack{j=1\\j\neq m}}^{N}\max\left(0,\;\Delta-\left|e_m-e_j\right|\right)$$

$L_{pull}$ groups the corners, $L_{push}$ separates the corners, m indexes the objects, $e_{t_m}$ is the embedding of the upper-left corner of object m, $e_{b_m}$ is the embedding of the lower-right corner of object m, $e_m$ is the average of $e_{t_m}$ and $e_{b_m}$, and $e_m$ and $e_j$ represent the embeddings of objects m and j, respectively.
6. The method of claim 1, further comprising: designing a multi-task learning mechanism with feature fusion and sharing, wherein the mechanism can fuse features between an upstream task and a downstream task and feed the features back to the other network at the next iteration.
7. The method of claim 6, wherein the feature fusion and sharing multitask learning mechanism is: assume that there are two tasks, task A and task B; the output feature of convolutional layer $C_{A_1}$ in the task A network is $F_{A_1}$, the output feature of convolutional layer $C_{B_1}$ in the task B network is $F_{B_1}$, $C_{A_2}$ and $C_{B_2}$ are the convolutional layers following $C_{A_1}$ and $C_{B_1}$ respectively, $\tilde{F}_{A_2}$ is the input feature of convolutional layer $C_{A_2}$, $\tilde{F}_{B_2}$ is the input feature of convolutional layer $C_{B_2}$, and $F_i$ is the shared feature obtained at the i-th end-to-end iteration, expressed as follows:

$$F_i = \sigma\!\left(\frac{F_{A_1}^{i}+F_{B_1}^{i}}{2}\right)$$

when i = 1,

$$\tilde{F}_{A_2}^{1}=F_{A_1}^{1},\qquad \tilde{F}_{B_2}^{1}=F_{B_1}^{1};$$

when i > 1,

$$\tilde{F}_{A_2}^{i}=F_{A_1}^{i}\odot F_{i-1},\qquad \tilde{F}_{B_2}^{i}=F_{B_1}^{i}\odot F_{i-1},$$

where $F_{i-1}$ represents the shared feature obtained by the (i-1)-th end-to-end iteration, σ denotes the sigmoid function, and ⊙ denotes element-wise multiplication.
8. The method of claim 5, wherein the overall training loss function of the multitask-feature-shared low-illumination pedestrian detection network is: $L = L_{cor} + \zeta L_{enh} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off} + \zeta L_{enh}$, where L is the total loss and ζ is the weight of the illumination enhancement loss $L_{enh}$.
9. A low-light pedestrian detection system based on multitask feature fusion shared learning, characterized by comprising:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the normal illumination pedestrian data set and the low illumination pedestrian data set are used for training the image illumination enhancement network to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network and is trained by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, a basic module of the self-calibration separated attention hourglass network consists of a self-calibration separated attention module SCSAB, and the self-calibration separated attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image to expand the field of view of each convolution layer;
the multitask feature fusion module is used for designing a multitask feature fusion module capable of fusing features between an upstream task and a downstream task based on multitask feature fusion shared learning, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network of the multitask feature fusion shared learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the multitask feature fusion shared learning low-illumination pedestrian detection network, and training the multitask feature fusion shared learning low-illumination pedestrian detection network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain a multitask feature fusion shared learning low-illumination pedestrian detection model;
and the image detection module is used for detecting the image to be detected by using the low-illumination pedestrian detection model of the multitask feature fusion shared learning to obtain the position of the pedestrian in the image to be detected.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the low-light pedestrian detection method based on multitask feature fusion shared learning according to any one of claims 1 to 8.
CN202010917093.XA 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning Active CN112069983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Publications (2)

Publication Number Publication Date
CN112069983A (en) 2020-12-11
CN112069983B CN112069983B (en) 2024-03-26

Family

ID=73666641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010917093.XA Active CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Country Status (1)

Country Link
CN (1) CN112069983B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862713A (en) * 2021-02-02 2021-05-28 山东师范大学 Attention mechanism-based low-light image enhancement method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140340555A1 (en) * 2013-05-14 2014-11-20 Canon Kabushiki Kaisha Image sensing apparatus
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140340555A1 (en) * 2013-05-14 2014-11-20 Canon Kabushiki Kaisha Image sensing apparatus
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Shuangyu et al.: "Pedestrian detection method based on binocular stereo vision and SVM algorithm", Journal of Huazhong University of Science and Technology (Natural Science Edition), 16 October 2015 (2015-10-16), page 141 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862713A (en) * 2021-02-02 2021-05-28 山东师范大学 Attention mechanism-based low-light image enhancement method and system
CN112862713B (en) * 2021-02-02 2022-08-09 山东师范大学 Attention mechanism-based low-light image enhancement method and system

Also Published As

Publication number Publication date
CN112069983B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
US10453197B1 (en) Object counting and instance segmentation using neural network architectures with image-level supervision
CN110163187B (en) F-RCNN-based remote traffic sign detection and identification method
Kumar et al. A new vehicle tracking system with R-CNN and random forest classifier for disaster management platform to improve performance
CN111814595B (en) Low-illumination pedestrian detection method and system based on multi-task learning
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
Huang et al. Spatial-temproal based lane detection using deep learning
KR20180048407A (en) Apparatus and method for detecting a lane
CN112084869A (en) Compact quadrilateral representation-based building target detection method
CN113255443A (en) Pyramid structure-based method for positioning time sequence actions of graph attention network
CN114943757A (en) Unmanned aerial vehicle forest exploration system based on monocular depth of field prediction and depth reinforcement learning
Asgarian et al. Fast drivable area detection for autonomous driving with deep learning
Ren et al. A new multi-scale pedestrian detection algorithm in traffic environment
Li et al. Detection of road objects based on camera sensors for autonomous driving in various traffic situations
CN112069983B (en) Low-light pedestrian detection method and system for multi-task feature fusion sharing learning
Zhang et al. An efficient deep neural network with color-weighted loss for fire detection
CN117576149A (en) Single-target tracking method based on attention mechanism
He et al. Real-time pedestrian warning system on highway using deep learning methods
Tan et al. UAV image object recognition method based on small sample learning
Nataprawira et al. Pedestrian Detection in Different Lighting Conditions Using Deep Neural Networks.
CN117636241A (en) Low-light scene multi-mode pedestrian detection tracking method based on decision-level fusion
Dutta Seeing Objects in Dark with Continual Contrastive Learning
CN117523491A (en) Unmanned aerial vehicle-mounted fire automatic detection method and device
Malik et al. High-Level Semantic Feature Detector: Pedestrian Detection Based On Improved Mask R-CNN Algorithm
Wang Human Detection in aSequence of Thermal Images using Deep Learning
CN117423135A (en) Pedestrian target detection method based on improved YOLOv8 lightweight network model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant