CN112069983B - Low-light pedestrian detection method and system for multi-task feature fusion sharing learning - Google Patents

Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Info

Publication number
CN112069983B
CN112069983B (application number CN202010917093.XA)
Authority
CN
China
Prior art keywords
network
illumination
pedestrian detection
low
pedestrian
Prior art date
Legal status
Active
Application number
CN202010917093.XA
Other languages
Chinese (zh)
Other versions
CN112069983A (en)
Inventor
卢涛 (Lu Tao)
王元植 (Wang Yuanzhi)
张彦铎 (Zhang Yanduo)
吴云韬 (Wu Yuntao)
Current Assignee
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202010917093.XA priority Critical patent/CN112069983B/en
Publication of CN112069983A publication Critical patent/CN112069983A/en
Application granted granted Critical
Publication of CN112069983B publication Critical patent/CN112069983B/en


Classifications

    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Fusion techniques of extracted features
    • G06F18/29: Graphical models, e.g. Bayesian networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects


Abstract

The invention discloses a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning. The method acquires normal-illumination and low-illumination pedestrian data sets; pre-trains an image illumination enhancement network using the two data sets; pre-trains a pedestrian detection network using the normal-illumination pedestrian data set; designs a multi-task feature fusion module capable of fusing features between the upstream and downstream tasks, carries out feature fusion and sharing across the two networks, and constructs a low-illumination pedestrian detection network with multi-task feature fusion shared learning; imports the two pre-trained models into the low-illumination pedestrian detection network and trains it with the normal- and low-illumination data sets to obtain a low-illumination pedestrian detection model with multi-task feature fusion shared learning; and detects the image under test with this model to obtain the positions of pedestrians in the image. The invention can accurately and efficiently detect the positions of pedestrians in low-illumination images.

Description

Low-light pedestrian detection method and system for multi-task feature fusion sharing learning
Technical Field
The invention belongs to the technical field of computer vision target detection, and particularly relates to a low-light pedestrian detection method and system based on multi-task feature fusion sharing learning.
Background
With the rapid development of the world economy, people travel frequently between regions, cities and countries, and the accompanying hidden dangers to public safety demand considerable effort from security departments. Video surveillance devices, an important component of urban security systems, are now widely installed in public areas such as roads, streets, schools and shopping malls. These devices mainly record and store events in the monitored places, support requirements such as remote monitoring and emergency command, and help ensure the public safety of society. Pedestrians are the main subjects in video surveillance, and using intelligent technology to study and analyze their behavior is an important part of intelligent monitoring technology. Pedestrian detection is one of the key technologies in this field and an important problem in computer vision, where it has achieved great success under controlled conditions in recent years.
In many application scenarios in the field of computer vision, pedestrians are essential analysis targets, and correctly recognizing pedestrian targets in the surrounding environment is an important precondition for a machine to complete subsequent tasks or to interact with humans. Pedestrian detection can be applied directly to scenarios such as indoor and outdoor mobile robots, autonomous driving and security monitoring, and has therefore attracted wide attention in recent years; it has a significant impact on applications such as driver assistance systems, robot navigation and intelligent video surveillance systems. The continued development of deep learning has contributed substantially to the performance of pedestrian detection.
Generally, pedestrian detection is a particular sub-field of object detection. From this point of view, deep-learning pedestrian detection algorithms can be classified into two categories: anchor-box-based methods and keypoint-based methods. Convolutional neural networks (Convolutional Neural Networks, CNN) were first introduced to target detection in RCNN, which made it possible to perform detection without manually designed features. Keypoint-based target detection algorithms generate target bounding boxes by detecting and grouping keypoints, which greatly simplifies the output of the network and eliminates the need to design anchor boxes.
Although the pedestrian detection algorithms described above achieve satisfactory performance under normal lighting conditions, they mostly do not consider pedestrian detection in low-light environments. In practical applications, normal lighting conditions cannot always be guaranteed; on the contrary, low-light environments are very common. The main reason for poor pedestrian detection performance in low-light conditions is that the color and texture information of the input signal obtained in low light is severely distorted, yet color and texture information plays a critical role in pedestrian detection. Infrared pedestrian detection has been studied as one answer to the low-light pedestrian detection problem, but it must be carried out on infrared images and cannot operate directly on the original low-light image, which greatly increases the detection cost.
Disclosure of Invention
In view of the defects or improvement needs of the prior art, the invention provides a low-light pedestrian detection method and system based on multi-task feature fusion shared learning, which solves the problem of poor pedestrian detection in low-light environments.
In order to achieve the above object, according to one aspect of the present invention, there is provided a low-light pedestrian detection method based on multi-task feature fusion sharing learning, including:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration split attention module SCSAB, wherein the self-calibration split attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
s4: constructing a self-calibration separation attention hourglass network, wherein a basic module of the self-calibration separation attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the self-calibration separation attention hourglass network is used as a backbone network in the pedestrian detection network, and the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on the multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network with the multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion sharing learning, training the low-illumination pedestrian detection network for multi-task feature fusion sharing learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion sharing learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion sharing learning to obtain the position of a pedestrian in the image to be detected.
In some alternative embodiments, step S6 includes:
and adding the features of the first convolution network of the image illumination enhancement network and the features of the second last SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through a sigmoid function, feeding back to the two networks in the next iteration, adding the features of the fourth last convolution network of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through the sigmoid function, and feeding back to the two networks in the next iteration, thereby constructing the multi-task feature fusion shared learning low-illumination pedestrian detection network.
In some alternative embodiments, the image illumination enhancement network is constructed based on the RetinexNet convolutional neural network, wherein the loss function L_enh of the image illumination enhancement network is: L_enh = L_recon + λ_ir·L_ir + λ_is·L_is, where λ_ir and λ_is are weight coefficients, and L_recon, L_ir and L_is represent the reconstruction, reflectance and illumination-smoothness loss functions, respectively.
In some alternative embodiments, the loss function L_cor of the pedestrian detection network is: L_cor = L_det + δ·L_pull + η·L_push + γ·L_off, where δ, η and γ are respectively the weights of the L_pull, L_push and L_off loss terms, L_det is the corner loss, L_pull groups corners, L_push separates corners, and L_off is the offset loss.
In some of the alternative embodiments,

L_det is the corner loss:

L_det = -(1/N) · Σ_{a=1}^{C} Σ_{i=1}^{H} Σ_{j=1}^{W} { (1 - p_aij)^α · log(p_aij), if y_aij = 1; (1 - y_aij)^β · (p_aij)^α · log(1 - p_aij), otherwise }

where N is the number of objects in the image, α and β are hyper-parameters controlling the contribution of each corner, C, H and W represent the channel number, height and width of the input, p_aij is the predicted score at position (i, j) for class a in the image, and y_aij is the ground-truth heatmap value augmented with unnormalized Gaussians;

L_off is the offset loss:

L_off = (1/N) · Σ_{k=1}^{N} SmoothL1(ô_k, o_k), with o_k = (x_k/n - ⌊x_k/n⌋, y_k/n - ⌊y_k/n⌋)

where o_k is the label offset, x_k and y_k are the x and y coordinates of corner k, n is the downsampling factor, and SmoothL1(ô_k, o_k) represents the smooth-L1 difference between the predicted offset ô_k and the label offset o_k;

L_pull groups corners and L_push separates corners:

L_pull = (1/M) · Σ_{m=1}^{M} [ (e_tm - e_m)² + (e_bm - e_m)² ],  L_push = (1/(M(M-1))) · Σ_{m=1}^{M} Σ_{j=1, j≠m}^{M} max(0, Δ - |e_m - e_j|)

where M denotes the number of objects, e_tm is the embedding of the top-left corner of object m, e_bm is the embedding of the bottom-right corner of object m, e_m is the average of e_tm and e_bm, e_m and e_j represent the embeddings of objects m and j respectively, and Δ is a margin.
In some alternative embodiments, the method further comprises: and designing a characteristic fusion and sharing multi-task learning mechanism, wherein the mechanism can fuse the characteristics between upstream and downstream tasks and feed back the characteristics to other networks in the next iteration.
In some alternative embodiments, the feature fusion and sharing multi-task learning mechanism is: assume there are two tasks, task A and task B; the output feature of convolutional layer C_A1 in the task-A network is F^out_A1,i, and the output feature of convolutional layer C_B1 in the task-B network is F^out_B1,i; C_A2 and C_B2 are respectively the convolutional layers following C_A1 and C_B1, F^in_A2,i is the input feature of layer C_A2, F^in_B2,i is the input feature of layer C_B2, and F_i is the shared feature obtained in the i-th end-to-end iteration, expressed as:

F_i = sigmoid( (F^out_A1,i + F^out_B1,i) / 2 )

When i = 1, F^in_A2,1 = F^out_A1,1 and F^in_B2,1 = F^out_B1,1; when i > 1, F^in_A2,i = F^out_A1,i ⊙ F_{i-1} and F^in_B2,i = F^out_B1,i ⊙ F_{i-1}, where ⊙ denotes element-wise multiplication and F_{i-1} represents the shared feature obtained in the (i-1)-th end-to-end iteration.
In some alternative embodiments, the total training loss function of the multi-task feature fusion shared learning low-illumination pedestrian detection network is: L = L_cor + ζ·L_enh = L_det + δ·L_pull + η·L_push + γ·L_off + ζ·L_enh, where L is the total loss and ζ is the weight of the illumination enhancement loss L_enh.
According to another aspect of the present invention, there is provided a low-light pedestrian detection system based on multi-task feature fusion sharing learning, including:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, the pedestrian detection network takes a self-calibration separation attention hourglass network as a main network, the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention hourglass network consists of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines the self-calibration convolution network and the separation attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
the multi-task feature fusion module is used for designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks based on multi-task feature fusion sharing learning, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network for multi-task feature fusion sharing learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the low-illumination pedestrian detection network for the multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for the multi-task feature fusion shared learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain the low-illumination pedestrian detection model for the multi-task feature fusion shared learning;
and the image detection module is used for detecting the image to be detected by utilizing the multi-task feature fusion shared learning low-illumination pedestrian detection model to obtain the position of the pedestrian in the image to be detected.
In some alternative embodiments, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network.
In some alternative embodiments, a pedestrian detection network based on self-calibrating split attention is constructed.
According to another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the low-light pedestrian detection method based on the multi-tasking feature fusion sharing learning of any one of the above.
In general, the above technical solutions conceived by the present invention, compared with the prior art, enable the following beneficial effects to be obtained:
the invention provides a low-light pedestrian detection method and system based on multi-task feature fusion sharing learning, which can accurately and efficiently detect the position of a pedestrian in a low-light image; and creatively provides a self-calibration separation attention module which combines self-calibration convolution and a separation attention network to effectively collect information of each spatial position in an input image so as to expand the field of view of each convolution layer, thereby improving the detection performance.
Drawings
Fig. 1 is a schematic flow diagram of a low-light pedestrian detection method based on multi-task feature fusion sharing learning provided by an embodiment of the invention;
FIG. 2 is a block diagram of a low-light pedestrian detection network based on multi-task feature fusion sharing learning;
FIG. 3 is a diagram of a self-calibrating split attention block architecture in accordance with the present invention;
FIG. 4 is a schematic diagram of a low-light pedestrian detection system based on multi-task feature fusion sharing learning of the present invention;
FIG. 5 is a workflow diagram of a feature fusion and sharing multi-task learning mechanism in accordance with the present invention;
FIG. 6 is a graph showing a comparison of test results according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention is mainly divided into four parts: the image illumination enhancement pre-training model, the pedestrian detection pre-training model, the low-illumination pedestrian detection model with multi-task feature fusion shared learning, and the use of that model to infer pedestrian positions from low-illumination images.
The low-light pedestrian detection method based on the multi-task feature fusion sharing learning, as shown in fig. 1, comprises the following steps:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
in the embodiment of the invention, the images can be obtained directly by photographing. Alternatively, the CityPersons pedestrian detection training and validation sets (a published large-scale pedestrian detection dataset) can be darkened; the embodiment of the invention uses the RGB-based spatial brightness adjustment algorithm in OpenCV (a cross-platform computer vision library released under the BSD open-source license), which adjusts according to the current RGB values, i.e., the larger the R, G, B values, the larger the adjustment. For example, with an adjustment coefficient of 1.1, the pixel (100, 200, 50) becomes (110, 220, 55) after adjustment. The embodiment of the invention uses an adjustment coefficient of 0.8. After the brightness is reduced, the normal and low-illumination CityPersons pedestrian detection training and validation sets serve as the training set and test set.
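The brightness-scaling step above can be sketched in NumPy. This is a simplified stand-in for the OpenCV adjustment the embodiment uses, not the patent's exact code; the function name and the toy pixel are illustrative:

```python
import numpy as np

def adjust_brightness(image: np.ndarray, coeff: float) -> np.ndarray:
    """Scale every R, G, B value by `coeff`, clipping to [0, 255].

    Larger channel values receive a larger absolute change, matching the
    behaviour described above: (100, 200, 50) * 1.1 -> (110, 220, 55).
    """
    scaled = image.astype(np.float64) * coeff
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

pixel = np.array([[[100, 200, 50]]], dtype=np.uint8)
brighter = adjust_brightness(pixel, 1.1)  # -> [[[110, 220, 55]]]
darker = adjust_brightness(pixel, 0.8)    # -> [[[80, 160, 40]]], as for the low-light set
```

Applying `coeff=0.8` to every image of the normal-illumination set would produce the synthetic low-illumination counterpart described above.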
S2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
and the image illumination enhancement network is trained alone for 100 epochs using the normal and low-illumination pedestrian detection training sets to obtain the image illumination enhancement pre-training model. The network structure of the image illumination enhancement network is shown in fig. 2; it is constructed on the basis of the RetinexNet convolutional neural network, which introduces Retinex theory into the network. Classical Retinex theory builds a model of human color perception: it assumes that the observed image can be decomposed into two parts, a reflectance channel and an illumination channel. Let S represent the source image; then S = R ⊙ I, where R represents the reflectance component, I represents the illumination component, and ⊙ represents element-wise multiplication. For the loss function, in order to ensure that the illumination-restored image retains the edge information of objects while also preserving smooth transitions of illumination, the illumination enhancement network uses the following loss:

L_enh = L_recon + λ_ir·L_ir + λ_is·L_is

where λ_ir and λ_is represent the coefficients balancing reflectance and illumination, and the loss functions L_recon, L_ir and L_is represent the reconstruction, reflectance and illumination-smoothness terms, respectively.
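The Retinex decomposition S = R ⊙ I can be illustrated with a toy NumPy example; the array shapes and illumination values below are made up for illustration and are not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.uniform(0.2, 1.0, size=(4, 4, 3))  # reflectance: scene-intrinsic, illumination-invariant
I_low = np.full((4, 4, 1), 0.1)            # dim illumination map
I_normal = np.full((4, 4, 1), 0.9)         # normal illumination map

S_low = R * I_low          # observed low-light image
S_normal = R * I_normal    # observed normal-light image

# Enhancement amounts to re-lighting the shared reflectance component:
S_enhanced = (S_low / I_low) * I_normal
```

Because the low-light and normal-light images share the reflectance R, re-lighting the reflectance recovers the normal-light observation exactly in this idealized setting.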
S3: constructing a pedestrian detection network which takes two self-calibration split-attention hourglass networks as the backbone network, and training the pedestrian detection network with the normal-illumination pedestrian data set to obtain a pedestrian detection pre-training model; the basic module of the self-calibration split-attention hourglass network consists of the self-calibration split-attention module SCSAB, which combines a self-calibrated convolution network and a split-attention network and is used for collecting information of each spatial position in the input image so as to expand the field of view of each convolutional layer;
in the embodiment of the invention, the pedestrian detection network is trained alone for 100 epochs using the normal-illumination pedestrian detection training set to obtain the pedestrian detection pre-training model. The network structure of the pedestrian detection network is shown in fig. 2. The pedestrian detection network draws on the idea of keypoint-based target detection algorithms, which generate object bounding boxes by detecting and grouping keypoints; this greatly simplifies the output of the network and eliminates the need to design anchor boxes. An attention mechanism is also introduced into the network to further improve detection performance. The Self-Calibrated Split Attention Block (SCSAB) proposed by the invention, shown in fig. 3, combines self-calibrated convolution with a split-attention network, effectively collecting information of each spatial position in the input image so as to expand the field of view of each convolutional layer and thereby improve detection performance.
For the loss function of the pedestrian detection network, a focal loss with α = 2 and β = 4 can be used. Let p_aij be the predicted score at position (i, j) for class a in the image, and let y_aij be the ground-truth heatmap value augmented with unnormalized Gaussians:

L_det = -(1/N) · Σ_{a=1}^{C} Σ_{i=1}^{H} Σ_{j=1}^{W} { (1 - p_aij)^α · log(p_aij), if y_aij = 1; (1 - y_aij)^β · (p_aij)^α · log(1 - p_aij), otherwise }

where N is the number of objects in the image, α and β are hyper-parameters that control the contribution of each point, and C, H and W represent the channel number, height and width of the input.
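A minimal NumPy sketch of this penalty-reduced focal loss, following the CornerNet-style formulation the text describes; the helper name and the toy heatmaps are illustrative, not from the patent:

```python
import numpy as np

def corner_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """L_det over a (C, H, W) corner heatmap.

    pred: predicted scores p_aij in (0, 1).
    gt:   ground truth y_aij; exactly 1 at corner locations, values in
          [0, 1) elsewhere (unnormalized Gaussian bumps near corners).
    """
    pos = gt == 1.0
    n = max(int(pos.sum()), 1)  # N: number of corners (objects) in the image
    pos_term = ((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    neg_term = ((1 - gt[~pos]) ** beta * pred[~pos] ** alpha
                * np.log(1 - pred[~pos] + eps)).sum()
    return -(pos_term + neg_term) / n

gt = np.zeros((1, 4, 4)); gt[0, 1, 1] = 1.0        # one corner
good = np.full((1, 4, 4), 0.01); good[0, 1, 1] = 0.99
bad = np.full((1, 4, 4), 0.01); bad[0, 1, 1] = 0.10
l_good = corner_focal_loss(good, gt)
l_bad = corner_focal_loss(bad, gt)
```

A confident correct prediction (`good`) incurs a near-zero loss, while a low score at the true corner (`bad`) is penalized heavily by the `(1 - p)^α log(p)` term.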
As the input passes through convolutional layers, the size of the output is typically smaller than the input image. A position (x, y) in the image therefore maps to the position (⌊x/n⌋, ⌊y/n⌋) in the heatmap, where n is the downsampling factor. Some precision may be lost when remapping locations from the heatmap back to the input image. To solve this problem, a position offset is predicted that slightly adjusts the corner positions before remapping them to the input resolution:

o_k = (x_k/n - ⌊x_k/n⌋, y_k/n - ⌊y_k/n⌋)

where o_k is the offset, and x_k and y_k are the x and y coordinates of corner k. One set of offsets is predicted and shared by the top-left corners of all categories, and another set by the bottom-right corners. For training, the offset loss is denoted L_off, and the smooth L1 loss is applied as the offset loss:

L_off = (1/N) · Σ_{k=1}^{N} SmoothL1(ô_k, o_k)

where SmoothL1(ô_k, o_k) represents the smooth-L1 difference between the predicted offset ô_k and the label offset o_k.
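The offset target and its smooth-L1 penalty can be sketched as follows; the downsampling factor and corner coordinates below are made-up values for illustration:

```python
import numpy as np

def corner_offset(x, y, n):
    """o_k: the fractional position lost when mapping (x, y) to the n-times-downsampled heatmap."""
    return np.array([x / n - np.floor(x / n), y / n - np.floor(y / n)])

def smooth_l1(pred, target):
    """Elementwise smooth-L1 (quadratic below 1, linear above), summed."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

o_k = corner_offset(103, 57, n=4)              # -> [0.75, 0.25]
loss = smooth_l1(np.array([0.70, 0.30]), o_k)  # small residual penalty
```

Training the network to predict o_k lets the corner be remapped to (25.75, 14.25)·n instead of the coarse (25, 14)·n grid position.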
There may be multiple objects in the image, so multiple top-left and bottom-right corners may be detected, and it is necessary to determine whether a pair of top-left and bottom-right corners comes from the same bounding box. Let e_tm be the embedding of the top-left corner of object m and e_bm the embedding of its bottom-right corner. The network is trained to group corners using a "pull" loss and to separate corners using a "push" loss:

L_pull = (1/M) · Σ_{m=1}^{M} [ (e_tm - e_m)² + (e_bm - e_m)² ]

L_push = (1/(M(M-1))) · Σ_{m=1}^{M} Σ_{j=1, j≠m}^{M} max(0, Δ - |e_m - e_j|)

where M is the number of objects, e_m is the average of e_tm and e_bm, e_m and e_j represent the embeddings of objects m and j respectively, and Δ is a margin. The total training loss for pedestrian detection is as follows:

L_cor = L_det + δ·L_pull + η·L_push + γ·L_off

where δ, η and γ are respectively the weights of the L_pull, L_push and L_off loss terms, and L_cor is the total loss of the pedestrian detection network.
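A toy NumPy version of the pull/push grouping losses, using one-dimensional embeddings; the margin Δ = 1 and the sample values are illustrative assumptions, not from the patent:

```python
import numpy as np

def pull_push_loss(tl, br, margin=1.0):
    """tl[m], br[m]: scalar embeddings of object m's top-left / bottom-right corner."""
    e = (tl + br) / 2.0                       # e_m: per-object mean embedding
    m = len(e)
    pull = ((tl - e) ** 2 + (br - e) ** 2).sum() / m
    push = sum(max(0.0, margin - abs(e[a] - e[b]))
               for a in range(m) for b in range(m) if a != b)
    push /= m * (m - 1) if m > 1 else 1
    return pull, push

# Two well-separated objects with internally consistent corner embeddings:
pull, push = pull_push_loss(np.array([0.0, 5.0]), np.array([0.0, 5.0]))
# Two objects whose embeddings sit inside the margin get pushed apart:
_, push_close = pull_push_loss(np.array([0.0, 0.2]), np.array([0.0, 0.2]))
```

The first case incurs zero pull (each object's corners already agree) and zero push (the means differ by more than the margin); the second case keeps zero pull but a positive push.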
S4: based on multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features among different tasks and carrying out feature sharing between the image illumination enhancement network and the pedestrian detection network: the features of the first 3×3 convolution of the enhancement network and the features of the second-to-last SCSAB of the self-calibration split-attention hourglass network are added, averaged, passed through a sigmoid function and fed back to the two networks in the next iteration; the features of the fourth-to-last 3×3 convolution of the enhancement network and the features of the first SCSAB of the hourglass network are likewise added, averaged, passed through a sigmoid function and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network with multi-task feature fusion shared learning;
the multi-task learning mechanism with feature fusion and sharing is designed, and the mechanism can fuse features between upstream tasks and downstream tasks and feed the features back to other networks in the next iteration. As shown in fig. 5, the feature fusion and sharing multi-task learning mechanism is: assume that there are two tasks, dividedThe tasks are task A and task B, and a convolution layer C in the task A network A1 Is characterized byConvolutional layer C in a mission B network B1 The output characteristic of (2) is->C A2 And C B2 Respectively C A1 And C B1 Is the next convolutional layer of (a)>Is C A2 Input features of convolutional layer,/->Is C B2 Input features of convolutional layer, F i The shared features obtained for the ith end-to-end round iteration are represented as follows: />When i=1, _a->Is represented as follows:when i > 1, ">Is represented as follows: />F i-1 Representing the sharing characteristics obtained by the i-1 th end-to-end round iteration.
The low-illumination pedestrian detection network with multi-task feature fusion shared learning: to further improve the performance of the two networks, a multi-task feature fusion module is introduced on this basis; its detailed structure is shown in fig. 2. In the i-th iteration, the multi-task feature fusion module fuses features of the image illumination enhancement sub-network with features of the pedestrian detection sub-network into two fused features, denoted M_1^i and M_2^i. The enhancement-network features E_1^i and E_4^i come from its first and fourth-last 3×3 convolutional layers, respectively; the detection-network feature D_p^i comes from the penultimate SCSAB of the self-calibrating split attention hourglass network, and D_f^i comes from its first SCSAB. All of these features have the same size. The fused features are:

M_1^i = (E_1^i + D_p^i) / 2, M_2^i = (E_4^i + D_f^i) / 2

M_1^i and M_2^i are then passed through the sigmoid function, giving S_1^i = σ(M_1^i) and S_2^i = σ(M_2^i). In the (i+1)-th iteration, S_1^i is multiplied element-wise with the feature entering the second convolutional layer of the enhancement network and with the feature entering the last SCSAB of the self-calibrating split attention hourglass network, and the products serve as the inputs of those layers. The same fusion-and-sharing method acts on the input of the last convolutional layer in the enhancement network and the input of the second SCSAB in the self-calibrating split attention hourglass network: in the (i+1)-th iteration, each of these inputs is multiplied element-wise by S_2^i.
Finally, the training loss function L of the low-illumination pedestrian detection network with multi-task feature fusion shared learning is:

L = L_det + δL_pull + ηL_push + γL_off + ζL_enh

where δ, η and γ are the weights of the L_pull, L_push and L_off loss functions, respectively, and ζ is the weight of the illumination enhancement loss L_enh. δ and η are set to 0.1, γ to 1, and ζ to 0.05.
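With the weights fixed as in the text (δ = η = 0.1, γ = 1, ζ = 0.05), the total training loss is a plain weighted sum; a minimal sketch (the loss values passed in here are placeholders, not real network outputs):

```python
# weights taken from the text: delta, eta, gamma, zeta
DELTA, ETA, GAMMA, ZETA = 0.1, 0.1, 1.0, 0.05

def total_loss(l_det, l_pull, l_push, l_off, l_enh):
    """L = L_det + delta*L_pull + eta*L_push + gamma*L_off + zeta*L_enh."""
    return l_det + DELTA * l_pull + ETA * l_push + GAMMA * l_off + ZETA * l_enh
```

With unit losses everywhere the total is 1 + 0.1 + 0.1 + 1 + 0.05 = 2.25, which makes the relative weighting of the detection and enhancement terms easy to read off.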
S5: the method comprises the steps of importing an image illumination enhancement pre-training model and a pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion and sharing learning, and training the low-illumination pedestrian detection network for the multi-task feature fusion and sharing learning by utilizing a normal-illumination pedestrian data set and a low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for the multi-task feature fusion and sharing learning;
training the low-light pedestrian detection network with the multi-task feature fusion shared learning by using the normal and low-light pedestrian detection training sets, and simultaneously importing the pre-training model with the pre-trained image illumination enhancement and the pre-training model with the pedestrian detection, and training for 100 periods to obtain the low-light pedestrian detection model with the multi-task feature fusion shared learning.
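Importing the two pre-trained models amounts to copying their parameters into the matching entries of the joint network's parameter dictionary, while parameters that exist only in the joint model (such as the fusion module) keep their fresh initialization. A framework-agnostic sketch (all parameter names are hypothetical; in PyTorch the same effect comes from `load_state_dict` with partial state dicts):

```python
def import_pretrained(joint_params, enh_params, det_params):
    """Copy pre-trained parameters into the joint model's parameter dict.

    Keys present only in the joint model (e.g. the multi-task feature
    fusion module) are left untouched.
    """
    merged = dict(joint_params)
    for name, value in {**enh_params, **det_params}.items():
        if name in merged:
            merged[name] = value
    return merged

# hypothetical parameter dicts: enhancement, detection, and joint model
joint = {"enh.conv1.w": 0.0, "det.scsab1.w": 0.0, "fusion.w": 7.0}
merged = import_pretrained(joint, {"enh.conv1.w": 1.0}, {"det.scsab1.w": 2.0})
```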
S6: detecting the image to be detected with the multi-task feature fusion shared learning low-illumination pedestrian detection model to obtain the positions of pedestrians in the image. The low-illumination test-set pictures are input into the trained model for inference, and the pedestrian positions are framed in the image.
The invention also provides a low-illumination pedestrian detection system based on multi-task feature fusion shared learning for implementing the above low-illumination pedestrian detection method, as shown in fig. 4, comprising:
a data set module 101, configured to acquire a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module 102 is configured to construct an image illumination enhancement network, where the image illumination enhancement network includes a decomposition network and an enhancement network, and training the image illumination enhancement network by using a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module 103 is configured to construct a pedestrian detection network, the pedestrian detection network uses two self-calibration separation attention-based hourglass networks as a backbone network, and trains the pedestrian detection network by using a normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention-based hourglass network is composed of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines a self-calibration convolution network and a separation attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
the multi-task feature fusion module 104 is configured to fuse features between upstream and downstream tasks and perform feature sharing between the image illumination enhancement network and the pedestrian detection network: the features of the first 3×3 convolutional layer of the enhancement network are added to the features of the penultimate SCSAB of the self-calibration separation attention hourglass network and averaged, and the average is fed back to the two networks through a sigmoid function in the next iteration; likewise, the features of the fourth-last 3×3 convolutional layer of the enhancement network are added to the features of the first SCSAB of the self-calibration separation attention hourglass network, averaged, and fed back to the two networks through a sigmoid function in the next iteration, constructing the low-illumination pedestrian detection network with multi-task feature fusion shared learning;
the model training module 105 is configured to import the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion and sharing learning, and train the low-illumination pedestrian detection network for multi-task feature fusion and sharing learning by using the normal illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion and sharing learning;
the image detection module 106 is configured to detect an image to be detected by using the low-light pedestrian detection model of the multi-task feature fusion sharing learning, so as to obtain the position of the pedestrian in the image.
Further, the image illumination enhancement network is constructed based on the RetinexNet convolutional neural network, and the pedestrian detection network is constructed with the self-calibration separation attention block provided by the invention.
The invention also provides a computer storage medium storing a computer program executable by a computer processor, the computer program, when executed, performing the above low-illumination pedestrian detection method based on multi-task feature fusion shared learning.
The invention finally provides a test embodiment employing the normal- and low-illumination CityPersons datasets: the normal- and low-illumination training sets comprise 2975 pictures each, and the normal- and low-illumination test sets comprise 500 pictures each. Experiments were carried out in PyTorch; training used 3 RTX 2080Ti graphics cards with the Adam optimization algorithm, and the learning rate was set to 0.0001. The evaluation index follows the Caltech evaluation standard: the log-average miss rate per image (MR^-2); the lower the MR^-2 value, the better the algorithm's performance. The MR^-2 index is used to compare against other strong pedestrian detection algorithms to demonstrate the superiority of the invention. Table 1 shows the comparison results under MR^-2, and fig. 6 shows pedestrian position detection results, in which (a) is the input image, (b) the CSP detection result, (c) the ALFNet detection result, (d) the CenterNet detection result, (e) the CornerNet detection result, (f) the CornerNet-Saccade detection result, (g) the detection result of the present invention, and (h) the benchmark detection result.
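For reference, MR^-2 averages the miss rate at nine FPPI (false positives per image) points spaced evenly in log space over [10^-2, 10^0], following the Caltech protocol. A sketch of the computation (a simplified reading of the standard protocol, not the authors' evaluation code):

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """MR^-2: log-average miss rate over 9 FPPI reference points.

    fppi and miss_rate are parallel sequences describing a detector's
    miss-rate-vs-FPPI curve, with fppi sorted ascending.
    """
    refs = np.logspace(-2.0, 0.0, 9)  # 9 points in [1e-2, 1e0]
    sampled = []
    for r in refs:
        idx = np.where(np.asarray(fppi) <= r)[0]
        # miss rate at the largest FPPI not exceeding the reference,
        # or 1.0 if the curve never reaches that low an FPPI
        sampled.append(miss_rate[idx[-1]] if len(idx) else 1.0)
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))
```

Because the average is taken in log space, halving the miss rate everywhere halves the reported MR^-2, which is why small absolute differences in the tables correspond to meaningful gains.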
The present invention is compared with several strong pedestrian detection or object detection methods: CSP, ALFNet, CenterNet, CornerNet and CornerNet-Saccade. ALFNet is the most representative anchor-based pedestrian detection algorithm; CSP and CenterNet are the best center-point-based pedestrian detection and object detection algorithms, respectively; CornerNet and CornerNet-Saccade are representative corner-based object detection methods. All algorithms in table 1 were trained with both the normal- and low-illumination training sets, so all of these pedestrian detection networks can process low-illumination images, guaranteeing the fairness of the experiment. In addition, to further illustrate the role of the multi-task feature fusion module in the present algorithm, the algorithms in table 2 cascade the RetinexNet illumination enhancement algorithm with each detection algorithm. As the results in table 2 show, the cascaded methods are still inferior to the algorithm of the invention, demonstrating that the multi-task feature fusion module plays an important role.
Table 1 comparison results table of the present invention with five excellent algorithms
Table 2 comparison results table of the present invention with five excellent algorithms cascaded by RetinexNet
The experimental results in the tables show that the proposed algorithm has clear advantages over the other five methods.
The non-illustrated portions of the specification are prior art or common general knowledge. The present embodiment is only for illustrating the present invention and not for limiting the scope of the present invention, and those skilled in the art will recognize that equivalent substitutions and modifications to the present invention are within the scope of the appended claims.
It should be noted that each step/component described in the present application may be split into more steps/components, or two or more steps/components or part of the operations of the steps/components may be combined into new steps/components, as needed for implementation, to achieve the object of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. The low-light pedestrian detection method based on the multi-task feature fusion sharing learning is characterized by comprising the following steps of:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration split attention module SCSAB, wherein the self-calibration split attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
s4: constructing a self-calibration separation attention hourglass network, wherein a basic module of the self-calibration separation attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the self-calibration separation attention hourglass network is used as a backbone network in the pedestrian detection network, and the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on the multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network with the multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion sharing learning, training the low-illumination pedestrian detection network for multi-task feature fusion sharing learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion sharing learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion sharing learning to obtain the position of a pedestrian in the image to be detected.
2. The method according to claim 1, wherein step S6 comprises:
and adding the features of the first convolution network of the image illumination enhancement network and the features of the second last SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through a sigmoid function, feeding back to the two networks in the next iteration, adding the features of the fourth last convolution network of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separation attention hourglass network, taking an average, then passing through the sigmoid function, and feeding back to the two networks in the next iteration, thereby constructing the multi-task feature fusion shared learning low-illumination pedestrian detection network.
3. The method according to claim 1, wherein the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, and the loss function L_enh of the image illumination enhancement network is: L_enh = L_recon + λ_ir L_ir + λ_is L_is, where λ_ir and λ_is are weight coefficients, and L_recon, L_ir and L_is denote the reconstruction, reflectance and illumination smoothness loss functions, respectively.
4. A method according to claim 3, characterized in that the loss function L_cor of the pedestrian detection network is: L_cor = L_det + δL_pull + ηL_push + γL_off, where δ, η and γ are the weights of the L_pull, L_push and L_off loss functions, respectively; L_det is the corner loss, L_pull groups the corners, L_push separates the corners, and L_off is the offset loss.
5. The method of claim 4, wherein:

L_det is the corner loss, where N is the number of objects in the image, α and β are hyper-parameters controlling the contribution of each corner, C, H and W denote the number of channels, the height and the width of the input, respectively, p_aij is the predicted score at position (i, j) for class a in the image, and y_aij is the corresponding unnormalized ground-truth value;

L_off is the offset loss, where o_k is the offset of corner k, x_k and y_k are the x and y coordinates of corner k, n is the downsampling factor, the loss is the difference between the predicted offset and the label offset computed with the smooth L1 loss, and ô_k denotes the predicted offset;

L_pull groups the corners and L_push separates them, where m indexes the objects, e_tm is the embedding of the top-left corner of object m, e_bm is the embedding of the bottom-right corner of object m, e_m is the average of e_tm and e_bm, and e_m and e_j denote the embeddings of objects m and j, respectively.
6. The method according to claim 1, further comprising: designing a feature fusion and sharing multi-task learning mechanism that fuses features between upstream and downstream tasks and feeds them back to the other network in the next iteration.
7. The method of claim 6, wherein the feature fusion and sharing multi-task learning mechanism is: assume there are two tasks, task A and task B; a convolutional layer C_A1 in the task-A network outputs the feature F_A1^i, and a convolutional layer C_B1 in the task-B network outputs the feature F_B1^i; C_A2 and C_B2 are the convolutional layers following C_A1 and C_B1, with input features F̃_A2^i and F̃_B2^i, respectively; the shared feature F^i obtained in the i-th end-to-end iteration is F^i = (F_A1^i + F_B1^i) / 2; when i = 1, F̃_A2^i = F_A1^i and F̃_B2^i = F_B1^i; when i > 1, F̃_A2^i = F_A1^i ⊙ σ(F^{i−1}) and F̃_B2^i = F_B1^i ⊙ σ(F^{i−1}), where σ is the sigmoid function, ⊙ denotes element-wise multiplication, and F^{i−1} is the shared feature obtained in the (i−1)-th end-to-end iteration.
8. The method of claim 5, wherein the total training loss function of the multi-task feature fusion shared low-illumination pedestrian detection network is: L = L_cor + ζL_enh = L_det + δL_pull + ηL_push + γL_off + ζL_enh, where L is the total loss and ζ is the weight of the illumination enhancement loss L_enh.
9. A low-light pedestrian detection system based on multi-task feature fusion and sharing learning, comprising:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, the pedestrian detection network takes a self-calibration separation attention hourglass network as a main network, the pedestrian detection network is trained by utilizing the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention hourglass network consists of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines the self-calibration convolution network and the separation attention network and is used for collecting information of each spatial position in an input image so as to expand the field of view of each convolution layer;
the multi-task feature fusion module is used for designing a multi-task feature fusion module capable of fusing features between upstream tasks and downstream tasks based on multi-task feature fusion sharing learning, and carrying out feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network to construct a low-illumination pedestrian detection network for multi-task feature fusion sharing learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the low-illumination pedestrian detection network for the multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for the multi-task feature fusion shared learning by utilizing the normal-illumination pedestrian data set and the low-illumination pedestrian data set to obtain the low-illumination pedestrian detection model for the multi-task feature fusion shared learning;
and the image detection module is used for detecting the image to be detected by utilizing the multi-task feature fusion shared learning low-illumination pedestrian detection model to obtain the position of the pedestrian in the image to be detected.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the method for low-light pedestrian detection based on multi-tasking feature fusion sharing learning of any of claims 1 to 8.
CN202010917093.XA 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning Active CN112069983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Publications (2)

Publication Number Publication Date
CN112069983A CN112069983A (en) 2020-12-11
CN112069983B true CN112069983B (en) 2024-03-26

Family

ID=73666641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010917093.XA Active CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Country Status (1)

Country Link
CN (1) CN112069983B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862713B (en) * 2021-02-02 2022-08-09 山东师范大学 Attention mechanism-based low-light image enhancement method and system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2014222863A (en) * 2013-05-14 2014-11-27 キヤノン株式会社 Imaging apparatus

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Non-Patent Citations (1)

Title
Pedestrian detection method based on binocular stereo vision and the SVM algorithm; Chen Shuangyu et al.; Journal of Huazhong University of Science and Technology (Natural Science Edition); 2015-10-16; pp. 141–143 *

Also Published As

Publication number Publication date
CN112069983A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
Kumar et al. A new vehicle tracking system with R-CNN and random forest classifier for disaster management platform to improve performance
CN112233097B (en) Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion
CN111814595B (en) Low-illumination pedestrian detection method and system based on multi-task learning
US8213679B2 (en) Method for moving targets tracking and number counting
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN110781838A (en) Multi-modal trajectory prediction method for pedestrian in complex scene
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
US20210319242A1 (en) Dense and Discriminative Neural Network Architectures for Improved Object Detection and Instance Segmentation
CN112686207A (en) Urban street scene target detection method based on regional information enhancement
CN111539351A (en) Multi-task cascaded face frame selection comparison method
CN113962246A (en) Target detection method, system, equipment and storage medium fusing bimodal features
CN116385761A (en) 3D target detection method integrating RGB and infrared information
CN113052108A (en) Multi-scale cascade aerial photography target detection method and system based on deep neural network
CN114202803A (en) Multi-stage human body abnormal action detection method based on residual error network
CN115760921A (en) Pedestrian trajectory prediction method and system based on multi-target tracking
CN112069983B (en) Low-light pedestrian detection method and system for multi-task feature fusion sharing learning
Wu et al. Vehicle detection based on adaptive multi-modal feature fusion and cross-modal vehicle index using RGB-T images
CN117576149A (en) Single-target tracking method based on attention mechanism
He et al. Real-time pedestrian warning system on highway using deep learning methods
Zhu et al. Advanced driver assistance system based on machine vision
Gerhardt et al. Neural network-based traffic sign recognition in 360° images for semi-automatic road maintenance inventory
CN116485894A (en) Video scene mapping and positioning method and device, electronic equipment and storage medium
Jourdheuil et al. Heterogeneous adaboost with real-time constraints-application to the detection of pedestrians by stereovision
Guo et al. ANMS: attention-based non-maximum suppression
Nataprawira et al. Pedestrian Detection in Different Lighting Conditions Using Deep Neural Networks.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant