CN112069983A - Low-illumination pedestrian detection method and system for multi-task feature fusion shared learning - Google Patents

Low-illumination pedestrian detection method and system for multi-task feature fusion shared learning Download PDF

Info

Publication number
CN112069983A
CN112069983A (application CN202010917093.XA; granted publication CN112069983B)
Authority
CN
China
Prior art keywords
illumination
network
low
pedestrian
pedestrian detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010917093.XA
Other languages
Chinese (zh)
Other versions
CN112069983B (en)
Inventor
卢涛 (Lu Tao)
王元植 (Wang Yuanzhi)
张彦铎 (Zhang Yanduo)
吴云韬 (Wu Yuntao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202010917093.XA priority Critical patent/CN112069983B/en
Publication of CN112069983A publication Critical patent/CN112069983A/en
Application granted granted Critical
Publication of CN112069983B publication Critical patent/CN112069983B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning. The method acquires normal-illumination and low-illumination pedestrian data sets; pre-trains an image illumination enhancement network using the normal and low-illumination pedestrian data sets; pre-trains a pedestrian detection network using the normal-illumination pedestrian data set; designs a multi-task feature fusion module capable of fusing features between an upstream task and a downstream task, performs feature fusion and sharing on the two networks, and constructs a low-illumination pedestrian detection network for multi-task feature fusion shared learning; imports the two pre-trained models into the low-illumination pedestrian detection network and trains it with the normal and low-illumination data sets to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning; and detects the image to be detected with this model to obtain the position of the pedestrian in the image. The invention can accurately and efficiently detect the position of a pedestrian in a low-illumination image.

Description

Low-illumination pedestrian detection method and system for multi-task feature fusion shared learning
Technical Field
The invention belongs to the technical field of computer vision target detection, and particularly relates to a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning.
Background
With the rapid development of the world economy, people travel frequently between different regions, cities and countries, and the resulting hidden dangers to public safety consume considerable effort from the relevant security departments. Currently, video surveillance devices, as important components of urban security systems, are widely installed in public areas such as roads, streets, schools and shopping malls. These devices mainly record and store what happens at the monitored locations, facilitating remote monitoring, emergency command and similar needs and safeguarding public safety. Pedestrians are the main subjects in video surveillance, and using intelligent technology to study and analyze their behavior is an important component of intelligent monitoring technology. Pedestrian detection is one of the key technologies in this field; it is an important problem in computer vision and has enjoyed great success in recent years under controlled conditions.
In every application scenario in the field of computer vision, pedestrians are very important analysis targets, and correctly identifying the pedestrian targets in a machine's operating environment is an important precondition for the machine to complete subsequent tasks or interact with humans. Pedestrian detection can be directly applied to scenarios such as indoor and outdoor mobile robots, autonomous driving and security monitoring, and has therefore attracted much attention in recent years. It has a significant impact on many applications, such as driver assistance systems, robot navigation and intelligent video surveillance systems. The continuous development of deep learning has contributed greatly to the performance of pedestrian detection.
Generally, pedestrian detection is a particular area of object detection. From this point of view, deep-learning pedestrian detection algorithms can be divided into two categories: anchor-box-based methods and keypoint-based methods. Convolutional neural networks (CNN) were first introduced into object detection in RCNN, which enables detection without manually designed features. Keypoint-based object detection algorithms generate object bounding boxes by detecting and grouping keypoints, which greatly simplifies the output of the network and eliminates the need to design anchor boxes.
Although the pedestrian detection algorithms described above achieve satisfactory performance under normal lighting conditions, most of them do not consider pedestrian detection in low-light environments. In practical applications, normal lighting cannot always be guaranteed; on the contrary, low-light environments are very common. The main reason for poor pedestrian detection performance in low-light environments is that the color and texture information in images captured under low-light conditions is severely distorted, yet color and texture information plays a crucial role in pedestrian detection. To address low-illumination pedestrian detection, infrared pedestrian detection has been studied, but it must operate on infrared images, cannot work directly on the original low-illumination images, and greatly increases detection cost.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a low-illumination pedestrian detection method and system based on multi-task feature fusion shared learning, and solves the problem of poor detection effect of pedestrians in a low-illumination environment.
To achieve the above object, according to one aspect of the present invention, there is provided a low-light pedestrian detection method based on multitask feature fusion shared learning, including:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration separate attention module SCSAB, which combines a self-calibration convolution network and a separate attention network, for collecting information of each spatial position in an input image to extend a field of view of each convolution layer;
s4: constructing a self-calibration separated attention hourglass network, wherein a basic module of the self-calibration separated attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network, and trains the pedestrian detection network by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between an upstream task and a downstream task, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network for multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by utilizing the normal illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion shared learning to obtain the position of a pedestrian in the image to be detected.
In some alternative embodiments, step S6 includes:
the features of the first convolution of the image illumination enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; the features of the fourth-to-last convolution of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are likewise added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network with multi-task feature fusion and shared learning.
In some optional embodiments, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, wherein the loss function $L_{enh}$ of the image illumination enhancement network is: $L_{enh} = L_{recon} + \lambda_{ir}L_{ir} + \lambda_{is}L_{is}$, where $\lambda_{ir}$ and $\lambda_{is}$ are weight coefficients, and $L_{recon}$, $L_{ir}$ and $L_{is}$ represent the reconstruction, reflectance and illumination smoothness loss functions, respectively.
In some alternative embodiments, the loss function $L_{cor}$ of the pedestrian detection network is: $L_{cor} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off}$, where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, $L_{det}$ is the corner loss, $L_{pull}$ groups the corners, $L_{push}$ separates the corners, and $L_{off}$ is the offset loss.
In some alternative embodiments,

$$L_{det} = \frac{-1}{N}\sum_{a=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{aij}\right)^{\alpha}\log\left(p_{aij}\right) & \text{if } y_{aij}=1\\\left(1-y_{aij}\right)^{\beta}\left(p_{aij}\right)^{\alpha}\log\left(1-p_{aij}\right) & \text{otherwise}\end{cases}$$

where $L_{det}$ is the corner loss, N is the number of objects in the image, α and β are the hyperparameters controlling the contribution of each corner, C, H and W represent the number of channels, height and width of the input respectively, $p_{aij}$ is the score at the (i, j) position of class a in the predicted image, and $y_{aij}$ is the unnormalized ground truth;

$$L_{off} = \frac{1}{N}\sum_{k=1}^{N}\text{SmoothL1}\!\left(o_k,\hat{o}_k\right),\qquad o_k = \left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right)$$

where $L_{off}$ is the offset loss, $o_k$ is the offset of the label, $x_k$ and $y_k$ are the x and y coordinates of corner k, n is the downsampling factor, the smooth L1 loss computes the difference between the predicted offset and the offset of the label, and $\hat{o}_k$ represents the predicted offset;

$$L_{pull} = \frac{1}{N}\sum_{m=1}^{N}\left[\left(e_{t_m}-e_m\right)^2+\left(e_{b_m}-e_m\right)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{m=1}^{N}\sum_{\substack{j=1\\j\neq m}}^{N}\max\left(0,\;\Delta-\left|e_m-e_j\right|\right)$$

where $L_{pull}$ groups the corners, $L_{push}$ separates the corners, m indexes the objects, $e_{t_m}$ is the embedding of the upper-left corner of object m, $e_{b_m}$ is the embedding of the lower-right corner of object m, $e_m$ is the average of $e_{t_m}$ and $e_{b_m}$, and $e_m$ and $e_j$ represent the embeddings of objects m and j, respectively.
In some optional embodiments, the method further comprises: designing a multi-task learning mechanism with feature fusion and sharing, wherein the mechanism can fuse features between an upstream task and a downstream task and feed the features back to the other network at the next iteration.
In some optional embodiments, the feature fusion and sharing multitask learning mechanism is: assume that there are two tasks, task A and task B; the output feature of convolutional layer $C_{A_1}$ in the task A network is $F_{A_1}$, the output feature of convolutional layer $C_{B_1}$ in the task B network is $F_{B_1}$, $C_{A_2}$ and $C_{B_2}$ are the convolutional layers following $C_{A_1}$ and $C_{B_1}$ respectively, $\tilde{F}_{A_2}$ is the input feature of convolutional layer $C_{A_2}$, $\tilde{F}_{B_2}$ is the input feature of convolutional layer $C_{B_2}$, and $F_i$ is the shared feature obtained at the i-th end-to-end iteration, expressed as follows:

$$F_i = \sigma\!\left(\frac{F_{A_1}^{i}+F_{B_1}^{i}}{2}\right)$$

where σ denotes the sigmoid function and the superscript i marks the i-th iteration. When i = 1,

$$\tilde{F}_{A_2}^{1}=F_{A_1}^{1},\qquad \tilde{F}_{B_2}^{1}=F_{B_1}^{1};$$

when i > 1,

$$\tilde{F}_{A_2}^{i}=F_{A_1}^{i}\odot F_{i-1},\qquad \tilde{F}_{B_2}^{i}=F_{B_1}^{i}\odot F_{i-1},$$

where $F_{i-1}$ represents the shared feature obtained by the (i-1)-th end-to-end iteration and ⊙ denotes element-wise multiplication.
In some optional embodiments, the overall training loss function of the multitask-feature-shared low-illumination pedestrian detection network is: $L = L_{cor} + \zeta L_{enh} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off} + \zeta L_{enh}$, where L is the total loss and ζ is the weight of the illumination enhancement loss $L_{enh}$.
According to another aspect of the present invention, there is provided a low-light pedestrian detection system based on multitask feature fusion shared learning, including:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the normal illumination pedestrian data set and the low illumination pedestrian data set are used for training the image illumination enhancement network to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network and is trained by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, a basic module of the self-calibration separated attention hourglass network consists of a self-calibration separated attention module SCSAB, and the self-calibration separated attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image to expand the field of view of each convolution layer;
the multitask feature fusion module is used for designing a multitask feature fusion module capable of fusing features between an upstream task and a downstream task based on multitask feature fusion shared learning, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network of the multitask feature fusion shared learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the multitask feature fusion shared learning low-illumination pedestrian detection network, and training the multitask feature fusion shared learning low-illumination pedestrian detection network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain a multitask feature fusion shared learning low-illumination pedestrian detection model;
and the image detection module is used for detecting the image to be detected by using the low-illumination pedestrian detection model of the multitask feature fusion shared learning to obtain the position of the pedestrian in the image to be detected.
In some optional embodiments, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network.
In some alternative embodiments, the pedestrian detection network is constructed based on the self-calibration separated attention module.
According to another aspect of the present invention, there is provided a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any of the low-light pedestrian detection methods based on multitask feature fusion shared learning.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
the invention provides a low-illumination pedestrian detection method and system based on multitask feature fusion shared learning, which can accurately and efficiently detect the position of a pedestrian in a low-illumination image; and the self-calibration separated attention module is creatively provided, and the module combines the self-calibration convolution and the separated attention network to effectively collect the information of each spatial position in the input image so as to expand the visual field of each convolution layer, thereby improving the detection performance.
Drawings
Fig. 1 is a schematic flowchart of a low-light pedestrian detection method based on multitask feature fusion shared learning according to an embodiment of the present invention;
FIG. 2 is a diagram of a low light pedestrian detection network architecture based on multi-tasking feature fusion shared learning in accordance with the present invention;
FIG. 3 is a self-calibrating discrete attention block diagram proposed by the present invention;
FIG. 4 is a schematic diagram of a low-light pedestrian detection system based on multi-tasking feature fusion shared learning of the present invention;
FIG. 5 is a workflow diagram of the feature fusion and sharing multitasking learning mechanism proposed by the present invention;
FIG. 6 is a graph comparing test results of the examples of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention is mainly divided into four parts: training an image illumination enhancement pre-training model, training a pedestrian detection pre-training model, training the low-illumination pedestrian detection model of multi-task feature fusion shared learning, and using the low-illumination pedestrian detection model of multi-task feature fusion shared learning to deduce the pedestrian position in the image from the low-illumination image.
The low-light pedestrian detection method based on the multitask feature fusion shared learning disclosed by the embodiment of the invention comprises the following steps as shown in figure 1:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
in the embodiment of the invention, the data sets can be obtained directly by photographing; alternatively, the illumination of the CityPersons (a published large-scale pedestrian detection dataset) pedestrian detection training and validation sets can be reduced. In the embodiment of the present invention, an RGB-based spatial brightness adjustment algorithm in OpenCV (an open-source, BSD-licensed, cross-platform computer vision library) is used. The algorithm scales each pixel according to its current RGB values, i.e. the larger the R, G, B values, the larger the adjustment. For example, if the current pixel is (100, 200, 50) and the adjustment coefficient is 1.1, it is adjusted to (110, 220, 55). In the present embodiment, the adjustment coefficient used is 0.8. After reducing the brightness, the normal and low-illumination CityPersons pedestrian detection training and validation sets are used as the training and test sets.
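A minimal sketch of this darkening step (the function name and file paths are illustrative assumptions; the embodiment specifies only that each R, G, B value is scaled by a coefficient, here 0.8):

```python
import cv2
import numpy as np

def adjust_brightness(image: np.ndarray, coeff: float) -> np.ndarray:
    """Scale every channel value by `coeff`, clipping to the 8-bit range.

    With coeff > 1 the image brightens proportionally to the current pixel
    values (e.g. (100, 200, 50) * 1.1 -> (110, 220, 55)); with coeff < 1
    it darkens, which is how the low-illumination set is synthesized here.
    """
    scaled = image.astype(np.float32) * coeff
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Synthesize a low-illumination copy of a CityPersons frame (paths illustrative).
img = cv2.imread("citypersons_sample.png")       # OpenCV loads in BGR order
low_light = adjust_brightness(img, coeff=0.8)    # coefficient from this embodiment
cv2.imwrite("citypersons_sample_low.png", low_light)
```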
S2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and training the image illumination enhancement network by utilizing a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the normal and low-illumination pedestrian detection training sets are used for independently training the image illumination enhancement network for 100 periods to obtain an image illumination enhancement pre-training model. The network structure of the image illumination enhancement network is shown in fig. 2, the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, and the network introduces a Retinex theory. The classical Retinex theory establishes a model for human color perception, and this theory assumes that the observed image can be decomposed into two parts, a reflection channel and an illumination channel. Let S represent the source image, it can be represented by S ═ R × I, where R represents the reflectance component and I represents the illumination component, and × represents the element-by-element multiplication. For the loss function, in order to ensure that the image after restoring the illumination can retain the object edge information and also retain the smooth transition of the illumination information, the following loss functions are used in the illumination enhancement network:
$$L_{enh} = L_{recon} + \lambda_{ir}L_{ir} + \lambda_{is}L_{is}$$

where $\lambda_{ir}$ and $\lambda_{is}$ are coefficients balancing the reflectance and illumination terms, and $L_{recon}$, $L_{ir}$ and $L_{is}$ represent the reconstruction, reflectance and illumination smoothness loss functions, respectively.
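A simplified PyTorch sketch of this composite loss under the decomposition S = R ∘ I. The λ values and the concrete forms of $L_{recon}$, $L_{ir}$ and $L_{is}$ below are illustrative stand-ins rather than the patent's exact formulation (RetinexNet's full reconstruction loss sums over all reflectance/illumination pairings and weights the smoothness term by reflectance gradients):

```python
import torch
import torch.nn.functional as F

def enhancement_loss(R_low, I_low, S_low, R_norm, I_norm, S_norm,
                     lambda_ir=0.01, lambda_is=0.1):
    """Sketch of L_enh = L_recon + lambda_ir * L_ir + lambda_is * L_is."""
    # L_recon: each reflectance/illumination pair should rebuild its source
    # image under the element-wise decomposition S = R * I.
    l_recon = F.l1_loss(R_low * I_low, S_low) + F.l1_loss(R_norm * I_norm, S_norm)
    # L_ir: low-light and normal-light views of a scene share one reflectance.
    l_ir = F.l1_loss(R_low, R_norm)
    # L_is: total-variation penalty keeping the illumination maps smooth.
    def tv(x):
        return ((x[..., :, 1:] - x[..., :, :-1]).abs().mean()
                + (x[..., 1:, :] - x[..., :-1, :]).abs().mean())
    l_is = tv(I_low) + tv(I_norm)
    return l_recon + lambda_ir * l_ir + lambda_is * l_is
```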
S3: constructing a pedestrian detection network, wherein the pedestrian detection network takes two self-calibration separated attention-based hourglass networks as a backbone network, and training the pedestrian detection network by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separated attention hourglass network consists of a self-calibrated split attention block (SCSAB), which combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image to expand the field of view of each convolution layer;
in the embodiment of the invention, the pedestrian detection network is trained independently for 100 epochs with the normal-illumination pedestrian detection training set to obtain the pedestrian detection pre-training model. The network structure of the pedestrian detection network is shown in fig. 2. The pedestrian detection network follows the idea of keypoint-based object detection algorithms, which generate object bounding boxes by detecting and grouping keypoints; this greatly simplifies the output of the network and eliminates the need to design anchor boxes. Meanwhile, an attention mechanism is introduced into the network to further improve detection performance. An autonomously designed hourglass network serves as the backbone of the pedestrian detection network; this hourglass network consists of self-calibrated split attention blocks (SCSAB), shown in fig. 3, which combine self-calibration convolution with a split attention network to effectively collect information from each spatial position in the input image and expand the field of view of each convolution layer, thereby improving detection performance.
For the loss function of the pedestrian detection network, a focal loss with α = 2 and β = 4 may be used. Let $p_{aij}$ be the score at the (i, j) position of class a in the predicted image, and let $y_{aij}$ be the unnormalized ground truth:

$$L_{det} = \frac{-1}{N}\sum_{a=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{aij}\right)^{\alpha}\log\left(p_{aij}\right) & \text{if } y_{aij}=1\\\left(1-y_{aij}\right)^{\beta}\left(p_{aij}\right)^{\alpha}\log\left(1-p_{aij}\right) & \text{otherwise}\end{cases}$$

where N is the number of objects in the image, α and β are the hyperparameters controlling the contribution of each point, and C, H and W represent the number of channels, height and width of the input, respectively.
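A PyTorch sketch of this corner focal loss, assuming single-image tensors of shape (C, H, W) and a small clamping epsilon for numerical stability (both assumptions, not specified in the patent):

```python
import torch

def corner_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Focal loss over corner heatmaps with alpha = 2 and beta = 4.

    pred: predicted scores p_aij in (0, 1), shape (C, H, W).
    gt:   ground-truth heatmap y_aij, equal to 1 at corner locations and
          decaying around them via an unnormalized Gaussian.
    """
    pos = gt.eq(1).float()                   # locations where y_aij == 1
    neg = 1.0 - pos
    num_objects = pos.sum().clamp(min=1.0)   # N, the number of objects

    pos_loss = (1 - pred) ** alpha * torch.log(pred.clamp(min=1e-6)) * pos
    neg_loss = ((1 - gt) ** beta * pred ** alpha
                * torch.log((1 - pred).clamp(min=1e-6)) * neg)
    return -(pos_loss.sum() + neg_loss.sum()) / num_objects
```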
When an input image passes through convolutional layers, the output is typically smaller than the input image. Thus, a location (x, y) in the image maps to the location $\left(\left\lfloor\frac{x}{n}\right\rfloor, \left\lfloor\frac{y}{n}\right\rfloor\right)$ in the heat map, where n is the downsampling factor. Some accuracy may be lost when remapping locations from the heat map to the input image. To solve this problem, a position offset is predicted, which slightly adjusts the corner positions before remapping them to the input resolution.
$$o_k = \left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right)$$

where $o_k$ is the offset and $x_k$ and $y_k$ are the x and y coordinates of corner k. One set of offsets is predicted, shared by the top-left corners of all classes, and another set is shared by the bottom-right corners. For training, the offset loss is denoted $L_{off}$, and the smooth L1 loss is applied as the offset loss:

$$L_{off} = \frac{1}{N}\sum_{k=1}^{N}\text{SmoothL1}\!\left(o_k,\hat{o}_k\right)$$

where the smooth L1 loss computes the difference between the predicted offset and the offset of the label, and $\hat{o}_k$ indicates the predicted offset.
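A PyTorch sketch of the offset targets and the smooth L1 offset loss described above (function names and the example coordinates are illustrative):

```python
import torch
import torch.nn.functional as F

def offset_targets(xs, ys, n):
    """o_k = (x_k/n - floor(x_k/n), y_k/n - floor(y_k/n)) for corner coords."""
    ox = xs / n - torch.floor(xs / n)
    oy = ys / n - torch.floor(ys / n)
    return torch.stack([ox, oy], dim=-1)

def offset_loss(pred_offsets, gt_offsets):
    """L_off: smooth L1 loss between predicted and ground-truth offsets."""
    return F.smooth_l1_loss(pred_offsets, gt_offsets, reduction="mean")

# Example: two corners at pixel coordinates with a downsampling factor of 4.
xs = torch.tensor([13.0, 50.0])
ys = torch.tensor([7.0, 22.0])
gt = offset_targets(xs, ys, n=4)   # e.g. 13/4 - floor(13/4) = 0.25
```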
There may be multiple objects in an image, and thus multiple top-left and bottom-right corners may be detected, so it is necessary to determine whether a pair of top-left and bottom-right corners comes from the same bounding box. Let $e_{t_m}$ be the embedding of the top-left corner of object m and $e_{b_m}$ the embedding of its bottom-right corner. The network is trained to group corners using a "pull" loss and to separate corners using a "push" loss:

$$L_{pull} = \frac{1}{N}\sum_{m=1}^{N}\left[\left(e_{t_m}-e_m\right)^2+\left(e_{b_m}-e_m\right)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{m=1}^{N}\sum_{\substack{j=1\\j\neq m}}^{N}\max\left(0,\;\Delta-\left|e_m-e_j\right|\right)$$

where $e_m$ is the average of $e_{t_m}$ and $e_{b_m}$, and $e_j$ is likewise the average of $e_{t_j}$ and $e_{b_j}$, the embeddings of the top-left and bottom-right corners of object j; $e_m$ and $e_j$ thus represent the embeddings of objects m and j, respectively. The total training loss for pedestrian detection is as follows:

$$L_{cor} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off}$$

where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, and $L_{cor}$ is the total loss of the pedestrian detection network.
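A PyTorch sketch of these pull/push losses over per-object corner embeddings; the margin Δ (here delta=1.0) is an assumed value, since the patent does not state it:

```python
import torch

def pull_push_losses(e_tl, e_br, delta=1.0):
    """Associative-embedding losses for corner grouping.

    e_tl, e_br: 1-D embeddings of the top-left / bottom-right corners of the
    N objects in one image, shape (N,).
    """
    e_mean = (e_tl + e_br) / 2                                  # e_m
    l_pull = ((e_tl - e_mean) ** 2 + (e_br - e_mean) ** 2).mean()

    n = e_mean.size(0)
    dist = (e_mean.unsqueeze(0) - e_mean.unsqueeze(1)).abs()    # |e_m - e_j|
    margin = torch.clamp(delta - dist, min=0)
    margin = margin - torch.diag(torch.diag(margin))            # drop j == m terms
    l_push = margin.sum() / max(n * (n - 1), 1)
    return l_pull, l_push
```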
S4: based on multi-task feature fusion and shared learning, designing a multi-task feature fusion module capable of fusing features among different tasks and sharing the features of the image illumination enhancement network and the pedestrian detection network. The features of the first 3 × 3 convolution of the enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; likewise, the features of the fourth-to-last 3 × 3 convolution of the enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network of multi-task feature fusion and shared learning;
a multi-task learning mechanism with feature fusion and sharing is designed; it can fuse features between an upstream task and a downstream task and feed them back to the other network at the next iteration. As shown in fig. 5, the multi-task learning mechanism for feature fusion and sharing is as follows. Assume that there are two tasks, task A and task B; the output feature of convolutional layer $C_{A_1}$ in the task A network is $F_{A_1}$, the output feature of convolutional layer $C_{B_1}$ in the task B network is $F_{B_1}$, $C_{A_2}$ and $C_{B_2}$ are the convolutional layers following $C_{A_1}$ and $C_{B_1}$ respectively, $\tilde{F}_{A_2}$ is the input feature of convolutional layer $C_{A_2}$, $\tilde{F}_{B_2}$ is the input feature of convolutional layer $C_{B_2}$, and $F_i$ is the shared feature obtained at the i-th end-to-end iteration, expressed as follows:

$$F_i = \sigma\!\left(\frac{F_{A_1}^{i}+F_{B_1}^{i}}{2}\right)$$

where σ denotes the sigmoid function and the superscript i marks the i-th iteration. When i = 1,

$$\tilde{F}_{A_2}^{1}=F_{A_1}^{1},\qquad \tilde{F}_{B_2}^{1}=F_{B_1}^{1};$$

when i > 1,

$$\tilde{F}_{A_2}^{i}=F_{A_1}^{i}\odot F_{i-1},\qquad \tilde{F}_{B_2}^{i}=F_{B_1}^{i}\odot F_{i-1},$$

where $F_{i-1}$ represents the shared feature obtained by the (i-1)-th end-to-end iteration and ⊙ denotes element-wise multiplication.
The low-illumination pedestrian detection network for multi-task feature fusion and shared learning: to further improve the performance of the two networks, a multi-task feature fusion module is introduced on this basis; its detailed structure is shown in fig. 2. In the i-th iteration, the multi-task feature fusion module fuses the features $F_{e_1}^{i}$ and $F_{e_2}^{i}$ of the image illumination enhancement sub-network with the features $F_{d_1}^{i}$ and $F_{d_2}^{i}$ of the pedestrian detection sub-network, and the two fused features are denoted $F_{1}^{i}$ and $F_{2}^{i}$, respectively. Features $F_{e_1}^{i}$ and $F_{e_2}^{i}$ come from the first 3 × 3 convolutional layer and the fourth-to-last 3 × 3 convolutional layer of the enhancement network, respectively. Feature $F_{d_1}^{i}$ comes from the second-to-last SCSAB of the self-calibration separated attention hourglass network, and $F_{d_2}^{i}$ comes from its first SCSAB. All of these features have the same size. $F_{1}^{i}$ and $F_{2}^{i}$ are expressed as follows:

$$F_{1}^{i} = \frac{F_{e_1}^{i}+F_{d_1}^{i}}{2},\qquad F_{2}^{i} = \frac{F_{e_2}^{i}+F_{d_2}^{i}}{2}$$

After entering the sigmoid function, they are denoted $\sigma(F_{1}^{i})$ and $\sigma(F_{2}^{i})$. In the (i+1)-th iteration, $\sigma(F_{1}^{i})$ is multiplied element-wise with $F_{e_1}^{i+1}$ and $F_{d_1}^{i+1}$ to form the inputs of the second convolutional layer of the enhancement network and the last SCSAB of the self-calibration separated attention hourglass network. These input features, denoted $\tilde{F}_{e_1}^{i+1}$ and $\tilde{F}_{d_1}^{i+1}$, are given by:

$$\tilde{F}_{e_1}^{i+1} = F_{e_1}^{i+1}\odot\sigma\!\left(F_{1}^{i}\right),\qquad \tilde{F}_{d_1}^{i+1} = F_{d_1}^{i+1}\odot\sigma\!\left(F_{1}^{i}\right)$$

The same fusion and sharing method acts on the input of the last convolution in the enhancement network and the input of the second SCSAB in the self-calibration separated attention hourglass network. These input features, denoted $\tilde{F}_{e_2}^{i+1}$ and $\tilde{F}_{d_2}^{i+1}$, are given by:

$$\tilde{F}_{e_2}^{i+1} = F_{e_2}^{i+1}\odot\sigma\!\left(F_{2}^{i}\right),\qquad \tilde{F}_{d_2}^{i+1} = F_{d_2}^{i+1}\odot\sigma\!\left(F_{2}^{i}\right)$$
finally, the training loss function L of the low-illumination pedestrian detection network for multi-task feature fusion shared learning is:

$$L = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off} + \zeta L_{enh}$$

where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, and ζ is the weight of the illumination enhancement loss $L_{enh}$. η is set to 0.1, γ is set to 1, and ζ is set to 0.05.
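Putting the terms together, a one-line sketch of the joint objective with the weights stated above (the loss values are assumed to be scalar tensors produced by functions such as those sketched earlier):

```python
def total_training_loss(l_det, l_pull, l_push, l_off, l_enh,
                        eta=0.1, gamma=1.0, zeta=0.05):
    """L = L_det + L_pull + eta*L_push + gamma*L_off + zeta*L_enh,
    with the weights reported in this embodiment (0.1, 1, 0.05)."""
    return l_det + l_pull + eta * l_push + gamma * l_off + zeta * l_enh
```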
S5: importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by using a normal illumination pedestrian data set and a low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning;
the low-illumination pedestrian detection network of multi-task feature fusion shared learning is trained with the normal and low-illumination pedestrian detection training sets; the previously trained image illumination enhancement pre-training model and pedestrian detection pre-training model are imported at the same time, and training runs for 100 epochs to obtain the low-illumination pedestrian detection model of multi-task feature fusion shared learning.
S6: detecting the image to be detected by using the low-illumination pedestrian detection model of multi-task feature fusion shared learning to obtain the position of the pedestrian in the image. The low-illumination test set pictures are input into the trained low-illumination pedestrian detection model of multi-task feature fusion shared learning for inference, and the positions of pedestrians are framed in the images.
The invention also provides a low-light pedestrian detection system based on multitask feature fusion shared learning, which is used for realizing the low-light pedestrian detection method based on multitask feature fusion shared learning, and as shown in fig. 4, the low-light pedestrian detection system based on multitask feature fusion shared learning comprises:
a data set module 101, configured to obtain a normal-illumination pedestrian data set and a low-illumination pedestrian data set;
the image illumination enhancement module 102 is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing a normal illumination pedestrian data set and a low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
the pedestrian detection module 103 is used for constructing a pedestrian detection network, the pedestrian detection network takes two self-calibration separation attention-based hourglass networks as a backbone network, and the pedestrian detection network is trained by using a normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, wherein a basic module of the self-calibration separation attention hourglass network consists of a self-calibration separation attention module SCSAB, and the self-calibration separation attention module combines a self-calibration convolution network and a separation attention network and is used for collecting information of each spatial position in an input image to expand the visual field of each convolution layer;
the multitask feature fusion module 104 is used for fusing features between upstream and downstream tasks and sharing the features of the image illumination enhancement network and the pedestrian detection network: the features of the first 3 × 3 convolution of the enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; likewise, the features of the fourth-to-last 3 × 3 convolution of the enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network of multitask feature fusion shared learning;
the model training module 105 is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, and training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by using a normal illumination pedestrian data set and a low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning;
and the image detection module 106 is configured to detect the image to be detected by using the low-illumination pedestrian detection model of the multitask feature fusion shared learning, so as to obtain the position of the pedestrian in the image.
Further, an image illumination enhancement network is constructed based on the RetinexNet convolutional neural network; a pedestrian detection network is constructed based on the self-calibration separation attention block provided by the invention.
The present invention also provides a computer storage medium having stored therein a computer program executable by a computer processor, the computer program executing the low-illumination pedestrian detection method based on multitask feature fusion shared learning described above.
A test example of the invention uses the normal and low-illumination CityPersons data sets, comprising 2975 pictures each in the normal and low-illumination training sets and 500 pictures each in the normal and low-illumination test sets. The experiment was implemented in PyTorch and trained on 3 RTX 2080Ti graphics cards with the Adam optimization algorithm. The learning rate was set to 0.0001. The evaluation index follows the Caltech evaluation standard: the log-average miss rate per image ($MR^{-2}$); the lower the $MR^{-2}$ value, the better the algorithm performance. The $MR^{-2}$ index is used to compare the invention with other excellent pedestrian detection algorithms and demonstrate its superiority. Table 1 shows the comparison results under the $MR^{-2}$ index, and fig. 6 compares the pedestrian position detection results, in which (a) shows the input image, (b) the CSP detection result, (c) the ALFNet detection result, (d) the CenterNet detection result, (e) the CornerNet detection result, (f) the CornerNet-Saccade detection result, (g) the detection result of the present invention, and (h) the benchmark result.
The invention is compared with several excellent pedestrian detection or object detection methods: CSP, ALFNet, CenterNet, CornerNet, and CornerNet-Saccade. ALFNet is the most representative anchor-box-based pedestrian detection algorithm, while CSP and CenterNet are the best center-point-based pedestrian detection and object detection algorithms, respectively. CornerNet and CornerNet-Saccade are representative corner-based object detection methods. All algorithms in table 1 were trained with the normal and low-illumination training sets, so all of these pedestrian detection networks are able to process low-illumination images, ensuring the fairness of the experiment. In addition, to further illustrate the role of the multitask feature fusion module in the present algorithm, the algorithms in table 2 cascade the RetinexNet illumination enhancement algorithm with each detection algorithm. As the results in table 2 show, the indexes of the cascaded methods are still inferior to the algorithm of the invention, which proves that the multitask feature fusion module plays an important role.
Table 1 comparison of the present invention with five excellent algorithms
Table 2 comparison of the results of the present invention with five excellent algorithms cascaded by RetinexNet
From the experimental results in the tables, the proposed algorithm has obvious advantages over the other five methods.
The parts not described in the specification are prior art or common general knowledge. The present embodiments are illustrative only and not intended to limit the scope of the present invention, and modifications and equivalents thereof by those skilled in the art are considered to fall within the scope of the present invention as set forth in the claims.
It should be noted that, according to the implementation requirement, each step/component described in the present application can be divided into more steps/components, and two or more steps/components or partial operations of the steps/components can be combined into new steps/components to achieve the purpose of the present invention.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A low-light pedestrian detection method based on multitask feature fusion shared learning is characterized by comprising the following steps:
s1: acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
s2: constructing an image illumination enhancement network, wherein the image illumination enhancement network comprises a decomposition network and an enhancement network, and the image illumination enhancement network is trained by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain an image illumination enhancement pre-training model;
s3: constructing a self-calibration separate attention module SCSAB, which combines a self-calibration convolution network and a separate attention network, for collecting information of each spatial position in an input image to extend a field of view of each convolution layer;
s4: constructing a self-calibration separated attention hourglass network, wherein a basic module of the self-calibration separated attention hourglass network consists of SCSAB;
s5: constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network, and trains the pedestrian detection network by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model;
s6: based on multi-task feature fusion shared learning, designing a multi-task feature fusion module capable of fusing features between an upstream task and a downstream task, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network for multi-task feature fusion shared learning;
s7: and importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into a low-illumination pedestrian detection network for multi-task feature fusion shared learning, training the low-illumination pedestrian detection network for multi-task feature fusion shared learning by utilizing the normal illumination pedestrian data set and the low-illumination pedestrian data set to obtain a low-illumination pedestrian detection model for multi-task feature fusion shared learning, and detecting an image to be detected through the low-illumination pedestrian detection model for multi-task feature fusion shared learning to obtain the position of a pedestrian in the image to be detected.
2. The method according to claim 1, wherein step S6 includes:
the features of the first convolution of the image illumination enhancement network and the features of the second-to-last SCSAB of the self-calibration separated attention hourglass network are added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration; the features of the fourth-to-last convolution of the image illumination enhancement network and the features of the first SCSAB of the self-calibration separated attention hourglass network are likewise added, averaged, passed through a sigmoid function, and fed back to the two networks in the next iteration, thereby constructing the low-illumination pedestrian detection network with multi-task feature fusion and shared learning.
3. The method of claim 1, wherein the image illumination enhancement network is constructed based on a RetinexNet convolutional neural network, and the loss function $L_{enh}$ of the image illumination enhancement network is: $L_{enh} = L_{recon} + \lambda_{ir}L_{ir} + \lambda_{is}L_{is}$, where $\lambda_{ir}$ and $\lambda_{is}$ are weight coefficients, and $L_{recon}$, $L_{ir}$ and $L_{is}$ represent the reconstruction, reflectance and illumination smoothness loss functions, respectively.
4. The method of claim 3, wherein the loss function $L_{cor}$ of the pedestrian detection network is: $L_{cor} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off}$, where η and γ are the weights of the $L_{push}$ and $L_{off}$ loss functions respectively, $L_{det}$ is the corner loss, $L_{pull}$ groups the corners, $L_{push}$ separates the corners, and $L_{off}$ is the offset loss.
5. The method of claim 4, wherein

$$L_{det} = \frac{-1}{N}\sum_{a=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{aij}\right)^{\alpha}\log\left(p_{aij}\right) & \text{if } y_{aij}=1\\\left(1-y_{aij}\right)^{\beta}\left(p_{aij}\right)^{\alpha}\log\left(1-p_{aij}\right) & \text{otherwise}\end{cases}$$

$L_{det}$ is the corner loss, N is the number of objects in the image, α and β are the hyperparameters controlling the contribution of each corner, C, H and W represent the number of channels, height and width of the input respectively, $p_{aij}$ is the score at the (i, j) position of class a in the predicted image, and $y_{aij}$ is the unnormalized ground truth;

$$L_{off} = \frac{1}{N}\sum_{k=1}^{N}\text{SmoothL1}\!\left(o_k,\hat{o}_k\right),\qquad o_k = \left(\frac{x_k}{n}-\left\lfloor\frac{x_k}{n}\right\rfloor,\;\frac{y_k}{n}-\left\lfloor\frac{y_k}{n}\right\rfloor\right)$$

$L_{off}$ is the offset loss, $o_k$ is the offset, $x_k$ and $y_k$ are the x and y coordinates of the corner point k, n is the downsampling factor, the smooth L1 loss computes the difference between the predicted offset and the offset of the label, and $\hat{o}_k$ represents the predicted offset;

$$L_{pull} = \frac{1}{N}\sum_{m=1}^{N}\left[\left(e_{t_m}-e_m\right)^2+\left(e_{b_m}-e_m\right)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{m=1}^{N}\sum_{\substack{j=1\\j\neq m}}^{N}\max\left(0,\;\Delta-\left|e_m-e_j\right|\right)$$

$L_{pull}$ groups the corners, $L_{push}$ separates the corners, m indexes the objects, $e_{t_m}$ is the embedding of the upper-left corner of object m, $e_{b_m}$ is the embedding of the lower-right corner of object m, $e_m$ is the average of $e_{t_m}$ and $e_{b_m}$, and $e_m$ and $e_j$ represent the embeddings of objects m and j, respectively.
6. The method of claim 1, further comprising: designing a multi-task learning mechanism with feature fusion and sharing, wherein the mechanism can fuse features between an upstream task and a downstream task and feed the features back to the other network at the next iteration.
7. The method of claim 6, wherein the feature fusion and sharing multitask learning mechanism is: assume that there are two tasks, task A and task B; the output feature of convolutional layer $C_{A_1}$ in the task A network is $F_{A_1}$, the output feature of convolutional layer $C_{B_1}$ in the task B network is $F_{B_1}$, $C_{A_2}$ and $C_{B_2}$ are the convolutional layers following $C_{A_1}$ and $C_{B_1}$ respectively, $\tilde{F}_{A_2}$ is the input feature of convolutional layer $C_{A_2}$, $\tilde{F}_{B_2}$ is the input feature of convolutional layer $C_{B_2}$, and $F_i$ is the shared feature obtained at the i-th end-to-end iteration, expressed as follows:

$$F_i = \sigma\!\left(\frac{F_{A_1}^{i}+F_{B_1}^{i}}{2}\right)$$

when i = 1,

$$\tilde{F}_{A_2}^{1}=F_{A_1}^{1},\qquad \tilde{F}_{B_2}^{1}=F_{B_1}^{1};$$

when i > 1,

$$\tilde{F}_{A_2}^{i}=F_{A_1}^{i}\odot F_{i-1},\qquad \tilde{F}_{B_2}^{i}=F_{B_1}^{i}\odot F_{i-1},$$

where $F_{i-1}$ represents the shared feature obtained by the (i-1)-th end-to-end iteration, σ denotes the sigmoid function, and ⊙ denotes element-wise multiplication.
8. The method of claim 5, wherein the overall training loss function of the multitask-feature-shared low-illumination pedestrian detection network is: $L = L_{cor} + \zeta L_{enh} = L_{det} + L_{pull} + \eta L_{push} + \gamma L_{off} + \zeta L_{enh}$, where L is the total loss and ζ is the weight of the illumination enhancement loss $L_{enh}$.
9. A low-light pedestrian detection system based on multitask feature fusion shared learning, characterized by comprising:
the data set module is used for acquiring a normal illumination pedestrian data set and a low illumination pedestrian data set;
the image illumination enhancement module is used for constructing an image illumination enhancement network, the image illumination enhancement network comprises a decomposition network and an enhancement network, and the normal illumination pedestrian data set and the low illumination pedestrian data set are used for training the image illumination enhancement network to obtain an image illumination enhancement pre-training model;
the pedestrian detection module is used for constructing a pedestrian detection network, wherein the pedestrian detection network takes the self-calibration separated attention hourglass network as a backbone network and is trained by using the normal illumination pedestrian data set to obtain a pedestrian detection pre-training model, a basic module of the self-calibration separated attention hourglass network consists of a self-calibration separated attention module SCSAB, and the self-calibration separated attention module combines a self-calibration convolution network and a split attention network and is used for collecting information of each spatial position in an input image to expand the field of view of each convolution layer;
the multitask feature fusion module is used for designing a multitask feature fusion module capable of fusing features between an upstream task and a downstream task based on multitask feature fusion shared learning, performing feature fusion and sharing on the image illumination enhancement network and the pedestrian detection network, and constructing a low-illumination pedestrian detection network of the multitask feature fusion shared learning;
the model training module is used for importing the image illumination enhancement pre-training model and the pedestrian detection pre-training model into the multitask feature fusion shared learning low-illumination pedestrian detection network, and training the multitask feature fusion shared learning low-illumination pedestrian detection network by utilizing the normal illumination pedestrian data set and the low illumination pedestrian data set to obtain a multitask feature fusion shared learning low-illumination pedestrian detection model;
and the image detection module is used for detecting the image to be detected by using the low-illumination pedestrian detection model of the multitask feature fusion shared learning to obtain the position of the pedestrian in the image to be detected.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the low-light pedestrian detection method based on multitask feature fusion shared learning according to any one of claims 1 to 8.
CN202010917093.XA 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning Active CN112069983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010917093.XA CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Publications (2)

Publication Number Publication Date
CN112069983A (en) 2020-12-11
CN112069983B CN112069983B (en) 2024-03-26

Family

ID=73666641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010917093.XA Active CN112069983B (en) 2020-09-03 2020-09-03 Low-light pedestrian detection method and system for multi-task feature fusion sharing learning

Country Status (1)

Country Link
CN (1) CN112069983B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862713A (en) * 2021-02-02 2021-05-28 山东师范大学 Attention mechanism-based low-light image enhancement method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140340555A1 (en) * 2013-05-14 2014-11-20 Canon Kabushiki Kaisha Image sensing apparatus
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140340555A1 (en) * 2013-05-14 2014-11-20 Canon Kabushiki Kaisha Image sensing apparatus
CN104317244A (en) * 2014-09-28 2015-01-28 北京理工大学 Reconfigurable manufacturing system part family construction method
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Shuangyu et al.: "Pedestrian detection method based on binocular stereo vision and SVM algorithm", Journal of Huazhong University of Science and Technology (Natural Science Edition), 16 October 2015 (2015-10-16), page 141 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112862713A (en) * 2021-02-02 2021-05-28 山东师范大学 Attention mechanism-based low-light image enhancement method and system
CN112862713B (en) * 2021-02-02 2022-08-09 山东师范大学 Attention mechanism-based low-light image enhancement method and system

Also Published As

Publication number Publication date
CN112069983B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
US10453197B1 (en) Object counting and instance segmentation using neural network architectures with image-level supervision
CN110163187B (en) F-RCNN-based remote traffic sign detection and identification method
Kumar et al. A new vehicle tracking system with R-CNN and random forest classifier for disaster management platform to improve performance
CN111814595B (en) Low-illumination pedestrian detection method and system based on multi-task learning
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
Huang et al. Spatial-temproal based lane detection using deep learning
KR20180048407A (en) Apparatus and method for detecting a lane
CN112084869A (en) Compact quadrilateral representation-based building target detection method
CN113255443A (en) Pyramid structure-based method for positioning time sequence actions of graph attention network
CN114943757A (en) Unmanned aerial vehicle forest exploration system based on monocular depth of field prediction and depth reinforcement learning
Asgarian et al. Fast drivable area detection for autonomous driving with deep learning
Ren et al. A new multi-scale pedestrian detection algorithm in traffic environment
Li et al. Detection of road objects based on camera sensors for autonomous driving in various traffic situations
CN112069983B (en) Low-light pedestrian detection method and system for multi-task feature fusion sharing learning
Zhang et al. An efficient deep neural network with color-weighted loss for fire detection
CN117576149A (en) Single-target tracking method based on attention mechanism
He et al. Real-time pedestrian warning system on highway using deep learning methods
Tan et al. UAV image object recognition method based on small sample learning
Nataprawira et al. Pedestrian Detection in Different Lighting Conditions Using Deep Neural Networks.
CN117636241A (en) Low-light scene multi-mode pedestrian detection tracking method based on decision-level fusion
Dutta Seeing Objects in Dark with Continual Contrastive Learning
CN117523491A (en) Unmanned aerial vehicle-mounted fire automatic detection method and device
Malik et al. High-Level Semantic Feature Detector: Pedestrian Detection Based On Improved Mask R-CNN Algorithm
Wang Human Detection in aSequence of Thermal Images using Deep Learning
CN117423135A (en) Pedestrian target detection method based on improved YOLOv8 lightweight network model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant