CN110472542A - Infrared image pedestrian detection method and detection system based on deep learning - Google Patents
- Publication number
- CN110472542A (application number CN201910716970.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- infrared image
- detection
- fidn
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The present invention provides an infrared image pedestrian detection method and detection system based on deep learning, belonging to the technical field of computer vision. The infrared image pedestrian detection method comprises the following steps: acquiring and preprocessing data; constructing a target detection FIDN network based on a convolutional neural network; training the model; and predicting based on the optimal model. The present invention also provides a detection system that implements the infrared image pedestrian detection method. The beneficial effects of the invention are that it ensures high accuracy while meeting real-time requirements, and that it is robust.
Description
Technical field
The present invention relates to an image detection method, and in particular to an infrared image pedestrian detection method and detection system based on deep learning.
Background art
Target detection is an important topic in computer vision. Its main task is to locate the targets of interest in an image, which requires accurately determining the category of each target and providing a bounding box for each target. Factors such as viewing angle, occlusion, and posture deform the target, which makes target detection a challenging task.
Conventional target detection is broadly divided into six steps: preprocessing, window sliding, feature extraction, feature selection, feature classification, and post-processing. It generally relies on well-designed handcrafted features followed by a classifier. As the requirements on detection accuracy and speed keep rising, conventional methods can no longer meet demand. In recent years deep learning has been widely adopted and has produced a series of target detection algorithms, such as RCNN, Fast-RCNN, Faster-RCNN, YOLO, SSD, and their derivatives, but these techniques cannot be applied well in commercial products because either their accuracy is low or detection takes too long. Current target detection algorithms therefore struggle to meet the needs of practical applications. In scientific research, most researchers focus only on detection accuracy (measured by mAP, Mean Average Precision); they design very complex networks and add complex methods and training tricks to obtain a good result on public data sets, but such results are difficult to apply directly in practice.
Infrared imaging obtains an image through the thermal imaging capability of an infrared sensor and depends only on the temperature of an object and the heat it radiates. Therefore, when light is insufficient, such as at night, in rain, or in haze, infrared images have a clear advantage over visible-light images. As the main and most active element in the environment, the human body has long been a research focus in target tracking and detection. Because the human body is non-rigid, and because of the shortcomings of infrared images themselves, pedestrian detection based on infrared images is full of difficulties and challenges.
Summary of the invention
To solve the problems in the prior art, the present invention provides an infrared image pedestrian detection method and detection system based on deep learning that ensure high accuracy while meeting real-time requirements.
The infrared image pedestrian detection method based on deep learning of the present invention comprises the following steps:
Step S1: data acquisition and data preprocessing: acquire infrared images containing pedestrians, preprocess the infrared images, manually annotate the preprocessed infrared images, and then divide them into a training set and a validation set for the detection model according to a set ratio;
Step S2: construct the target detection FIDN network based on a convolutional neural network: the target detection FIDN network comprises several convolutional layers and max pooling layers, and dilated convolutional layers arranged after the convolutional layers and max pooling layers; in the stacking of convolutional layers, once the number of channels reaches a set value, the number of channels of the dilated convolutional layers is no longer increased;
Step S3: model training: train the target detection FIDN network on the training set, and select the optimal model, that is, the one that performs best on the validation set;
Step S4: optimal model prediction: based on the optimal model, run prediction on a GPU server to perform target detection on a video stream.
As a further improvement of the present invention, in step S2 the target detection FIDN network further comprises an adaptive feature map channel weighting module, arranged at the output of the dilated convolutional layers and used to weight the channels of the feature map output by the dilated convolutional layers.
As a further improvement of the present invention, the adaptive feature map channel weighting module processes the feature map as follows:
A1: use a global pooling layer to compress the feature map to 1*1*C, where C denotes the number of channels of the feature map;
A2: use a fully connected layer to compress the number of channels to C/16;
A3: after a ReLU activation function, use a fully connected layer to restore the number of channels to C;
A4: pass the output through a sigmoid activation layer to obtain a 1*1*C weight vector; after the sigmoid function, each weight in the vector lies between 0 and 1;
A5: weight the feature map along the channel dimension using the weights.
As a further improvement of the present invention, in step S1 the preprocessing includes median filtering, with the following formula:
g(x, y) = median{ f(x-k, y-l), (k, l) ∈ W }
where f(x, y) and g(x, y) are the original image and the processed image respectively, and W is a two-dimensional template.
As a further improvement of the present invention, manual annotation means using an annotation tool to outline every pedestrian in each picture with a rectangular box that is the minimum bounding rectangle of the target pedestrian, and generating a corresponding XML file. The XML file records the coordinates of each target in the picture, including the top-left x coordinate, the top-left y coordinate, the width w, and the height h. Pictures that are blurred or difficult to annotate are deleted. The data are then mixed and divided at a ratio of 9:1 into the training set and the validation set of the detection model.
As a further improvement of the present invention, in step S2 the target detection FIDN network is a fully convolutional network composed of 7 layers of 1*1 or 3*3 convolutions. The candidate boxes are generated directly on the original image as follows:
B1: divide the original image directly into S*S regions, where S is the size of the feature map of the last convolution;
B2: generate several candidate boxes with different aspect ratios in each region, the specific aspect ratios being obtained by applying the k-means algorithm to the rectangular boxes annotated in the data set;
B3: compute the size distribution of the prior candidate boxes from the actual data set, using (1-IoU) as the distance metric, where IoU denotes the intersection over union of the areas of a prior candidate box and an annotated rectangular box, calculated as:
IoU(A, B) = |A ∩ B| / |A ∪ B|
where A denotes the prior candidate box, B denotes the annotated rectangular box, ∩ denotes the intersection of A and B, and ∪ denotes the union of A and B.
As a further improvement of the present invention, the target detection FIDN network uses a lightweight convolutional neural network as its backbone network and, in the manner of target detection algorithms, predicts using a 1*1 convolution. The localization loss function of the target detection algorithm is:
[formula not reproduced in the source text]
where λ is a coefficient controlling the share of the localization loss in the total loss, S denotes the size of the feature map of the last convolution, and A denotes the number of anchor boxes generated in each region. The indicator term is a 0-1 function: it takes the value 1 if there is a target in the region at row i, column j, and 0 otherwise. x, y, h, and w denote the coordinates of the center point and the height and width of the prediction box; symbols carrying a ^ are true values and symbols without a ^ are predicted values.
As a further improvement of the present invention, in step S3 the model training refers to training from scratch: the weight parameters are randomly initialized, data augmentation is applied through horizontal flipping, random cropping, and color jitter, and the target detection FIDN network is trained while continually adjusting hyperparameters such as the learning rate, batch size, and optimization method.
As a further improvement of the present invention, in step S4 the prediction method is: construct the forward inference process of the network, whose input parameter is image data and whose return value is the prediction result; when performing target detection on video, a Kalman filter is added for tracking.
The present invention also provides a detection system that implements the infrared image pedestrian detection method, comprising:
Data acquisition module: used to acquire infrared images containing pedestrians;
Data preprocessing module: used to preprocess the infrared images, manually annotate the preprocessed infrared images, and then divide them into a training set and a validation set for the detection model according to a set ratio;
Target detection FIDN network construction module: used to construct the target detection FIDN network based on a convolutional neural network; the target detection FIDN network comprises several convolutional layers and max pooling layers, and dilated convolutional layers arranged after the convolutional layers and max pooling layers; in the stacking of convolutional layers, once the number of channels reaches a set value, the number of channels of the dilated convolutional layers is no longer increased;
Model training module: used to train the target detection FIDN network on the training set and to select the optimal model, that is, the one that performs best on the validation set;
Optimal model prediction module: based on the optimal model, runs prediction on a GPU server to perform target detection on a video stream.
Compared with the prior art, the beneficial effects of the present invention are as follows: it takes full advantage of the high accuracy of deep learning, is robust, and adapts to various changes in the external environment. The FIDN network designed and constructed here has high accuracy and a very low computational cost, reaching 180 fps on a GPU and about 18 fps on a CPU, which satisfies the real-time requirement and makes the method highly practical.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a schematic diagram of the target detection FIDN network structure;
Fig. 3 is a flow chart of the processing method of the feature map channel weighting module;
Fig. 4 is an original infrared image;
Fig. 5 is the image after detection.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the method of the present invention constructs the FIDN (Fast-Infared-Detect-Network, fast infrared target detection) deep neural network and comprises the following steps:
Step S1: data acquisition and data preprocessing: acquire infrared images containing pedestrians, preprocess the infrared images, manually annotate the preprocessed infrared images, and then divide them into a training set and a validation set for the detection model according to a set ratio.
After a large number of pictures containing pedestrians have been acquired, some preprocessing is needed because the image quality of infrared images is usually poor. The preprocessed infrared images are then manually annotated. The annotation has two parts: the target category and the target bounding box.
Step S2: construct the target detection FIDN network (FIDN network for short) based on a convolutional neural network: the target detection FIDN network comprises several convolutional layers and max pooling layers, and dilated convolutional layers arranged after the convolutional layers and max pooling layers; in the stacking of convolutional layers, once the number of channels reaches a set value, the number of channels of the dilated convolutional layers is no longer increased;
Step S3: model training: train the target detection FIDN network on the training set, and select the optimal model, that is, the one that performs best on the validation set;
Step S4: optimal model prediction: based on the optimal model, run prediction on a GPU server to perform target detection on a video stream, reaching 180 fps (the real-time video detection speed, that is, the number of frames detected per second) or more on a GPU; for the specific prediction flow, see Fig. 3.
In step S1, the preprocessing includes median filtering. Because of the external environment and the imaging principle of infrared cameras, the infrared imaging process produces considerable noise, which degrades image quality and sharpness and makes pedestrian detection and recognition harder. The images are therefore preprocessed first to filter out the noise. The median filtering formula is as follows:
g(x, y) = median{ f(x-k, y-l), (k, l) ∈ W }
where f(x, y) and g(x, y) are the original image and the processed image respectively, W is a two-dimensional template, and k and l are the offsets along the two dimensions of W.
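The median filtering step can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the patent's implementation: the square template W of side 2*radius+1, the function name `median_filter`, and the border handling (using only the neighbours that fall inside the image) are all assumptions, since the text does not specify them.

```python
def median_filter(image, radius=1):
    """Median-filter a 2-D list of pixel values.

    W is assumed to be a square (2*radius+1) x (2*radius+1) template.
    Border pixels use only the neighbours that lie inside the image
    (an assumption; the source does not describe border handling).
    """
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect f(x-k, y-l) for every offset (k, l) in W.
            window = sorted(
                image[y - l][x - k]
                for l in range(-radius, radius + 1)
                for k in range(-radius, radius + 1)
                if 0 <= y - l < height and 0 <= x - k < width
            )
            # Upper median keeps the result an actual pixel value.
            output[y][x] = window[len(window) // 2]
    return output


# A single hot pixel (impulse noise) in an otherwise flat patch.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
filtered = median_filter(noisy)
```

In this toy patch the isolated bright pixel is replaced by the value of its neighbours, which is the behaviour that makes median filtering suitable for impulse-like sensor noise.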
Manual annotation in this example means using an annotation tool to outline every pedestrian in each picture with a rectangular box that is the minimum bounding rectangle of the target pedestrian, and generating a corresponding XML file. The XML file records the coordinates of each target in the picture, including the top-left x coordinate, the top-left y coordinate, the width w, and the height h. Pictures that are blurred or difficult to annotate are deleted. The data are then mixed and divided at a ratio of 9:1 into the training set and the validation set of the detection model. The training set is used for model training; the validation set does not take part in training and is used to verify the training effect of the model.
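The 9:1 division described above might look like the following sketch. The function name `split_dataset`, the fixed random seed, and the file names are hypothetical; the source only states that the data are mixed and split 9:1.

```python
import random


def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # "mixing" the data before the split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the annotated infrared pictures.
image_names = [f"ir_{i:04d}.png" for i in range(100)]
train_set, val_set = split_dataset(image_names)
```

Fixing the seed keeps the split reproducible, so the validation set never leaks into training between runs.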
In step S2, the FIDN network is a fully convolutional network composed of 7 layers of 1*1 or 3*3 convolutions. The method as a whole is a single-stage detector and does not need to generate candidate boxes inside the network; its candidate boxes are generated directly on the original image. The generation method is as follows: the original image is divided directly into S*S parts (where S is the size of the feature map of the last convolution, usually 13*13 for a 416*416 original image), and 5 candidate boxes with different aspect ratios are then generated in each region, the specific aspect ratios being obtained by applying the k-means algorithm to the annotated boxes of the data set. The size distribution of the anchors (prior candidate boxes) is computed from the actual data set by the k-means algorithm, using (1-IoU) as the distance metric, where IoU denotes the intersection over union of the areas of a prior candidate box and an annotated box. The calculation formula is as follows:
IoU(A, B) = |A ∩ B| / |A ∪ B|
where A denotes the prior candidate box, B denotes the annotated rectangular box, ∩ denotes the intersection of A and B, and ∪ denotes the union of A and B.
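A minimal sketch of anchor clustering with the (1-IoU) distance follows. It assumes, as is common for this kind of clustering but not stated in the source, that boxes are compared by width and height only with their top-left corners aligned; the function names and the toy box sizes are hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes, assuming a shared top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=50, seed=0):
    """Cluster (width, height) pairs with k-means under the distance 1 - IoU."""
    anchors = random.Random(seed).sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            nearest = min(range(k), key=lambda i: 1 - iou_wh(box, anchors[i]))
            clusters[nearest].append(box)
        # New anchor = mean width and height of its cluster (kept if empty).
        updated = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
        if updated == anchors:
            break
        anchors = updated
    return sorted(anchors)


# Toy annotated box sizes: a group of distant and a group of near pedestrians.
labelled = [(20, 60), (22, 64), (18, 58), (40, 120), (42, 118), (38, 122)]
anchors = kmeans_anchors(labelled, k=2)
```

Using 1-IoU instead of Euclidean distance keeps large boxes from dominating the clustering, because the distance depends on overlap rather than on absolute size.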
As shown in Fig. 2, conv denotes a convolutional layer, Dilated conv denotes a dilated convolution, maxpool denotes max pooling, and the prediction part is a 1*1 convolution. The target detection FIDN network of this example comprises 5 convolutional layers with max pooling layers, and 2 dilated convolutional layers arranged after the convolutional layers and max pooling layers. In the stacking of convolutional layers, once the number of channels reaches the set value of 256, the number of channels of the dilated convolutional layers stays at 256 and is no longer increased.
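The layer layout just described can be checked with a small shape calculation. This sketch only tracks feature map sizes and channel counts; the starting width of 16 channels and the doubling rule before the cap are assumptions made for illustration, since the text gives only the cap of 256, the 416*416 input, and the 13*13 output.

```python
def fidn_layer_plan(input_size=416, first_channels=16, channel_cap=256,
                    num_pool_stages=5, num_dilated=2):
    """Return (layer type, output size, output channels) for each layer.

    first_channels=16 and channel doubling are assumptions, not source facts.
    """
    plan = []
    size, channels = input_size, first_channels
    for _ in range(num_pool_stages):
        plan.append(("conv", size, channels))
        size //= 2  # each max pooling layer halves the spatial size
        plan.append(("maxpool", size, channels))
        channels = min(channels * 2, channel_cap)  # stop growing at the cap
    for _ in range(num_dilated):
        # Dilated convolutions keep both the spatial size and the channel count.
        plan.append(("dilated_conv", size, channel_cap))
    return plan


plan = fidn_layer_plan()
final_type, final_size, final_channels = plan[-1]
```

Five halvings of a 416-pixel side give 416 / 32 = 13, which matches the 13*13 grid used when generating candidate boxes.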
The last two convolutional layers use dilated convolution. Its greatest advantage is that, without any pooling or downsampling, it enlarges the receptive field so that each convolution output covers a larger range of information, while retaining as much as possible the spatial information of a larger feature map and of the image, which is critical for small target detection. For the target detection problem, dilated convolution preserves spatial information to a great extent. However, because the feature map is not reduced when dilated convolution is used, the amount of computation increases greatly. Unlike typical network structures, the FIDN network therefore sets the number of channels of all convolutions in the last module to 256. Because the network has been compressed in this way, an adaptive feature map channel weighting module is attached after this convolutional layer. The module is arranged at the output of the dilated convolutional layers and weights the channels of the feature map they output.
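The receptive field gain from dilation can be made concrete with the standard formula for the effective kernel size, k + (k-1)(d-1). The sketch below compares two stacked 3*3 convolutions with and without dilation; the dilation rate of 2 is an assumption, as the source does not state the rate used in FIDN.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride, dilation) layers."""
    field, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        field += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride  # spacing between neighbouring outputs, in input pixels
    return field


plain = receptive_field([(3, 1, 1), (3, 1, 1)])    # two ordinary 3x3 convolutions
dilated = receptive_field([(3, 1, 2), (3, 1, 2)])  # same layers with dilation 2
```

Both stacks keep the feature map at its full size, but the dilated pair sees a 9*9 input window instead of 5*5 with the same number of weights.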
As shown in Fig. 3, the adaptive feature map channel weighting module processes the feature map as follows:
A1: use a global pooling layer to compress the feature map to 1*1*C, where C denotes the number of channels of the feature map, here 256;
A2: use a fully connected layer to compress the number of channels to C/16;
A3: apply a ReLU activation function, and then use a fully connected layer to restore the number of channels to C;
A4: pass the output through a sigmoid activation layer, which yields a 1*1*C weight vector; after the sigmoid function, each weight in the vector lies between 0 and 1 and serves as the channel weight for the output feature map of the preceding convolutional layer. The network learns the channel weights by itself, because the many channels of a feature map play different roles and differ in importance;
A5: weight the feature map along the channel dimension using the weights.
In Fig. 3, conv denotes a convolutional layer, avgpool denotes an average pooling layer, fc denotes a fully connected layer, ReLU denotes using the ReLU function as the activation function, and Sigmoid denotes using the sigmoid function as the activation layer. ReWeight denotes weighting the feature map along the channel dimension with the weights obtained from the right branch.
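Steps A1 to A5 can be traced with a small pure-Python sketch. It is an illustration under stated assumptions rather than the patent's code: the fully connected layers have no bias terms, their weights are random placeholders instead of learned values, and a toy channel count of C = 32 replaces the 256 used in the example.

```python
import math
import random


def channel_weighting(feature_map, fc1, fc2):
    """feature_map: C channels, each an H x W list of lists.

    fc1 is a (C/16) x C matrix and fc2 a C x (C/16) matrix (biases omitted,
    which is an assumption made to keep the sketch short).
    """
    # A1: global average pooling compresses each channel to one value.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # A2 + A3: fully connected layer down to C/16, followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in fc1]
    # A3 + A4: fully connected layer back to C, then sigmoid into (0, 1).
    weights = [
        1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in fc2
    ]
    # A5: scale every value of a channel by that channel's weight.
    reweighted = [
        [[value * s for value in row] for row in ch]
        for ch, s in zip(feature_map, weights)
    ]
    return weights, reweighted


rng = random.Random(0)
C, H, W, reduction = 32, 2, 2, 16
feature_map = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
fc1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // reduction)]
fc2 = [[rng.uniform(-1, 1) for _ in range(C // reduction)] for _ in range(C)]
weights, reweighted = channel_weighting(feature_map, fc1, fc2)
```

The bottleneck of C/16 keeps the added parameter count small: two matrices of C*C/16 entries each, compared with the much larger 3*3 convolutions they follow.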
Experiments compared the network with 256 channels in this convolutional layer (denoted FIDN-256) against the network with 1024 channels (denoted FIDN-1024): on a self-built data set, the detection accuracies were 80.1% (FIDN-256) and 80.6% (FIDN-1024) respectively. The entire FIDN network structure is shown in Fig. 2. The whole network uses a lightweight convolutional neural network as its backbone. The detection part is similar to most common one-stage target detection algorithms; instead of predicting with a fully connected layer, FIDN predicts with a 1*1 convolution. This example improves the loss function of the network. In target detection algorithms the loss function generally has two parts, the localization loss and the classification loss. For the localization loss, considering that target boxes of different sizes affect the loss differently, this example uses the following localization loss function:
[formula not reproduced in the source text]
where λ is a coefficient controlling the share of the localization loss in the total loss, with a default of 5; the localization loss is more important than the classification loss, so it is weighted more heavily. S denotes the size of the feature map of the last convolution, and A denotes the number of anchor boxes generated in each region, with a default of 5. The indicator term is a 0-1 function: it takes the value 1 if there is a target in the region at row i, column j, and 0 otherwise. x, y, h, and w denote the coordinates of the center point and the height and width of the prediction box; symbols carrying a ^ are true values and symbols without a ^ are predicted values.
In step S3, the model training refers to training from scratch. Because the network is small, training from scratch is fast and carries no risk of over-fitting. Training is performed directly on the data set from step S1. All weight parameters are randomly initialized, data augmentation operations such as horizontal flipping, random cropping, and color jitter are applied to the data, and the FIDN network is trained while continually adjusting hyperparameters such as the learning rate, batch size (batch_size), and optimization method.
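Of the augmentation operations listed, horizontal flipping is the one that also changes the annotations, so it is worth a short sketch. The box format (x, y, w, h) with (x, y) the top-left corner follows the annotation description in this document; the function name and the toy image are hypothetical.

```python
def flip_horizontal(image, boxes):
    """Mirror a 2-D image left to right and move its boxes accordingly.

    boxes: list of (x, y, w, h), with (x, y) the top-left corner in pixels.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # The old right edge (x + w) becomes the new left edge, measured from the right.
    flipped_boxes = [(width - x - w, y, w, h) for x, y, w, h in boxes]
    return flipped_image, flipped_boxes


# Toy 2 x 4 image with a bright one-pixel-wide "pedestrian" in the last column.
image = [[0, 0, 0, 9], [0, 0, 0, 9]]
boxes = [(3, 0, 1, 2)]
flipped_image, flipped_boxes = flip_horizontal(image, boxes)
```

Random cropping needs the same care (boxes must be clipped or dropped), whereas color jitter leaves the annotations untouched.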
The optimal model is obtained as follows: during training, a model is saved after every epoch (1 epoch means that every picture in the data set has been trained on once); training generally runs for 60 epochs. Each saved model is tested on the validation set, and the optimal model is selected according to the pedestrian detection accuracy mAP.
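Selecting the optimal model from the per-epoch checkpoints amounts to taking the maximum over validation mAP. In this sketch the function name is hypothetical and the mAP values are made-up placeholders, not results from the patent.

```python
def select_best_checkpoint(validation_map):
    """validation_map: dict mapping epoch number -> mAP on the validation set."""
    return max(validation_map, key=validation_map.get)


# Made-up values for illustration only; a real run would hold one entry per epoch.
validation_map = {1: 0.41, 2: 0.66, 3: 0.79, 4: 0.77}
best_epoch = select_best_checkpoint(validation_map)
```

Keeping every epoch's checkpoint, rather than only the last one, is what allows the choice to be made after training on data the model never saw.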
In step S4, the prediction method is: construct the forward inference process of the network. The network structure for forward inference is identical to the structure used in training, except that no loss is computed or back-propagated. The input parameter is image data and the return value is the prediction result. The input picture undergoes simple preprocessing and is then fed to the network; the network adapts to pictures of any size, which are scaled automatically inside the network. Some post-processing can also be applied: when performing target detection on video, a Kalman filter is added for tracking, which makes the detection process smoother and more stable. The result of applying the target detection method of the present invention to Fig. 4 is shown in Fig. 5.
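How a Kalman filter smooths detections across frames can be shown with a scalar filter applied to one box coordinate. The source does not describe the state model, so this sketch assumes the simplest one, a random walk with hand-picked noise variances, applied independently to a single coordinate; the class name and the sample measurements are hypothetical.

```python
class ScalarKalmanFilter:
    """One-dimensional Kalman filter with a random-walk motion model.

    The model and both variances are assumptions made for illustration.
    """

    def __init__(self, initial_value, process_var=1.0, measurement_var=4.0):
        self.estimate = initial_value
        self.error_var = 1.0
        self.process_var = process_var
        self.measurement_var = measurement_var

    def update(self, measurement):
        # Predict: the state is expected to stay put, but uncertainty grows.
        self.error_var += self.process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.error_var / (self.error_var + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.error_var *= 1 - gain
        return self.estimate


# Jittery per-frame detections of a box centre x coordinate (toy values).
raw = [100, 104, 98, 103, 99]
tracker = ScalarKalmanFilter(initial_value=raw[0])
smoothed = [tracker.update(x) for x in raw]
```

A practical tracker would usually carry position and velocity for all four box parameters, but the predict and correct steps have the same shape as here.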
The infrared pedestrian detection method based on deep learning of the present invention takes full advantage of the high accuracy of deep learning, is robust, and adapts to various changes in the external environment. The FIDN network designed and constructed here has high accuracy and a very low computational cost, reaching 180 fps on a GPU and about 18 fps on a CPU, which satisfies the real-time requirement and makes the method highly practical.
The present invention has the following two main innovations:
(1) A new target detection network, FIDN, is designed. The method proposes a new, efficient target detection network for infrared image pedestrian detection. It is a single-stage target detection method: the distribution of prior candidate boxes of the data set is obtained by the k-means method, and the target boxes are then located by regression. The whole network has only 7 convolutional layers (not counting the channel weighting part). It comprises some convolutional layers and max pooling layers and finally uses dilated convolution, which provides a sufficient receptive field without reducing the size of the feature map and thereby helps improve pedestrian detection accuracy. In the stacking of convolutional layers, the number of channels is not doubled continually as in typical networks; once it reaches 256 it is no longer increased, which greatly reduces the amount of computation.
(2) An adaptive feature map channel weighting method is designed. Because the network design does not double the number of channels as is usual practice, the feature maps have fewer channels, which can affect the results to some degree. The present invention therefore provides an adaptive feature map channel weighting method. A feature map usually has many channels, hundreds or even thousands, but the information they provide and their importance differ. With the adaptive feature map channel weighting method of the present invention, the network learns a set of weighting parameters by itself and then merges them into the feature map. The method is fairly general and can be added to many networks: feature map channel weighting can be attached after any convolutional layers of one's choice.
The embodiments described above are preferred embodiments of the present invention and do not limit its specific scope of implementation. The scope of the present invention includes but is not limited to these embodiments, and all equivalent changes made in accordance with the present invention fall within its scope of protection.
Claims (10)
1. An infrared image pedestrian detection method based on deep learning, characterized in that the infrared image pedestrian detection method comprises the following steps:
Step S1: data acquisition and data preprocessing: acquiring infrared images containing pedestrians, preprocessing the infrared images, manually annotating the preprocessed infrared images, and then dividing them into a training set and a validation set for the detection model according to a set ratio;
Step S2: constructing a target detection FIDN network based on a convolutional neural network: the target detection FIDN network comprises several convolutional layers and max pooling layers, and dilated (expansion) convolutional layers arranged after the convolutional layers and max pooling layers, wherein, in the stack of convolutional layers, once the channel count reaches a set value, the channel count of the dilated convolutional layers is no longer increased;
Step S3: model training: training the target detection FIDN network on the training set, and selecting the optimal model that performs best on the validation set;
Step S4: optimal model prediction: based on the optimal model, performing prediction on a GPU server to achieve target detection on a video stream.
2. The infrared image pedestrian detection method according to claim 1, characterized in that: in step S2, the target detection FIDN network further comprises an adaptive feature-map channel weighting module, arranged at the output of the dilated (expansion) convolutional layer, for weighting the channels of the feature map output by the dilated convolutional layer.
3. The infrared image pedestrian detection method according to claim 2, characterized in that the processing method of the adaptive feature-map channel weighting module is:
A1: using a global pooling layer to compress the feature map to 1*1*C, where C denotes the channel count of the feature map;
A2: using a fully connected layer to compress the channel count to C/16;
A3: after a ReLU activation function, using a fully connected layer to restore the channel count to C;
A4: passing the output through a sigmoid activation layer to obtain a 1*1*C weight vector; after the sigmoid function, the weight values in the weight vector lie between 0 and 1;
A5: weighting the feature map along the channel dimension using the weights.
4. The infrared image pedestrian detection method according to any one of claims 1-3, characterized in that: in step S1, the preprocessing comprises median filtering, with the median filtering formula as follows:
$$g(x, y) = \operatorname{median}\{\, f(x-k,\ y-l),\ (k, l) \in W \,\}$$
where f(x, y) and g(x, y) are the original image and the processed image, respectively, and W is a two-dimensional template.
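The median filtering formula can be sketched as follows. This is a minimal illustration assuming a square (2k+1) x (2k+1) template W and a border rule (use only the neighbours that lie inside the image) that the claim does not specify; `median_low` from the standard library is used so the output is always one of the input pixel values.

```python
from statistics import median_low


def median_filter(f, k=1):
    """g(x, y) = median of f over a (2k+1) x (2k+1) window centred on (x, y).
    Border pixels use only the neighbours that fall inside the image."""
    h, w = len(f), len(f[0])
    g = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [f[yy][xx]
                      for yy in range(max(0, y - k), min(h, y + k + 1))
                      for xx in range(max(0, x - k), min(w, x + k + 1))]
            g[y][x] = median_low(window)
    return g


# A flat patch with one bright noise pixel in the middle.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
clean = median_filter(noisy)
```

The isolated bright pixel is replaced by the value of its neighbours, which is why median filtering is a common choice for impulse noise.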
5. The infrared image pedestrian detection method according to claim 4, characterized in that: manual annotation uses an annotation tool to outline every pedestrian in each picture with a rectangular box, the rectangular box being the minimum bounding rectangle of the target pedestrian, and a corresponding XML file is generated; the XML file records the coordinates of each target in the picture, comprising the top-left x coordinate, the top-left y coordinate, the width w and the height h; pictures that are blurred or difficult to annotate are deleted; the above data are mixed and divided in a ratio of 9:1 into the training set and the validation set for the detection model.
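The 9:1 division can be sketched as below. The claim only says the data are mixed and divided; random shuffling with a fixed seed and the file names used here are assumptions for illustration, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle the samples reproducibly and split them into train / validation."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


# Hypothetical annotation file names, one per annotated picture.
annotations = [f"img_{i:04d}.xml" for i in range(100)]
train_set, val_set = split_dataset(annotations)
```

Fixing the seed keeps the split stable between runs, so models trained at different times are compared on the same validation set.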
6. The infrared image pedestrian detection method according to claim 5, characterized in that: in step S2, the target detection FIDN network is a fully convolutional network composed of 7 layers of 1*1 or 3*3 convolutions, and the candidate boxes on the image are generated directly on the original image, the generation method being as follows:
B1: directly dividing the original image into S*S regions, where S is the size of the feature map of the last convolution;
B2: generating several candidate boxes with different aspect ratios in each region, the specific aspect ratios being obtained with the k-means algorithm from the rectangular boxes annotated in the data set;
B3: computing the size distribution of the prior candidate boxes from the actual data set, using (1-IoU) as the distance metric, where IoU denotes the intersection over union of the areas of a prior candidate box and an annotated rectangular box, calculated as follows:
$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where A denotes the prior candidate box, B denotes the annotated rectangular box, ∩ denotes the intersection of A and B, and ∪ denotes the union of A and B.
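Steps B2 and B3 can be sketched as k-means over box sizes with 1-IoU as the distance. This is an illustration under assumptions not stated in the claim: boxes are reduced to (width, height) pairs aligned at a common corner, cluster centres are updated with the mean, and the sample boxes are made up. The function names are hypothetical.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (w, h), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs using the distance d = 1 - IoU."""
    centers = random.Random(seed).sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # assign each box to the centre with the smallest 1 - IoU
            j = min(range(k), key=lambda i: 1.0 - iou_wh(box, centers[i]))
            clusters[j].append(box)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(old)  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


# Made-up annotated box sizes: three distant and three nearby pedestrians.
boxes = [(10, 20), (12, 24), (11, 22), (40, 80), (44, 88), (42, 84)]
anchors = kmeans_anchors(boxes, k=2)
```

Using 1-IoU instead of Euclidean distance keeps large boxes from dominating the clustering, since IoU depends on relative rather than absolute size differences.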
7. The infrared image pedestrian detection method according to claim 6, characterized in that: the target detection FIDN network uses a lightweight convolutional neural network as its backbone network and, following the target detection algorithm, makes predictions using a 1*1 convolution; the localization loss function of the target detection algorithm is:
where λ is the coefficient controlling the share of the localization loss in the total loss, S denotes the size of the feature map of the last convolution, A denotes the number of anchor boxes generated in each region, the indicator term is a 0-1 function whose value is 1 if there is a target in the region at row i, column j and 0 otherwise, and x, y, h, w denote the coordinates of the center point and the height and width of the prediction box, respectively; symbols with a ^ denote ground-truth values and symbols without a ^ denote predicted values.
8. The infrared image pedestrian detection method according to any one of claims 1-3, characterized in that: in step S3, the model training refers to training from scratch, with the weight parameters randomly initialized; data augmentation is applied to the data through horizontal flipping, random cropping and color jitter; and the target detection FIDN network is trained by continually adjusting hyperparameters such as the learning rate, batch size and optimization method.
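For a detector, horizontal flipping has to move the annotated boxes together with the pixels. The sketch below illustrates this for boxes in the (top-left x, top-left y, width, height) form recorded in the XML files of claim 5; the image representation as a list of rows and the function name `hflip` are assumptions for illustration.

```python
def hflip(image, boxes):
    """Mirror an image left-right and move its boxes accordingly.
    image: H x W list of rows; boxes: (x, y, w, h) with a top-left origin."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # The old right edge (x + w) becomes the new left edge, measured from the right.
    new_boxes = [(width - x - w, y, w, h) for (x, y, w, h) in boxes]
    return flipped, new_boxes


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
flipped, flipped_boxes = hflip(image, [(0, 0, 1, 2)])
```

Flipping twice returns the original image and boxes, which is a convenient sanity check when adding an augmentation.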
9. The infrared image pedestrian detection method according to claim 8, characterized in that: in step S4, the prediction method is: constructing the forward inference process of the network, whose input parameter is image data and whose return value is the prediction result; when performing target detection on video, a Kalman filter is added for tracking.
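The claim does not say how the Kalman filter is configured. As a rough sketch only, the code below tracks a single coordinate of a box center with a constant-velocity model; the state layout, the noise values `q` and `r`, the diagonal process noise and the class name `Kalman1D` are all assumptions, and a real tracker would filter both coordinates (and usually the box size) per tracked pedestrian.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (position, velocity)."""

    def __init__(self, p0, q=1e-2, r=1.0):
        self.p, self.v = float(p0), 0.0     # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise (assumed)

    def predict(self, dt=1.0):
        """Propagate state and covariance: x = F x, P = F P F^T + Q."""
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                      # innovation covariance
        k0, k1 = a / s, c / s               # Kalman gain
        y = z - self.p                      # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.p


# Simulated detections of a pedestrian moving 2 pixels per frame.
kf = Kalman1D(0.0)
for t in range(1, 51):
    kf.predict()
    kf.update(2.0 * t)
next_position = kf.predict()
```

Between detections, `predict` alone can carry a track through a frame where the detector misses the pedestrian.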
10. A detection system implementing the infrared image pedestrian detection method according to any one of claims 1-9, characterized in that it comprises:
Data acquisition module: for acquiring infrared images containing pedestrians;
Data preprocessing module: for preprocessing the infrared images, manually annotating the preprocessed infrared images, and then dividing them into a training set and a validation set for the detection model according to a set ratio;
Target detection FIDN network construction module: for constructing the target detection FIDN network based on a convolutional neural network, the target detection FIDN network comprising several convolutional layers and max pooling layers, and dilated (expansion) convolutional layers arranged after the convolutional layers and max pooling layers, wherein, in the stack of convolutional layers, once the channel count reaches a set value, the channel count of the dilated convolutional layers is no longer increased;
Model training module: for training the target detection FIDN network on the training set and selecting the optimal model that performs best on the validation set;
Optimal model prediction module: for performing prediction on a GPU server based on the optimal model, achieving target detection on a video stream.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910716970.4A CN110472542A (en) | 2019-08-05 | 2019-08-05 | A kind of infrared image pedestrian detection method and detection system based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110472542A true CN110472542A (en) | 2019-11-19 |
Family
ID=68509998
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910716970.4A Pending CN110472542A (en) | 2019-08-05 | 2019-08-05 | A kind of infrared image pedestrian detection method and detection system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472542A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106096561A (en) * | 2016-06-16 | 2016-11-09 | 重庆邮电大学 | Infrared pedestrian detection method based on image block degree of depth learning characteristic |
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks |
CN109086678A (en) * | 2018-07-09 | 2018-12-25 | 天津大学 | A kind of pedestrian detection method extracting image multi-stage characteristics based on depth supervised learning |
US20190114511A1 (en) * | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks |
CN109902677A (en) * | 2019-01-30 | 2019-06-18 | 深圳北斗通信科技有限公司 | A kind of vehicle checking method based on deep learning |
CN109961009A (en) * | 2019-02-15 | 2019-07-02 | 平安科技(深圳)有限公司 | Pedestrian detection method, system, device and storage medium based on deep learning |
Non-Patent Citations (2)
Title |
---|
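Zhang Shun (张顺) et al.: "Development of deep convolutional neural networks and their applications in computer vision", Chinese Journal of Computers (《计算机学报》) * |
Geng Lei (耿磊) et al.: "Retinal image vessel segmentation with a fully convolutional neural network combining depthwise separable convolution and channel weighting" * |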
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105372A (en) * | 2019-12-10 | 2020-05-05 | 北京都是科技有限公司 | Thermal infrared image processor, system, method and apparatus |
CN111259736A (en) * | 2020-01-08 | 2020-06-09 | 上海海事大学 | Real-time pedestrian detection method based on deep learning in complex environment |
CN111259736B (en) * | 2020-01-08 | 2023-04-07 | 上海海事大学 | Real-time pedestrian detection method based on deep learning in complex environment |
CN112101434B (en) * | 2020-09-04 | 2022-09-09 | 河南大学 | Infrared image weak and small target detection method based on improved YOLO v3 |
CN112101434A (en) * | 2020-09-04 | 2020-12-18 | 河南大学 | Infrared image weak and small target detection method based on improved YOLO v3 |
CN112102394A (en) * | 2020-09-17 | 2020-12-18 | 中国科学院海洋研究所 | Remote sensing image ship size integrated extraction method based on deep learning |
CN112307955A (en) * | 2020-10-29 | 2021-02-02 | 广西科技大学 | Optimization method based on SSD infrared image pedestrian detection |
CN112733589A (en) * | 2020-10-29 | 2021-04-30 | 广西科技大学 | Infrared image pedestrian detection method based on deep learning |
CN112488165A (en) * | 2020-11-18 | 2021-03-12 | 杭州电子科技大学 | Infrared pedestrian identification method and system based on deep learning model |
CN112464884A (en) * | 2020-12-11 | 2021-03-09 | 武汉工程大学 | ADAS infrared night vision method and system |
CN112949633A (en) * | 2021-03-05 | 2021-06-11 | 中国科学院光电技术研究所 | Improved YOLOv 3-based infrared target detection method |
CN112949633B (en) * | 2021-03-05 | 2022-10-21 | 中国科学院光电技术研究所 | Improved YOLOv 3-based infrared target detection method |
CN113159277A (en) * | 2021-03-09 | 2021-07-23 | 北京大学 | Target detection method, device and equipment |
CN113408471A (en) * | 2021-07-02 | 2021-09-17 | 浙江传媒学院 | Non-green-curtain portrait real-time matting algorithm based on multitask deep learning |
CN113408471B (en) * | 2021-07-02 | 2023-03-28 | 浙江传媒学院 | Non-green-curtain portrait real-time matting algorithm based on multitask deep learning |
CN114299429A (en) * | 2021-12-24 | 2022-04-08 | 宁夏广天夏电子科技有限公司 | Human body recognition method, system and device based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |