CN110795991A - Mining locomotive pedestrian detection method based on multi-information fusion - Google Patents


Info

Publication number
CN110795991A
CN110795991A
Authority
CN
China
Prior art keywords
image
model
target detection
training
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910860797.5A
Other languages
Chinese (zh)
Other versions
CN110795991B (en)
Inventor
张传伟
罗坤鑫
陈黎明
夏占
卢强
Current Assignee
Xian University of Science and Technology
Original Assignee
Xian University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Science and Technology
Priority to CN201910860797.5A
Publication of CN110795991A
Application granted
Publication of CN110795991B
Legal status: Active
Anticipated expiration

Classifications

    • G06V40/25 — Recognition of walking or running movements, e.g. gait recognition
    • G06F18/253 — Fusion techniques of extracted features
    • G06N3/045 — Combinations of networks
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G06V10/40 — Extraction of image or video features
    • Y02T10/40 — Engine management systems


Abstract

The invention provides a mine locomotive pedestrian detection method based on multi-information fusion. A data set of visible-light and infrared pedestrian images under a coal mine is established. To cope with the special underground environment, the two image types are preprocessed respectively with CLAHE (contrast-limited adaptive histogram equalization) and with an optimal notch denoising method based on a 3rd-order Butterworth function. The YOLOv3 target detection network is adjusted and improved with dense connections and a multi-scale pooling structure to extract and fuse the features of the two image types, the loss function is optimized with a cross-entropy function, and a model is established, improving the accuracy, real-time performance and stability of the model in the complex underground environment. A transfer learning method further improves the training precision of the network and shortens the training time. The trained target detection model then detects pedestrian targets and outputs the pedestrian detection result in front of the mining locomotive in real time, meeting the detection speed required for mine locomotive operation.

Description

Mining locomotive pedestrian detection method based on multi-information fusion
Technical Field
The invention relates to the technical field of underground coal mine detection, in particular to a mine locomotive pedestrian detection method based on multi-information fusion.
Background
With the continuous rise of the coal mine resource market, the underground transportation task grows heavier. Statistics show that, owing to dust, poor illumination and other factors of the operating environment, safety accidents caused by mine locomotive transportation account for 20%-30% of all accidents. Transportation accidents may also occur through driver fatigue or through improper or illegal operation by miners while a mine locomotive is running, seriously endangering miners' lives and greatly reducing the mine's production efficiency.
Generally, installing an object detection device on an underground locomotive is the main means of reducing such accidents. In the image acquisition stage, however, a single visible-light sensor is easily affected by lighting and penetrates fine particles poorly, so it is difficult to adapt to the complex underground environment. An infrared sensor is less affected by dim light and dust and compensates well for the shortcomings of the visible-light sensor. Current image processing technology also suffers from low processing precision and low processing speed.
TABLE 1 Comparison of visible and infrared sensors (reproduced as an image in the original; content not recoverable)
Therefore, the invention fully fuses the advantages of visible light and infrared light and applies convolutional neural network technology to pedestrian target detection in front of the mine locomotive, in order to prevent locomotive collision accidents.
Disclosure of Invention
In order to solve these problems, the invention provides a mining locomotive pedestrian detection method based on multi-information fusion, which overcomes the susceptibility of existing visible-light sensors to light and dust in the special underground environment, offers strong adaptability and anti-interference capability, and further improves detection precision and real-time performance.
In order to achieve the purpose, the invention provides a mining locomotive pedestrian detection method based on multi-information fusion, which comprises the following steps:
step 1, acquiring visible-light and infrared videos of pedestrians in front of a mining locomotive, extracting the videos into images, preprocessing the images respectively with CLAHE (contrast-limited adaptive histogram equalization) and an optimal notch denoising method, then labeling the images with LabelImg software, and expanding the data set with image enhancement methods;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network with the trained weight parameters of the first 43 convolutional layers of YOLOv3, using a transfer learning method;
step 5, adjusting training parameters, and training the improved YOLOv3 target detection network with the training set;
step 6, selecting a model with the highest detection precision as an optimal model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
Optionally, the YOLOv3 target detection network is improved by adopting dense connection, including performing jump connection on a 52 × 52 × 256 feature map in the network, so that the adjusted feature map is overlapped with the subsequent two feature maps 26 × 26 × 512 and 13 × 13 × 512; the 26 × 26 × 512 feature map is then superimposed with the two subsequent feature maps 13 × 13 × 512 and 26 × 26 × 256, whereas the 13 × 13 × 512 feature map is superimposed with only one subsequent feature map 26 × 26 × 256.
Optionally, the YOLOv3 target detection network is improved by adopting a multi-scale pooling structure, which includes extracting 4 feature maps with different sizes from 13 × 13 × 512, 26 × 26 × 256 and 52 × 52 × 128 feature maps in the network through 4 pooling layers with different scales, combining context information of global and sub-regions, and then combining the 4 feature maps with original features to form a final feature expression, thereby performing convolution output.
Optionally, optimizing the loss function includes defining the class loss with a cross-entropy loss function so that the model fits more easily, i.e., the modified loss function is:

$$\begin{aligned}
L ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2
+ \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2 \\
&- \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\ln p_i(c)+(1-\hat{p}_i(c))\ln(1-p_i(c))\right]
\end{aligned}\tag{1}$$

where S is the grid size (13, 26 or 52, giving $S^2$ grids) and B is the number of candidate boxes; $x_i$ and $y_i$ are the center-point coordinates of the candidate box, $w_i$ and $h_i$ are the width and height of the bounding box, $C_i$ is the confidence of the predicted object, $p_i(c)$ is the class probability of the object, and hatted variables denote predicted values;
$\mathbb{1}_{i}^{obj}$ indicates that an object is present in grid i;
$\mathbb{1}_{ij}^{obj}$ indicates that an object is present in bounding box j of grid i;
$\mathbb{1}_{ij}^{noobj}$ indicates that no object is present in bounding box j of grid i.
Optionally, expanding each data set to multiple scales by the image scaling method means scaling each image to 10 sizes: {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}.
Optionally, in step 1, the visible-light image is processed with the CLAHE (contrast-limited adaptive histogram equalization) method, and the infrared image is denoised with an optimal notch filtering method based on a 3rd-order Butterworth function.
Optionally, the denoising of the infrared image with the optimal notch filtering method based on a 3rd-order Butterworth function includes:

firstly, applying a Fourier transform to the infrared image g(x, y) containing periodic noise to obtain its spectrum image G(u, v);

placing five pairs of 3rd-order Butterworth notch band-pass filters H(u, v) at the noise peaks to extract the principal frequency components of the noise, the mathematical expression of the filter being:

$$H(u,v)=1-\prod_{k=1}^{5}\frac{1}{1+\left[\dfrac{D_k(u,v)\,W}{D_k^2(u,v)-D_0^2}\right]^{6}}\cdot\frac{1}{1+\left[\dfrac{D_{-k}(u,v)\,W}{D_{-k}^2(u,v)-D_0^2}\right]^{6}}\tag{2}$$

wherein, for a notch whose center point is $(u_k, v_k)$, $D_k(u, v)$ is the distance from $(u, v)$ to that center, and for the notch symmetric about the origin with center $(-u_k, -v_k)$, $D_{-k}(u, v)$ is the distance from $(u, v)$ to that center; W is the width of the band, $D_0$ is the center radius of the band, and k is a natural number;

the spectrum image of the extracted noise can be represented as:

$$N(u, v)=H(u, v)\,G(u, v)\tag{3}$$

obtaining the corresponding spatial-domain noise image n(x, y) by applying the inverse Fourier transform to the spectrum image of the noise;

weighting the noise with a modulation function w(x, y), and then subtracting the modulated noise image from the noisy infrared image in the spatial domain to obtain an estimate of the denoised image $\hat{f}(x, y)$:

$$\hat{f}(x, y)=g(x, y)-w(x, y)\,n(x, y)\tag{4}$$
The modulation function w(x, y) is then chosen to minimize the variance of $\hat{f}$ over a given neighborhood $(2a+1)(2b+1)$ of each point (x, y), namely:

$$\sigma^2(x, y)=\frac{1}{(2a+1)(2b+1)}\sum_{s=-a}^{a}\sum_{t=-b}^{b}\left[\hat{f}(x+s, y+t)-\overline{\hat{f}}(x, y)\right]^2\tag{5}$$

wherein s and t are summation variables and $\overline{\hat{f}}(x, y)$ denotes the average of $\hat{f}$ over the neighborhood. Setting the derivative of $\sigma^2(x, y)$ with respect to w(x, y) to zero and solving yields the modulation function:

$$w(x, y)=\frac{\overline{g(x,y)\,n(x,y)}-\overline{g}(x,y)\,\overline{n}(x,y)}{\overline{n^2}(x,y)-\overline{n}^2(x,y)}\tag{6}$$

Finally, the denoised spatial-domain infrared image is obtained from equation (4).
Optionally, in step 5, the improved YOLOv3 target detection network is trained with different settings of the learning rate, weight decay coefficient and momentum coefficient, producing 10 different models.
Optionally, in step 6, the 10 models are verified on the cross-validation set to obtain their respective loss function values, and the model corresponding to the minimum value is determined as the optimal model.
In addition, the present invention also provides an electronic device including:
a memory for storing a computer program;
and the processor is used for executing the computer program stored in the memory, and when the computer program is executed, the pedestrian detection method for the mining locomotive based on multi-information fusion is realized.
The invention has the advantages and beneficial effects that: compared with the existing mine locomotive pedestrian detection technology, the invention provides a mine locomotive pedestrian detection method based on multi-information fusion, which has the following advantages:
(1) by utilizing a multi-sensor fusion technology, the defect of a single visible light sensor is overcome, and the method can adapt to complex underground coal mine environments.
(2) For visible light, the CLAHE (contrast-limited adaptive histogram equalization) method is used to process the visible-light image, effectively solving the problem of weak detail in visible-light images under dim light.
(3) For the infrared image, a filtering method using an optimal notch filter based on a 3rd-order Butterworth function denoises the image, effectively reducing the influence of periodic noise.
(4) By adopting the idea of dense connection, feature maps at different levels are superimposed through skip connections, realizing feature reuse, effectively reducing the number of training parameters, improving backpropagation, and making the model easier to train.
(5) By introducing a multi-scale pooling model structure, the network can be combined with feature maps of different levels, the context information of the whole area and the sub-area can be combined, and the detection precision can be effectively improved.
(6) The training precision of the network is improved and the training time is shortened by using the transfer learning technology.
(7) The method lays a foundation for the application of auxiliary driving of the mining locomotive and the like in the future.
Drawings
Fig. 1 is a flow chart of a training phase of a multi-information fusion mining locomotive pedestrian detection method in an embodiment of the invention.
FIG. 2 is a flow chart of a detection phase of a multi-information fusion mining locomotive pedestrian detection method in an embodiment of the invention.
Fig. 3 is a schematic diagram of the convolutional layer of the improved YOLOv3 network structure of the present invention.
Fig. 4 is a schematic diagram of a residual block of the improved YOLOv3 network structure in the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In one embodiment, as shown in fig. 1, the invention provides a mining locomotive pedestrian detection method based on multi-information fusion, which includes the following steps:
step 1, acquiring visible-light and infrared videos of pedestrians in front of a mining locomotive, extracting the videos into images, preprocessing the images respectively with CLAHE (contrast-limited adaptive histogram equalization) and an optimal notch denoising method, then labeling the images with LabelImg software, and expanding the data set with image enhancement methods;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network with the trained weight parameters of the first 43 convolutional layers of YOLOv3, using a transfer learning method;
step 5, adjusting training parameters, and training the improved YOLOv3 target detection network with the training set;
step 6, selecting a model with the highest detection precision as an optimal model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, as shown in fig. 2, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
Here, the real-time object detection method is the You Only Look Once (YOLO) network structure of YOLOv3 (v3 denotes version three), which takes the whole image as the network input and directly regresses the bounding-box positions and their classes in the output layer. The improved YOLOv3 network structure as a whole consists of 1 × 1 (default stride 1), 3 × 3 and 3 × 3/2 (stride 2) convolutional layers (Convolutional Layer), residual blocks (Residual Block), multi-scale average pooling layers (Average Pooling), upsampling (Up Sample), and feature map concatenation (Concatenation).
FIG. 3 is a schematic view of a convolutional layer. The convolutional layer consists of a convolution, batch normalization, and a Leaky ReLU activation function (a rectified linear unit with leakage). The convolution extracts features, batch normalization improves the convergence efficiency of the algorithm and accelerates fitting, and the Leaky ReLU activation function provides the network's nonlinearity.
Fig. 4 is a schematic diagram of a residual block. The input x passes through two convolutional layers to produce F(x); a skip connection joins the input x directly to the output F(x) to form the final output F(x) + x.
Taking a three-channel 416 × 416 color image as an example, the input image passes through four sets of convolution (in sequence: 32 3 × 3 kernels, 64 3 × 3/2 kernels, 32 1 × 1 kernels, and 64 3 × 3 kernels), giving an output feature map of 208 × 208 × 64, calculated according to the following formula:

$$out=\left\lfloor\frac{n+2p-f}{s}\right\rfloor+1$$

where n denotes the size of the input, p the amount of padding, f the size of the convolution or pooling kernel, s the stride (Stride), and out the size of the output.
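As a quick illustrative sketch (not part of the patent), the output-size formula can be written directly in Python; the function name is ours:

```python
def conv_output_size(n: int, f: int, s: int = 1, p: int = 0) -> int:
    """Output side length of a convolution/pooling layer:
    out = floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# A 416x416 input through a 3x3/2 convolution with padding 1 -> 208
print(conv_output_size(416, f=3, s=2, p=1))  # -> 208
```

The same formula reproduces the other sizes in the text, e.g. a 3 × 3 stride-1 convolution with padding 1 preserves the 416 × 416 size.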
The 208 × 208 × 64 feature map is then used as input: it passes through the next 22 sets of convolutional layers to give a 52 × 52 × 256 feature map, which passes through the next 17 convolutional layers to give a 26 × 26 × 512 feature map, which in turn passes through the next 14 convolutional layers to give a 13 × 13 × 512 feature map.
The 13 × 13 × 512 feature map undergoes 5 layers of convolution to give a 13 × 13 × 256 feature map, which is then up-sampled to 26 × 26 with bilinear interpolation, giving a 26 × 26 × 256 feature map. A 52 × 52 × 128 feature map is obtained in the same way through 5 layers of convolution and upsampling.
The 52 × 52 × 256 feature map is given skip connections: the first connection uses 128 3 × 3/2 convolution kernels to reduce it to 26 × 26 × 128, which is superimposed with the first subsequent feature map; the second connection uses 128 3 × 3/2 kernels followed by 64 3 × 3/2 kernels to reduce it to 13 × 13 × 64, which is superimposed with the second subsequent feature map.
Similarly, the 26 × 26 × 512 feature map is given skip connections, adjusted by convolution layers and superimposed with the two subsequent feature maps, whereas the 13 × 13 × 512 feature map is superimposed with only one subsequent feature map.
The 13 × 13 × 512 feature map is reduced with the multi-scale pooling method: it passes simultaneously through 4 pooling layers of different scales (a global average pooling layer and 7 × 7/6, 5 × 5/4 and 3 × 3/2 average pooling layers), and each result then passes through 128 1 × 1 convolution kernels, giving 4 feature maps. These are uniformly enlarged to 13 × 13 × 128 by upsampling, fused with the original 13 × 13 × 512 feature map by concatenation, and the fused feature map finally passes through 18 1 × 1 convolution kernels to give an output of size 13 × 13 × 18.
By using the above method for the 26 × 26 × 256 and 52 × 52 × 128 feature maps, 26 × 26 × 18 and 52 × 52 × 18 outputs can be obtained, respectively.
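The pooling-then-upsample-then-concatenate idea can be sketched in NumPy. This is an illustrative simplification (names, bin sizes, and nearest-neighbor upsampling are our assumptions; the patent's 1 × 1 channel-reduction convolutions are omitted):

```python
import numpy as np

def adaptive_avg_pool(fm, bins):
    """Average-pool an HxWxC map into a bins x bins x C grid (uneven splits allowed)."""
    h_idx = np.array_split(np.arange(fm.shape[0]), bins)
    w_idx = np.array_split(np.arange(fm.shape[1]), bins)
    return np.stack([[fm[np.ix_(hi, wi)].mean(axis=(0, 1)) for wi in w_idx]
                     for hi in h_idx])

def pyramid_pool(fm, bin_sizes=(1, 3, 5, 7)):
    """Pool at several scales, upsample each grid back to HxW (nearest neighbor),
    and concatenate with the original map along the channel axis."""
    h, w = fm.shape[:2]
    outs = [fm]
    for b in bin_sizes:
        pooled = adaptive_avg_pool(fm, b)
        up = pooled[np.arange(h) * b // h][:, np.arange(w) * b // w]
        outs.append(up)
    return np.concatenate(outs, axis=-1)

fm = np.random.rand(13, 13, 512).astype(np.float32)
out = pyramid_pool(fm)  # shape (13, 13, 512 * 5)
```

The global-pooling branch (bin size 1) contributes the whole-map context; the finer bins contribute sub-region context, matching the "global and sub-region" description above.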
The first two dimensions of each output give the number of grids into which the input image is divided. Each grid is responsible for detecting objects whose center point falls inside it, and with the prior-box method each grid can detect 3 objects simultaneously. Each object carries its size, coordinates, confidence and class probability (6 parameters per box), so each grid outputs 3 × 6 = 18 parameters in total.
Instead of the traditional manual selection method, a modified K-means algorithm clusters the data set to obtain the sizes and number of prior boxes. The original distance metric of the K-means algorithm is replaced by an IOU-based metric according to the following formula: the distance d equals 1 minus the intersection-over-union of a bounding box (box) and a cluster-center bounding box (centroid).
d(box,centroid)=1-IOU(box,centroid) (9);
For each of the 3 outputs, 3 prior boxes are set, so 9 prior-box sizes are clustered in total. In assignment, the larger prior boxes are applied on the smallest 13 × 13 feature map (largest receptive field) to detect larger objects; the medium prior boxes are applied on the 26 × 26 feature map (medium receptive field) to detect medium-sized objects; and the smaller prior boxes are applied on the larger 52 × 52 feature map (smaller receptive field) to detect smaller objects.
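A minimal sketch of clustering box sizes with the d = 1 − IOU metric of equation (9); function names are ours, and boxes are represented by (width, height) anchored at a common corner, as is standard for anchor clustering:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (w, h) boxes and centroids, both anchored at the origin."""
    inter = np.minimum(boxes[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_priors(boxes, k=9, iters=100, seed=0):
    """K-means with distance d(box, centroid) = 1 - IOU(box, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

Identical boxes have IOU 1 (distance 0), so clustering groups boxes of similar shape regardless of absolute pixel error, unlike Euclidean K-means.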
In computing gradient descent on the weight parameters, the sum-of-squares function used by the original class loss is likely not convex, i.e., it has multiple local optima, making a global optimum difficult to obtain; the invention therefore adopts a cross-entropy function in place of the original sum-of-squares function to define the class loss. The modified loss function is:

$$\begin{aligned}
L ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2
+ \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2 \\
&- \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left[\hat{p}_i(c)\ln p_i(c)+(1-\hat{p}_i(c))\ln(1-p_i(c))\right]
\end{aligned}\tag{1}$$

where S is the grid size (13, 26 or 52, giving $S^2$ grids) and B is the number of candidate boxes (3); $x_i$ and $y_i$ are the center-point coordinates of the candidate box, $w_i$ and $h_i$ are the width and height of the bounding box, $C_i$ is the confidence of the predicted object, $p_i(c)$ is the class probability of the object, and hatted variables denote predicted values;
$\mathbb{1}_{i}^{obj}$ indicates that an object is present in grid i, meaning that only grids containing an object contribute to that error term;
$\mathbb{1}_{ij}^{obj}$ indicates that an object is present in bounding box j of grid i, meaning that only the bounding box "responsible" for the prediction (the one with the largest overlap) contributes to the error;
$\mathbb{1}_{ij}^{noobj}$ indicates that no object is present in bounding box j of grid i.
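A small NumPy sketch (names ours) contrasting the binary cross-entropy class term with the original sum-of-squares loss it replaces:

```python
import numpy as np

def class_cross_entropy(p_hat, p, eps=1e-7):
    """Binary cross-entropy class loss: -sum[y*log(p) + (1-y)*log(1-p)].
    p_hat: ground-truth class labels, p: predicted class probabilities."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(p_hat * np.log(p) + (1.0 - p_hat) * np.log(1.0 - p))

def class_squared_error(p_hat, p):
    """Original sum-of-squares class loss, shown for comparison."""
    return np.sum((p_hat - p) ** 2)
```

For a confident wrong prediction the cross-entropy term grows without bound while the squared error saturates at 1, which is one reason the cross-entropy surface is easier to descend.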
Since a large object, or an object near grid boundaries, may be detected by several grids at the same time and thus produce multiple bounding boxes, duplicate outputs are filtered with the non-maximum suppression method.
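A minimal sketch of greedy non-maximum suppression (the threshold value is illustrative):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression.
    boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop heavily overlapping duplicates
    return keep
```

Boxes overlapping the current best detection above the IOU threshold are discarded, so each pedestrian yields a single output box.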
The visible-light and infrared pedestrian videos recorded in front of the mining locomotive are extracted into images. The visible-light images are processed with the CLAHE (contrast-limited adaptive histogram equalization) method: each image is divided into an 8 × 8 grid, histogram equalization is performed on each tile separately, and finally the tiles are stitched with bilinear interpolation to remove the boundaries the algorithm introduces between tiles.
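A simplified NumPy sketch of the tiling idea (names ours): split the image into 8 × 8 tiles and histogram-equalize each tile independently. Real CLAHE additionally clips each histogram at a contrast limit and blends neighboring tiles bilinearly, as the text describes; both refinements are omitted here for brevity:

```python
import numpy as np

def tile_equalize(img, tiles=8):
    """Per-tile histogram equalization of an 8-bit grayscale image.
    (Simplified CLAHE: no clip limit, no bilinear blending between tiles.)"""
    out = np.empty_like(img)
    for rows in np.array_split(np.arange(img.shape[0]), tiles):
        for cols in np.array_split(np.arange(img.shape[1]), tiles):
            block = img[np.ix_(rows, cols)]
            hist = np.bincount(block.ravel(), minlength=256)
            cdf = hist.cumsum()
            c0 = cdf[np.flatnonzero(hist)[0]]       # first occupied bin
            lut = (cdf - c0) * 255 // max(int(cdf[-1] - c0), 1)
            out[np.ix_(rows, cols)] = lut[block]    # stretch each tile to 0..255
    return out
```

Equalizing per tile rather than globally is what recovers local detail in the dark regions of the underground visible-light images.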
For the infrared image, which is easily disturbed by periodic noise, denoising uses the optimal notch filtering method based on a 3rd-order Butterworth function, namely:

First, a Fourier transform is applied to the infrared image g(x, y) containing periodic noise to obtain its spectrum image G(u, v).

According to the noise characteristics of the spectrum image, five pairs of 3rd-order Butterworth notch band-pass filters H(u, v) are placed at the noise peaks to extract the principal frequency components of the noise; the mathematical expression of the filter is:

$$H(u,v)=1-\prod_{k=1}^{5}\frac{1}{1+\left[\dfrac{D_k(u,v)\,W}{D_k^2(u,v)-D_0^2}\right]^{6}}\cdot\frac{1}{1+\left[\dfrac{D_{-k}(u,v)\,W}{D_{-k}^2(u,v)-D_0^2}\right]^{6}}\tag{2}$$

where, for a notch whose center point is $(u_k, v_k)$, $D_k(u, v)$ is the distance from $(u, v)$ to that center, and for the notch symmetric about the origin with center $(-u_k, -v_k)$, $D_{-k}(u, v)$ is the distance from $(u, v)$ to that center; W is the width of the band, $D_0$ is the center radius of the band, and k is a natural number.
The spectrum image of the extracted noise can be represented as:

$$N(u, v)=H(u, v)\,G(u, v)\tag{3}$$

The corresponding spatial-domain noise image n(x, y) is obtained by applying the inverse Fourier transform to this spectrum image. Since this process usually yields only an approximation of the noise, the noise is weighted with a w(x, y) modulation function, and the modulated noise image is then subtracted from the noisy infrared image in the spatial domain to obtain an estimate of the denoised image $\hat{f}(x, y)$:

$$\hat{f}(x, y)=g(x, y)-w(x, y)\,n(x, y)\tag{4}$$
The modulation function w(x, y) is then chosen to minimize the variance of $\hat{f}$ over a given neighborhood $(2a+1)(2b+1)$ of each point (x, y), namely:

$$\sigma^2(x, y)=\frac{1}{(2a+1)(2b+1)}\sum_{s=-a}^{a}\sum_{t=-b}^{b}\left[\hat{f}(x+s, y+t)-\overline{\hat{f}}(x, y)\right]^2\tag{5}$$

where s and t are summation variables and $\overline{\hat{f}}(x, y)$ denotes the average of $\hat{f}$ over the neighborhood. Setting the derivative of $\sigma^2(x, y)$ with respect to w(x, y) to zero and solving yields the modulation function:

$$w(x, y)=\frac{\overline{g(x,y)\,n(x,y)}-\overline{g}(x,y)\,\overline{n}(x,y)}{\overline{n^2}(x,y)-\overline{n}^2(x,y)}\tag{6}$$

Finally, the denoised spatial-domain infrared image is obtained from equation (4).
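A NumPy sketch of the pipeline of equations (2)-(6). Since the patent's exact band parameters are not recoverable here, the notch term below uses the textbook Butterworth notch band-pass form with a pass radius d0 around each peak; all function names, parameter defaults, and the test signal are illustrative assumptions:

```python
import numpy as np

def butterworth_notch_pass(shape, peaks, d0=3.0, order=3):
    """Notch band-pass: 1 minus a product of Butterworth notch-reject terms,
    one per noise peak (u_k, v_k) plus its twin symmetric about the origin."""
    m, n = shape
    u, v = np.meshgrid(np.arange(m) - m / 2, np.arange(n) - n / 2, indexing="ij")
    h_reject = np.ones(shape)
    for uk, vk in peaks:
        for su, sv in ((uk, vk), (-uk, -vk)):
            d = np.hypot(u - su, v - sv)
            h_reject *= 1.0 / (1.0 + (d0 / (d + 1e-9)) ** (2 * order))
    return 1.0 - h_reject

def local_mean(a, half=2):
    """Uniform (2*half+1)^2 box average via 2-D cumulative sums, edge padding."""
    k = 2 * half + 1
    pad = np.pad(a, half, mode="edge")
    c = np.pad(np.cumsum(np.cumsum(pad, axis=0), axis=1), ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def optimal_notch_denoise(g, peaks, d0=3.0, half=2):
    """Steps (2)-(6): extract the noise spectrum with the notch band-pass,
    invert to n(x, y), estimate w(x, y) from local means, then subtract."""
    G = np.fft.fftshift(np.fft.fft2(g))
    H = butterworth_notch_pass(g.shape, peaks, d0=d0)
    n = np.real(np.fft.ifft2(np.fft.ifftshift(H * G)))       # eqs (3), inverse FT
    gn, gb, nb, n2 = (local_mean(x, half) for x in (g * n, g, n, n * n))
    w = (gn - gb * nb) / (n2 - nb * nb + 1e-9)               # eq (6)
    return g - w * n                                         # eq (4)
```

On a frame corrupted by a single sinusoidal interference pattern, passing the known peak coordinates removes most of the periodic component while leaving the underlying scene intact.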
The data sets were divided into a training set, cross-validation set and test set in the ratio 8:1:1, and each image was scaled to 10 sizes: {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}, i.e., in increments of 32.
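The ten training scales are simply consecutive multiples of 32 (the network's overall downsampling stride), which can be generated directly:

```python
# The ten multi-scale training sizes: 320..608 in steps of 32.
sizes = list(range(320, 609, 32))
print(sizes)  # -> [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
```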
The weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network are initialized with the trained weight parameters of the first 43 convolutional layers of YOLOv3, using a transfer learning method.
The optimization algorithm is Adam, with 100 iterations, 64 samples per iteration, a momentum coefficient of 0.9, and a learning-rate decay coefficient of 0.0005. After every 10 iterations a new input size is randomly selected for training (10 sizes in total). When all iterations are complete, one training run is finished and one model is obtained; the parameters are then modified and training repeated until 10 models have been output in total.
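The multi-scale schedule described above (a new random size every 10 iterations) can be sketched as follows; the function name and seed are illustrative, and the hyperparameters in the comment are those stated in the text:

```python
import random

def multiscale_schedule(total_iters=100, switch_every=10, seed=0):
    """Yield (iteration, input_size): a new size is drawn at random every
    `switch_every` iterations from the 10 allowed scales 320..608 step 32."""
    rng = random.Random(seed)
    sizes = list(range(320, 609, 32))
    size = rng.choice(sizes)
    for it in range(total_iters):
        if it > 0 and it % switch_every == 0:
            size = rng.choice(sizes)
        yield it, size

# Each (it, size) pair would drive one training iteration
# (Adam optimizer, batch size 64, momentum 0.9, decay 0.0005).
schedule = list(multiscale_schedule())
```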
The 10 models are verified on the cross-validation set to obtain their respective loss function values, and the model corresponding to the minimum value is determined as the optimal model. The loss function value of this model is then calculated on the test set; if it fails to meet the expected requirement, the parameters are modified and the model retrained; otherwise, the model is output directly as the target detection model.
The acquired visible light and infrared light videos are extracted into images; CLAHE contrast-limited adaptive histogram equalization is applied to the visible light images, and the infrared light images are denoised with the filtering method of the optimal notch filter based on the 3rd-order Butterworth function. The images are then detected by the target detection model obtained through training, and the pedestrian detection result in front of the mining locomotive is output in real time.
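A simplified version of the CLAHE preprocessing applied to the visible-light frames can be sketched in NumPy; unlike full CLAHE it omits the bilinear interpolation between tile mappings, so it is only an approximation of the method named above:

```python
import numpy as np

def clahe_simple(img, tiles=(8, 8), clip_limit=4.0):
    # Simplified contrast-limited adaptive histogram equalization:
    # per-tile clipped-histogram equalization, without the bilinear
    # interpolation between tile mappings that full CLAHE uses.
    img = np.asarray(img, dtype=np.uint8)
    out = np.empty_like(img)
    h, w = img.shape
    th, tw = h // tiles[0], w // tiles[1]
    for i in range(tiles[0]):
        for j in range(tiles[1]):
            ys = slice(i * th, (i + 1) * th if i < tiles[0] - 1 else h)
            xs = slice(j * tw, (j + 1) * tw if j < tiles[1] - 1 else w)
            tile = img[ys, xs]
            hist = np.bincount(tile.ravel(), minlength=256).astype(float)
            limit = clip_limit * tile.size / 256.0
            excess = np.maximum(hist - limit, 0.0).sum()
            hist = np.minimum(hist, limit) + excess / 256.0  # redistribute
            cdf = np.cumsum(hist)
            lut = np.round(255.0 * cdf / cdf[-1]).astype(np.uint8)
            out[ys, xs] = lut[tile]
    return out
```

Applied to a low-contrast grayscale frame, each tile's clipped histogram is equalized independently, stretching the local dynamic range while the clip limit bounds noise amplification.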
In addition, the present invention also provides an electronic device including:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing a mining locomotive pedestrian detection method based on multi-information fusion, at least comprising the following steps:
step 1, acquiring visible light and infrared light videos of pedestrians in front of a mining locomotive, extracting the videos into images, preprocessing the images with CLAHE contrast-limited adaptive histogram equalization and an optimal notch denoising method respectively, then labeling the images with LabelImg software, and expanding the data set with an image enhancement method;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network by a transfer learning method, using the trained weight parameters of the first 43 convolutional layers of the YOLOv3 target detection network;
step 5, adjusting training parameters, and training the improved Yolov3 target detection network by using a training set;
step 6, selecting an optimal target detection model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
Alternatively, the electronic device may be a server or a personal computer, or the like.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to the above specific embodiments, it is to be understood that the invention is not limited to the specific embodiments disclosed, and that the division into aspects is for convenience of description only; features in these aspects can be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A mining locomotive pedestrian detection method based on multi-information fusion is characterized by comprising the following steps:
step 1, acquiring visible light and infrared light videos of pedestrians in front of a mining locomotive, extracting the videos into images, preprocessing the images with CLAHE contrast-limited adaptive histogram equalization and an optimal notch denoising method respectively, then labeling the images with LabelImg software, and expanding the data set with an image enhancement method;
step 2, dividing the data set into a training set, a cross validation set and a test set according to the ratio of 8:1:1, wherein the training set is used for model training, the cross validation set is used for measuring the performance of the model so as to select optimal parameters, and the test set is used for final evaluation of the model; each data set is expanded into a plurality of scales through an image scaling method and is used for subsequent multi-scale training;
step 3, improving the YOLOv3 target detection network by adopting dense connection and a multi-scale pooling structure, and optimizing a loss function of the YOLOv3 target detection network;
step 4, initializing the weight parameters of the first 43 convolutional layers of the improved YOLOv3 target detection network by a transfer learning method, using the trained weight parameters of the first 43 convolutional layers of the YOLOv3 target detection network;
step 5, adjusting training parameters, and training the improved Yolov3 target detection network by using a training set;
step 6, selecting a model with the highest detection precision as an optimal model according to the detection result of the cross validation set, and then evaluating the performance of the model by using the test set;
step 7, analyzing the evaluation result, if the performance does not meet the expected requirement, executing the step 5 again, otherwise, directly outputting the trained target detection model;
and 8, detecting the re-acquired visible light and infrared light videos by using the trained target detection model, and outputting a pedestrian detection result in front of the mining locomotive in real time.
2. The method of claim 1, wherein improving the YOLOv3 target detection network by dense connection comprises performing jump connections on the 52 × 52 × 256 feature map in the network, so that the adjusted feature map is superimposed with the two subsequent feature maps of 26 × 26 × 512 and 13 × 13 × 512; the 26 × 26 × 512 feature map is then superimposed with the two subsequent feature maps of 13 × 13 × 512 and 26 × 26 × 256, whereas the 13 × 13 × 512 feature map is superimposed with only one subsequent feature map of 26 × 26 × 256.
3. The method of claim 1, wherein improving the YOLOv3 target detection network by a multi-scale pooling structure comprises extracting 4 feature maps of different sizes from the 13 × 13 × 512, 26 × 26 × 256 and 52 × 52 × 128 feature maps in the network through 4 pooling layers of different scales, combining the context information of the global region and the sub-regions, and then merging the 4 feature maps with the original features to form the final feature expression for convolution output.
4. The method of claim 1, wherein optimizing the loss function comprises defining the class loss with a cross-entropy loss function so that the model is easier to fit, i.e., the modified loss function is as follows:

Loss = λ_coord Σ_{i=0}^{S²−1} Σ_{j=0}^{B−1} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²] + Σ_{i=0}^{S²−1} Σ_{j=0}^{B−1} 1_{ij}^{obj} (C_i − Ĉ_i)² + λ_noobj Σ_{i=0}^{S²−1} Σ_{j=0}^{B−1} 1_{ij}^{noobj} (C_i − Ĉ_i)² − Σ_{i=0}^{S²−1} 1_i^{obj} Σ_{c∈classes} [p_i(c) ln p̂_i(c) + (1 − p_i(c)) ln(1 − p̂_i(c))]    (1)

where S represents the network size and is 13 × 13, 26 × 26 or 52 × 52; B is the number of candidate boxes; x_i and y_i are the coordinates of the center point of the candidate box; w_i and h_i are the width and height of the bounding box, respectively; C_i is the confidence of the predicted object; p_i(c) is the class probability of the object, and a hat over a symbol denotes its predicted value; λ_coord and λ_noobj are weighting coefficients; 1_i^{obj} indicates that an object is present in grid cell i; 1_{ij}^{obj} indicates that an object is present in bounding box j of grid cell i; and 1_{ij}^{noobj} indicates that no object is present in bounding box j of grid cell i.
5. The method of claim 1, wherein expanding each data set into a plurality of scales by an image scaling method comprises scaling each image to 10 sizes: {320, 352, 384, 416, 448, 480, 512, 544, 576, 608}.
6. The method of claim 1, wherein in step 1, for the visible light image, the CLAHE finite contrast adaptive histogram equalization method is adopted to process the visible light image; and for the infrared light image, denoising the infrared light image by adopting a filtering method of an optimal notch filter based on a 3-order Butterworth function.
7. The method of claim 6, wherein denoising the infrared image using the filtering method of the optimal notch filter based on a 3rd-order Butterworth function comprises:
firstly, carrying out Fourier transform on the infrared light image g(x, y) containing periodic noise to obtain its spectrum image G(u, v);
a 5-pair 3rd-order Butterworth notch band-pass filter H(u, v) is placed at the positions of the noise peaks to extract the principal frequency components of the noise; the mathematical expression of the filter is:

H(u, v) = 1 − ∏_{k=1}^{5} [1 / (1 + (D_k(u, v)·W / (D_k²(u, v) − D_0²))⁶)]·[1 / (1 + (D_−k(u, v)·W / (D_−k²(u, v) − D_0²))⁶)]    (2)
where (u_k, v_k) is the coordinate of the center point of the k-th notch and D_k(u, v) is the distance from (u, v) to that center; the notch symmetric about the origin has center (−u_k, −v_k), at distance D_−k(u, v); W is the width of the band, D_0 is the center radius of the frequency band, and k is a natural number;
the spectral image of the extracted noise can be represented as:
N(u,v)=H(u,v)G(u,v) (3);
and then the corresponding spatial-domain noise image n(x, y) is obtained by inverse Fourier transform of the noise spectrum image:

n(x, y) = ℱ⁻¹{N(u, v)} = ℱ⁻¹{H(u, v)G(u, v)}    (4);
the noise is weighted and adjusted by a modulation function w(x, y), and the modulated noise image is then subtracted from the noise-containing infrared image in the spatial domain to obtain an estimate of the denoised image:

f̂(x, y) = g(x, y) − w(x, y)·n(x, y)    (5);
the modulation function w(x, y) is then chosen to minimize the variance of f̂(x, y) over a given neighborhood of size (2a+1)(2b+1) around each point (x, y), namely:

σ²(x, y) = (1 / ((2a+1)(2b+1))) Σ_{s=−a}^{a} Σ_{t=−b}^{b} [f̂(x+s, y+t) − mean(f̂)(x, y)]²    (6)
where s and t are summation variables, and mean(·)(x, y) denotes the average of a function over the neighborhood centered at (x, y);
setting the derivative of the variance with respect to w(x, y) to zero then yields the modulation function:

w(x, y) = (mean(g·n)(x, y) − mean(g)(x, y)·mean(n)(x, y)) / (mean(n²)(x, y) − mean(n)²(x, y))    (7);
and finally, the denoised spatial-domain infrared image is obtained by substituting the modulation function into the subtraction above.
8. The method of claim 1, wherein in step 5, the improved YOLOv3 target detection network is trained with different learning rates, weight attenuation coefficients and momentum coefficients, thereby generating 10 different models.
9. The method of claim 8, wherein in step 6, the 10 models are validated on the cross-validation set to obtain their respective loss function values, and the model with the minimum value is identified as the optimal model.
10. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory, and when executed, implementing the method of any of claims 1-9.
CN201910860797.5A 2019-09-11 2019-09-11 Mining locomotive pedestrian detection method based on multi-information fusion Active CN110795991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860797.5A CN110795991B (en) 2019-09-11 2019-09-11 Mining locomotive pedestrian detection method based on multi-information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910860797.5A CN110795991B (en) 2019-09-11 2019-09-11 Mining locomotive pedestrian detection method based on multi-information fusion

Publications (2)

Publication Number Publication Date
CN110795991A true CN110795991A (en) 2020-02-14
CN110795991B CN110795991B (en) 2023-03-31

Family

ID=69427241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860797.5A Active CN110795991B (en) 2019-09-11 2019-09-11 Mining locomotive pedestrian detection method based on multi-information fusion

Country Status (1)

Country Link
CN (1) CN110795991B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382683A (en) * 2020-03-02 2020-07-07 东南大学 Target detection method based on feature fusion of color camera and infrared thermal imager
CN111553289A (en) * 2020-04-29 2020-08-18 中国科学院空天信息创新研究院 Remote sensing image cloud detection method and system
CN111832489A (en) * 2020-07-15 2020-10-27 中国电子科技集团公司第三十八研究所 Subway crowd density estimation method and system based on target detection
CN111898427A (en) * 2020-06-22 2020-11-06 西北工业大学 Multispectral pedestrian detection method based on feature fusion deep neural network
CN111950475A (en) * 2020-08-15 2020-11-17 哈尔滨理工大学 Yalhe histogram enhancement type target recognition algorithm based on yoloV3
CN111986240A (en) * 2020-09-01 2020-11-24 交通运输部水运科学研究所 Drowning person detection method and system based on visible light and thermal imaging data fusion
CN112070111A (en) * 2020-07-28 2020-12-11 浙江大学 Multi-target detection method and system adaptive to multiband images
CN112183265A (en) * 2020-09-17 2021-01-05 国家电网有限公司 Electric power construction video monitoring and alarming method and system based on image recognition
CN112287839A (en) * 2020-10-29 2021-01-29 广西科技大学 SSD infrared image pedestrian detection method based on transfer learning
CN112418358A (en) * 2021-01-14 2021-02-26 苏州博宇鑫交通科技有限公司 Vehicle multi-attribute classification method for strengthening deep fusion network
CN112528934A (en) * 2020-12-22 2021-03-19 燕山大学 Improved YOLOv3 traffic sign detection method based on multi-scale feature layer
CN112989924A (en) * 2021-01-26 2021-06-18 深圳市优必选科技股份有限公司 Target detection method, target detection device and terminal equipment
CN114529879A (en) * 2022-02-08 2022-05-24 安徽理工大学 Real-time detection method for road conditions of electric locomotive in mine based on YOLOv4-Tiny
CN115311241A (en) * 2022-08-16 2022-11-08 天地(常州)自动化股份有限公司 Coal mine down-hole person detection method based on image fusion and feature enhancement
CN116664604A (en) * 2023-07-31 2023-08-29 苏州浪潮智能科技有限公司 Image processing method and device, storage medium and electronic equipment
CN117315453A (en) * 2023-11-21 2023-12-29 南开大学 Underwater small target detection method based on underwater sonar image
CN117830141A (en) * 2024-03-04 2024-04-05 奥谱天成(成都)信息科技有限公司 Method, medium, equipment and device for removing vertical stripe noise of infrared image

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815886A (en) * 2019-01-21 2019-05-28 南京邮电大学 A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3
CN109919058A (en) * 2019-02-26 2019-06-21 武汉大学 A kind of multisource video image highest priority rapid detection method based on Yolo V3
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
US20190265714A1 (en) * 2018-02-26 2019-08-29 Fedex Corporate Services, Inc. Systems and methods for enhanced collision avoidance on logistics ground support equipment using multi-sensor detection fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190265714A1 (en) * 2018-02-26 2019-08-29 Fedex Corporate Services, Inc. Systems and methods for enhanced collision avoidance on logistics ground support equipment using multi-sensor detection fusion
CN109815886A (en) * 2019-01-21 2019-05-28 南京邮电大学 A kind of pedestrian and vehicle checking method and system based on improvement YOLOv3
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN109919058A (en) * 2019-02-26 2019-06-21 武汉大学 A kind of multisource video image highest priority rapid detection method based on Yolo V3

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王殿伟等: "改进的YOLOv3红外视频图像行人检测算法", 《西安邮电大学学报》 *
谭康霞等: "基于YOLO模型的红外图像行人检测方法", 《激光与红外》 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382683A (en) * 2020-03-02 2020-07-07 东南大学 Target detection method based on feature fusion of color camera and infrared thermal imager
CN111553289A (en) * 2020-04-29 2020-08-18 中国科学院空天信息创新研究院 Remote sensing image cloud detection method and system
CN111898427A (en) * 2020-06-22 2020-11-06 西北工业大学 Multispectral pedestrian detection method based on feature fusion deep neural network
CN111832489A (en) * 2020-07-15 2020-10-27 中国电子科技集团公司第三十八研究所 Subway crowd density estimation method and system based on target detection
CN112070111A (en) * 2020-07-28 2020-12-11 浙江大学 Multi-target detection method and system adaptive to multiband images
CN112070111B (en) * 2020-07-28 2023-11-28 浙江大学 Multi-target detection method and system adapting to multi-band image
CN111950475A (en) * 2020-08-15 2020-11-17 哈尔滨理工大学 Yalhe histogram enhancement type target recognition algorithm based on yoloV3
CN111986240A (en) * 2020-09-01 2020-11-24 交通运输部水运科学研究所 Drowning person detection method and system based on visible light and thermal imaging data fusion
CN112183265A (en) * 2020-09-17 2021-01-05 国家电网有限公司 Electric power construction video monitoring and alarming method and system based on image recognition
CN112287839A (en) * 2020-10-29 2021-01-29 广西科技大学 SSD infrared image pedestrian detection method based on transfer learning
CN112528934A (en) * 2020-12-22 2021-03-19 燕山大学 Improved YOLOv3 traffic sign detection method based on multi-scale feature layer
CN112418358A (en) * 2021-01-14 2021-02-26 苏州博宇鑫交通科技有限公司 Vehicle multi-attribute classification method for strengthening deep fusion network
CN112989924A (en) * 2021-01-26 2021-06-18 深圳市优必选科技股份有限公司 Target detection method, target detection device and terminal equipment
CN112989924B (en) * 2021-01-26 2024-05-24 深圳市优必选科技股份有限公司 Target detection method, target detection device and terminal equipment
CN114529879A (en) * 2022-02-08 2022-05-24 安徽理工大学 Real-time detection method for road conditions of electric locomotive in mine based on YOLOv4-Tiny
CN114529879B (en) * 2022-02-08 2024-07-16 安徽理工大学 Mine electric locomotive road condition real-time detection method based on YOLOv-Tiny
CN115311241B (en) * 2022-08-16 2024-04-23 天地(常州)自动化股份有限公司 Underground coal mine pedestrian detection method based on image fusion and feature enhancement
CN115311241A (en) * 2022-08-16 2022-11-08 天地(常州)自动化股份有限公司 Coal mine down-hole person detection method based on image fusion and feature enhancement
CN116664604B (en) * 2023-07-31 2023-11-03 苏州浪潮智能科技有限公司 Image processing method and device, storage medium and electronic equipment
CN116664604A (en) * 2023-07-31 2023-08-29 苏州浪潮智能科技有限公司 Image processing method and device, storage medium and electronic equipment
CN117315453A (en) * 2023-11-21 2023-12-29 南开大学 Underwater small target detection method based on underwater sonar image
CN117315453B (en) * 2023-11-21 2024-02-20 南开大学 Underwater small target detection method based on underwater sonar image
CN117830141A (en) * 2024-03-04 2024-04-05 奥谱天成(成都)信息科技有限公司 Method, medium, equipment and device for removing vertical stripe noise of infrared image
CN117830141B (en) * 2024-03-04 2024-05-03 奥谱天成(成都)信息科技有限公司 Method, medium, equipment and device for removing vertical stripe noise of infrared image

Also Published As

Publication number Publication date
CN110795991B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN110795991B (en) Mining locomotive pedestrian detection method based on multi-information fusion
CN110956126B (en) Small target detection method combined with super-resolution reconstruction
CN103870818B (en) Smog detection method and device
CN113554089A (en) Image classification countermeasure sample defense method and system and data processing terminal
KR101533925B1 (en) Method and apparatus for small target detection in IR image
CN103226832B (en) Based on the multi-spectrum remote sensing image change detecting method of spectral reflectivity mutation analysis
CN106251344A (en) A kind of multiple dimensioned infrared target self-adapting detecting method of view-based access control model receptive field
JP7327077B2 (en) Road obstacle detection device, road obstacle detection method, and road obstacle detection program
CN112862845A (en) Lane line reconstruction method and device based on confidence evaluation
CN112307984B (en) Safety helmet detection method and device based on neural network
CN109117746A (en) Hand detection method and machine readable storage medium
CN105590301A (en) Impulse noise elimination method of self-adaption normal-inclined double cross window mean filtering
KR101993085B1 (en) Semantic image segmentation method based on deep learing
EP3671635B1 (en) Curvilinear object segmentation with noise priors
CN114266894A (en) Image segmentation method and device, electronic equipment and storage medium
CN110956602B (en) Method and device for determining change area and storage medium
CN117333459A (en) Image tampering detection method and system based on double-order attention and edge supervision
CN115170824A (en) Change detection method for enhancing Siamese network based on space self-adaption and characteristics
CN115358952A (en) Image enhancement method, system, equipment and storage medium based on meta-learning
Wu et al. Pyramid edge detection based on stack filter
CN101739670A (en) Non-local mean space domain time varying image filtering method
CN104616034B (en) A kind of smog detection method
CN109785312B (en) Image blur detection method and system and electronic equipment
CN115984712A (en) Multi-scale feature-based remote sensing image small target detection method and system
CN115829875A (en) Anti-patch generation method and device for non-shielding physical attack

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant